Loading lesson...
Beyond third normal form
Beyond third normal form
Topics covered: Boyce-Codd Normal Form, Fourth Normal Form (4NF), Fifth Normal Form (5NF), Strategic Denormalization, Denormalization Patterns
BCNF: When 3NF Is Not Enough BCNF is a stricter version of 3NF. A table is in BCNF when every determinant (any column that other columns depend on) is a candidate key. The difference from 3NF: 3NF allows a non-key column to determine another non-key column if the determining column is part of a candidate key. BCNF does not allow this. In practice, BCNF violations are rare. They appear in tables with multiple overlapping candidate keys. The canonical example: a course scheduling table where each
4NF: Separate Independent Multi-Valued Facts 4NF handles multi-valued dependencies: when an entity has two or more independent multi-valued attributes. A person can speak multiple languages AND play multiple instruments. These two facts are independent of each other. Storing them in the same table creates a cartesian product. If Alice speaks English and Spanish, and plays piano and guitar, a single table creates four rows: (Alice, English, piano), (Alice, English, guitar), (Alice, Spanish, piano
5NF: Decompose Complex Join Dependencies 5NF is the most theoretical normal form you are likely to encounter. A table is in 5NF when it cannot be decomposed into smaller tables without losing information when you join them back together. In practice, 5NF violations are extremely rare. The classic example: suppliers, parts, and projects. A supplier can supply a part. A supplier can work on a project. A part can be used in a project. But the three-way relationship (this supplier provides this part
When to Break the Rules Normalization optimizes for write safety. Denormalization optimizes for read performance. In data engineering, the majority of workloads are analytical (read-heavy), so strategic denormalization is not a shortcut. It is a deliberate architectural choice. The key word is 'strategic.' You must know what you are trading away. Every denormalized column is a consistency liability: if the source value changes, every copy must be updated. The question is whether the read perform
There are several standard denormalization patterns, each with specific use cases and tradeoffs. Knowing which pattern fits your situation saves you from reinventing solutions that the industry has already standardized. Pattern 1: Pre-Joined Dimension Columns Copy frequently-queried dimension attributes directly into the fact table. Instead of joining fact_orders to dim_customer to get customer_region, store customer_region on the fact row. This eliminates the JOIN for every query that filters o