DataDriven
LearnPracticeInterviewDiscussDaily
HelpContactPrivacyTermsSecurityiOS App

© 2026 DataDriven

Loading lesson...

  1. Home
  2. Learn
  3. Normalization

Normalization

Why copying data breaks everything

Why copying data breaks everything

Category
Data Modeling
Duration
15 minutes
Challenges
12 hands-on challenges

Topics covered: Data Gets Out of Sync, First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Identifying Normal Form

Lesson Sections

  1. Data Gets Out of Sync

    Why Normalization Exists Normalization exists to prevent a specific class of production failure: data that contradicts itself. When the same fact is stored in multiple rows or multiple tables, any update must touch every copy. Miss one, and your database has two different answers to the same question. This is not a theoretical concern. It is the number one source of 'the numbers do not match' bugs in real systems. Here is the canonical example: an employee table that looks reasonable until a dep

  2. First Normal Form (1NF) (concepts: dmFirstNormalForm)

    1NF: No Lists in Cells A table is in first normal form when every cell contains a single atomic value. No comma-separated lists. No arrays stored as strings. No multiple values crammed into one column. This sounds obvious, but 1NF violations are everywhere in real data. A tags column with 'python, sql, spark'. A phone_numbers column with '555-1234, 555-5678'. These break filtering, indexing, and joining. Here is a 1NF violation: You cannot filter for 'students taking Physics' without string pars

  3. Second Normal Form (2NF) (concepts: dmSecondNormalForm)

    2NF: No Partial Dependencies on Composite Keys 2NF only matters when your primary key spans multiple columns. If your PK is a single column, your table automatically satisfies 2NF. The violation: a non-key column depends on only PART of the composite key, not the whole thing. Consider a store inventory table with a composite PK of (store_id, product_id). The table includes store_name and product_name alongside the quantity. store_name depends only on store_id, not on the full key (store_id, prod

  4. Third Normal Form (3NF) (concepts: dmThirdNormalForm)

    3NF: No Transitive Dependencies 3NF is the most practically important normal form because it catches the most common data duplication pattern: a column that depends on another non-key column instead of the primary key. This applies to every table, not just those with composite keys. Consider an orders table where each order has a customer_id, customer_name, and customer_city. customer_name and customer_city depend on customer_id, not on order_id. They are transitively dependent: order_id determi

  5. Identifying Normal Form

    The Three-Step Check When you look at any table and need to determine its normal form, apply these three checks in order. Stop at the first failure. In practice, 90% of real-world normalization problems are 3NF violations: a customer name stored on an orders table, a department location stored on an employee table, a product category stored on a sales table. These all have the same shape: an attribute that belongs to a related entity is denormalized onto the current table. When Not to Normalize:

Related

  • All Lessons
  • Practice Problems
  • Mock Interview Practice
  • Daily Challenges