Loading lesson...
When yesterday's truth still matters
When yesterday's truth still matters
Topics covered: The History Problem, Type 1: Overwrite Old Value, Type 2: Preserve Full History, Type 3: Keep Previous Value, Choosing Your Strategy
Why Overwriting Destroys Information When a dimension attribute changes, the simplest approach is to overwrite the old value with the new one. Customer moved to Seattle? UPDATE customers SET city = 'Seattle'. Done. Simple. But now every fact row that referenced this customer while they lived in Portland shows 'Seattle.' Last quarter's revenue by region report just retroactively changed. Portland lost revenue it actually had. Seattle gained revenue it did not earn. This is the slowly changing dim
When History Does Not Matter Type 1 simply overwrites the old value with the new one. The customer moved to Seattle? UPDATE the city column. The old value (Portland) is gone forever. This is the right choice for attributes where nobody ever needs the historical value: typo corrections, phone number updates, name changes due to marriage. Type 1 is not lazy modeling. It is the correct choice when historical values have no analytical meaning. Applying Type 2 to a phone number creates dimension rows
Keeping Every Version of a Dimension Row Type 2 creates a new row for every change. The old row is marked as expired. The new row is marked as current. Both rows have different surrogate keys. Fact rows from before the change point to the old SK. Fact rows after the change point to the new SK. Revenue is attributed to the correct city at the time of each transaction. Two rows, same customer_id, different surrogate keys. Fact rows from January 2024 join to SK 1001 (Portland). Fact rows from July
Limited History: Just the Last Two States Type 3 adds a 'previous' column alongside the current column. customer has city and previous_city. When the customer moves, the current city becomes the previous city, and the new city becomes current. Only one prior value is preserved. History beyond the immediately-previous state is lost. Type 3 is rarely used in practice. It is useful for exactly one scenario: before-and-after comparisons. 'Show me customers who changed region this quarter.' Beyond th
Per-Attribute Strategy The most important concept: SCD type is chosen per attribute, not per table. A single dim_customer can have Type 1 on name (typo corrections do not need history), Type 2 on region (revenue attribution needs point-in-time accuracy), and Type 3 on segment (before/after comparison for a migration). The decision for each attribute: does anyone need the historical value? If the answer is 'yes, for reporting or audit,' use Type 2. If the answer is 'no, and keeping it creates use