Change Tracking

A customer moves from Portland to Seattle. Do you overwrite Portland with Seattle? Or do you keep both, so that last quarter's revenue report still shows Portland? This is the slowly changing dimension (SCD) problem, and the answer depends on whether anyone needs the historical value. This lesson teaches you the three main SCD strategies, when each one is the right choice, and how to implement Type 2 (the most common and most complex) correctly.

The History Problem

Daily Life

Interviews

Understand why overwriting loses context

Why Overwriting Destroys Information

When a dimension attribute changes, the simplest approach is to overwrite the old value with the new one. Customer moved to Seattle? UPDATE customers SET city = 'Seattle'. Done. Simple. But now every fact row that referenced this customer while they lived in Portland shows 'Seattle.' Last quarter's revenue by region report just retroactively changed. Portland lost revenue it actually had. Seattle gained revenue it did not earn.

This is the slowly changing dimension problem. Dimension data changes over time (customer addresses, product prices, employee titles), and how you handle those changes determines whether your historical reports stay accurate.

Quarter	City (Overwritten)	Revenue	Accurate?
Q1 2024	Seattle (was Portland)	$50,000	No: customer was in Portland during Q1
Q2 2024	Seattle	$60,000	Yes: customer is in Seattle now
Q3 2024	Seattle	$45,000	Yes

The Q1 revenue was earned while the customer lived in Portland. After the overwrite, it appears under Seattle. If your company reports revenue by geography to investors, this is a material misstatement.

SCD Type 1: Overwrite. History is lost. Simple but destroys point-in-time accuracy.

SCD Type 2: Version. Keep all historical values. Complex but preserves full history.

SCD Type 3: Previous column. Keep only the immediately prior value. Limited history, simple implementation.

TIP

Not every attribute needs history. A customer's phone number correction does not affect revenue reports. The question is always: does anyone need the historical value? If yes, Type 2. If no, Type 1. Choose per attribute, not per table.

Type 1: Overwrite Old Value

Daily Life

Interviews

Update dimensions when history is irrelevant

When History Does Not Matter

Type 1 simply overwrites the old value with the new one. The customer moved to Seattle? UPDATE the city column. The old value (Portland) is gone forever. This is the right choice for attributes where nobody ever needs the historical value: typo corrections, phone number updates, name changes due to marriage.

Type 1 is not lazy modeling. It is the correct choice when historical values have no analytical meaning. Applying Type 2 to a phone number creates dimension rows that nobody will ever query, bloating the table for no benefit.

dim_customer (Type 1)

PKcustomer_id

name

city

phone

Operation	customer_id	city
Before	C001	Portland
After UPDATE	C001	Seattle (Portland gone)

Type 1 is a destructive operation. There is no undo. If you later discover you needed the historical value, it is too late. When in doubt, use Type 2.

Type 2: Preserve Full History

Daily Life

Interviews

Keep every version of a changing record

Keeping Every Version of a Dimension Row

Type 2 creates a new row for every change. The old row is marked as expired. The new row is marked as current. Both rows have different surrogate keys. Fact rows from before the change point to the old SK. Fact rows after the change point to the new SK. Revenue is attributed to the correct city at the time of each transaction.

dim_customer (Type 2)

PKcustomer_sk

customer_id

name

city

effective_from

effective_to

is_current

customer_sk	customer_id	city	effective_from	effective_to	is_current
1001	C001	Portland	2023-01-01	2024-06-14	false
1002	C001	Seattle	2024-06-15	9999-12-31	true

Two rows, same customer_id, different surrogate keys. Fact rows from January 2024 join to SK 1001 (Portland). Fact rows from July 2024 join to SK 1002 (Seattle). Revenue attribution is correct at every point in time.

Key Components

Unique per version. The fact table joins on this. Each version of the customer gets its own SK.
The business identifier. Multiple rows share the same customer_id. GROUP BY customer_id to see all versions.
The validity window. effective_to = '9999-12-31' means 'still active.'
Convenience flag. Redundant with effective_to = '9999-12-31' but makes WHERE is_current = TRUE queries simpler.

Point-in-Time Joins

The power of Type 2: any fact row can be joined to the correct version of the dimension. If the fact stores the surrogate key (assigned at load time), the join is a simple equality. If the fact stores the natural key, the join uses a date range: WHERE fact.order_date BETWEEN dim.effective_from AND dim.effective_to.

✓Join by Surrogate Key

Simple equality join: fact.customer_sk = dim.customer_sk
Fast: standard hash join
Requires SK to be assigned at fact load time
Most common approach

✓Join by Natural Key + Date Range

Range join: fact.order_date BETWEEN effective_from AND effective_to
Slower: range predicate, no hash join
Works when fact table stores natural key
Useful for late-arriving facts

The Storage Tradeoff

Type 2 grows the dimension table. A customer with 10 changes has 10 rows. If 1 million customers average 5 changes each, the dimension has 5 million rows. For most dimensions, this is fine. For dimensions with high-frequency changes (daily behavioral scores, real-time status updates), the table can explode. Consider mini-dimensions for high-churn attributes.

TIP

Type 2 is the standard for any attribute that affects reporting. Customer region, product category, employee department, subscription tier. When in doubt, use Type 2. The storage cost of extra rows is negligible compared to the cost of incorrect historical reports.

Type 3: Keep Previous Value

Daily Life

Interviews

Store just the last two states of a record

Limited History: Just the Last Two States

Type 3 adds a 'previous' column alongside the current column. customer has city and previous_city. When the customer moves, the current city becomes the previous city, and the new city becomes current. Only one prior value is preserved. History beyond the immediately-previous state is lost.

customer_id	city	previous_city
C001	Seattle	Portland

Type 3 is rarely used in practice. It is useful for exactly one scenario: before-and-after comparisons. 'Show me customers who changed region this quarter.' Beyond that, it is too limited. If the customer moves a third time, Portland is lost. For full history, use Type 2.

⚠Type 3 (Limited)

Only one prior value preserved
No effective dates
No row growth
Cannot answer 'city in Q1 2023?'

✓Type 2 (Full History)

All historical values preserved
Effective date range per version
One row per change
Can answer point-in-time queries for any date

Choosing Your Strategy

Daily Life

Interviews

Match SCD type to your reporting needs

Per-Attribute Strategy

The most important concept: SCD type is chosen per attribute, not per table. A single dim_customer can have Type 1 on name (typo corrections do not need history), Type 2 on region (revenue attribution needs point-in-time accuracy), and Type 3 on segment (before/after comparison for a migration).

The decision for each attribute: does anyone need the historical value? If the answer is 'yes, for reporting or audit,' use Type 2. If the answer is 'no, and keeping it creates useless rows,' use Type 1. If the answer is 'only the immediately previous value, for a specific comparison,' Type 3.

Type 1Type 2Type 3

Type 1

Corrections and Contact Info

Fix a typo in a name. Update a phone number. Change an email. Nobody reports by historical phone numbers.

Type 2

Anything That Affects Reporting

Customer region, product category, subscription tier, employee department. Revenue and metrics must be attributed to the value at the time of the event.

Type 3

Before/After Comparisons

Customer migrated from segment A to segment B. Need to compare behavior in both segments. Rare use case.

Attribute	SCD Type	Why
Customer name	Type 1	Name corrections have no analytical impact
Customer region	Type 2	Revenue must be attributed to region at time of purchase
Customer email	Type 1	Nobody reports by historical email
Product category	Type 2	Category-level metrics need point-in-time accuracy
Employee title	Type 2	Headcount reports need to reflect title at each point in time
Customer segment (migration)	Type 3	Need before/after comparison for a one-time migration analysis

✗Common Mistake

Apply Type 2 to every attribute
Table grows 40x because phone numbers change
Queries slow because dimension is bloated

✓Best Practice

Choose type per attribute
Type 1 for corrections and contact info
Type 2 only for attributes that affect reporting

TIP

Start every dimension design by listing the attributes and classifying each one: Type 1 or Type 2? This takes 5 minutes and prevents months of either lost history (everything Type 1) or table bloat (everything Type 2).

❯❯❯PUTTING IT ALL TOGETHER

> You are designing dim_employee for an HR analytics warehouse. Employees change departments, titles, and phone numbers over time.

Phone number: Type 1. Nobody reports by historical phone number. Overwrite is correct.

Department and title: Type 2. Headcount reports need to show the employee's department at each point in time. A surrogate key per version, with effective_from/to dates.

You validate: a dbt test checks that every (employee_id, effective_from) pair is unique. Another test checks that is_current = TRUE rows have effective_to = '9999-12-31'.

KEY TAKEAWAYS

Overwriting destroys history: Type 1 is simple but retroactively changes historical reports

Type 2 preserves everything: new row per change, surrogate key per version, point-in-time joins

Type 3 is limited: only the previous value, only for before/after comparisons

Choose per attribute: Type 1 for corrections, Type 2 for reporting attributes, Type 3 for migrations

9999-12-31 sentinel: universal convention for 'this row is still current'

When yesterday's truth still matters

Category: Data Modeling
Duration: 17 minutes
Challenges: 12 hands-on challenges

Topics covered: The History Problem, Type 1: Overwrite Old Value, Type 2: Preserve Full History, Type 3: Keep Previous Value, Choosing Your Strategy

Lesson Sections

The History Problem (concepts: dmScdStrategy)
Why Overwriting Destroys Information When a dimension attribute changes, the simplest approach is to overwrite the old value with the new one. Customer moved to Seattle? UPDATE customers SET city = 'Seattle'. Done. Simple. But now every fact row that referenced this customer while they lived in Portland shows 'Seattle.' Last quarter's revenue by region report just retroactively changed. Portland lost revenue it actually had. Seattle gained revenue it did not earn. This is the slowly changing dim
Type 1: Overwrite Old Value (concepts: dmScdType1)
When History Does Not Matter Type 1 simply overwrites the old value with the new one. The customer moved to Seattle? UPDATE the city column. The old value (Portland) is gone forever. This is the right choice for attributes where nobody ever needs the historical value: typo corrections, phone number updates, name changes due to marriage. Type 1 is not lazy modeling. It is the correct choice when historical values have no analytical meaning. Applying Type 2 to a phone number creates dimension rows
Type 2: Preserve Full History (concepts: dmScdType2)
Keeping Every Version of a Dimension Row Type 2 creates a new row for every change. The old row is marked as expired. The new row is marked as current. Both rows have different surrogate keys. Fact rows from before the change point to the old SK. Fact rows after the change point to the new SK. Revenue is attributed to the correct city at the time of each transaction. Two rows, same customer_id, different surrogate keys. Fact rows from January 2024 join to SK 1001 (Portland). Fact rows from July
Type 3: Keep Previous Value (concepts: dmScdType3)
Limited History: Just the Last Two States Type 3 adds a 'previous' column alongside the current column. customer has city and previous_city. When the customer moves, the current city becomes the previous city, and the new city becomes current. Only one prior value is preserved. History beyond the immediately-previous state is lost. Type 3 is rarely used in practice. It is useful for exactly one scenario: before-and-after comparisons. 'Show me customers who changed region this quarter.' Beyond th
Choosing Your Strategy (concepts: dmScdStrategy)
Per-Attribute Strategy The most important concept: SCD type is chosen per attribute, not per table. A single dim_customer can have Type 1 on name (typo corrections do not need history), Type 2 on region (revenue attribution needs point-in-time accuracy), and Type 3 on segment (before/after comparison for a migration). The decision for each attribute: does anyone need the historical value? If the answer is 'yes, for reporting or audit,' use Type 2. If the answer is 'no, and keeping it creates use