SCD Type 2: Implementation and Point-in-Time Queries
SCD Type 2 creates a new row for every version of a tracked dimension attribute, enabling point-in-time accuracy. It is the default expectation for senior data engineers and one of the most frequently asked data modeling interview topics.
fact_order should reference dim_customer on customer_key (the surrogate key), not customer_id (the natural key). The foreign key belongs on the surrogate column.
Overview
What SCD Type 2 is and why it matters
Type 2 inserts a new row every time a tracked attribute changes. The old row is closed by setting valid_to to the change timestamp and is_current to 0. The new row carries a fresh surrogate key, the updated values, valid_from set to the change timestamp, valid_to NULL, and is_current 1.
The payoff: every historical state of every dimension member is queryable. Facts join to the version that was current when the fact occurred, not the version current today.
Syntax
SCD Type 2 reference
| Token | Meaning |
|---|---|
| customer_key | Surrogate key. Warehouse-generated, unique per version. Fact tables reference this. |
| customer_id | Natural key. Stable across versions. Used for lookups and upstream identification. |
| valid_from / valid_to | Half-open interval [from, to). valid_to NULL means currently valid. |
| is_current | Convenience flag. 1 on the current row, 0 on historical rows. Redundant with valid_to IS NULL but speeds up current-state queries. |
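The columns above can be sketched as minimal DDL. This is a Postgres-flavored sketch, not a definitive schema; identity syntax and types vary by engine, and plan_tier is an illustrative tracked attribute.

```sql
-- Minimal Type 2 customer dimension (Postgres-style; adjust syntax per engine).
CREATE TABLE dim_customer (
    customer_key  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate, unique per version
    customer_id   TEXT        NOT NULL,  -- natural key, stable across versions
    plan_tier     TEXT,                  -- example tracked attribute
    valid_from    TIMESTAMPTZ NOT NULL,
    valid_to      TIMESTAMPTZ,           -- NULL = currently valid; interval is [valid_from, valid_to)
    is_current    SMALLINT    NOT NULL DEFAULT 1
);
```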
Patterns
How SCD Type 2 shows up in practice
Close-and-insert pattern
The two-statement Type 2 update. Run in a transaction so the dimension never has two current rows for the same natural key.
Interview note: Interviewers watch for the transaction. Without it, a crash between statements leaves the dimension in an inconsistent state with zero current rows.
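The two statements can be sketched as follows, assuming a dim_customer table with customer_id as the natural key and a warehouse-generated surrogate key; the customer ID and attribute values are illustrative.

```sql
-- Close-and-insert for one changed customer (Postgres-style sketch).
BEGIN;

-- 1. Close the current version.
UPDATE dim_customer
SET valid_to   = now(),
    is_current = 0
WHERE customer_id = 'C-1042'
  AND is_current = 1;

-- 2. Insert the new version; the surrogate key is generated by the warehouse.
INSERT INTO dim_customer (customer_id, plan_tier, valid_from, valid_to, is_current)
VALUES ('C-1042', 'enterprise', now(), NULL, 1);

COMMIT;
```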
MERGE-based Type 2 (Snowflake/BigQuery/Databricks)
Production Type 2 is usually implemented via MERGE. Snowflake supports a single-statement form; other engines need a close + insert split.
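One hedged sketch of the close + insert split, roughly Snowflake-flavored; the staging table stg_customer, its loaded_at column, and the single tracked attribute are assumptions, and real change detection usually compares a hash of all tracked columns.

```sql
-- Step 1: close current rows whose tracked attribute changed.
MERGE INTO dim_customer d
USING stg_customer s
  ON d.customer_id = s.customer_id
 AND d.is_current = 1
WHEN MATCHED AND d.plan_tier <> s.plan_tier THEN UPDATE
  SET valid_to = s.loaded_at, is_current = 0;

-- Step 2: insert a fresh version for every changed or brand-new customer.
-- After step 1, changed customers have no current row, so the anti-join picks them up.
INSERT INTO dim_customer (customer_id, plan_tier, valid_from, valid_to, is_current)
SELECT s.customer_id, s.plan_tier, s.loaded_at, NULL, 1
FROM stg_customer s
LEFT JOIN dim_customer d
  ON d.customer_id = s.customer_id
 AND d.is_current = 1
WHERE d.customer_id IS NULL;
```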
Point-in-time fact-to-dim join
The reason Type 2 exists. Join each fact to the dimension version that was valid when the fact happened.
Interview note: Best practice is to capture the surrogate key at fact-load time so the fact carries customer_key directly. This avoids the range predicate and is far faster at scale.
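The join can be sketched like this; fact_order, order_ts, and the selected columns are illustrative names.

```sql
-- Each fact row picks up the dimension version valid at order time.
-- Half-open interval: order_ts >= valid_from AND order_ts < valid_to (NULL = still open).
SELECT f.order_id,
       f.order_ts,
       d.plan_tier          -- the tier the customer had when the order happened
FROM fact_order f
JOIN dim_customer d
  ON d.customer_id = f.customer_id
 AND f.order_ts >= d.valid_from
 AND (f.order_ts < d.valid_to OR d.valid_to IS NULL);
```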
Current-state query (the common case)
Most analytics queries just want the current state. The is_current flag makes this a simple equality predicate.
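A minimal sketch, assuming the dimension columns described above:

```sql
-- Current state: one row per customer, no range predicate needed.
SELECT customer_id, plan_tier
FROM dim_customer
WHERE is_current = 1;

-- Equivalent predicate, if you prefer not to rely on the flag:
--   WHERE valid_to IS NULL
```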
dbt snapshot implementation
dbt's snapshot feature is the standard Type 2 implementation in modern warehouses. dbt manages the close-and-insert logic automatically.
Interview note: dbt uses dbt_valid_from / dbt_valid_to columns. Know the naming convention; interviewers will test whether you actually shipped dbt or just read about it.
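A sketch of the classic SQL snapshot block (newer dbt versions also support YAML-defined snapshots); the source name, schema, and updated_at column are assumptions about the project.

```sql
-- snapshots/dim_customer_snapshot.sql
{% snapshot dim_customer_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

select * from {{ source('crm', 'customers') }}

{% endsnapshot %}
```

Each `dbt snapshot` run compares the source against the snapshot table and performs the close-and-insert for changed keys, maintaining dbt_valid_from and dbt_valid_to.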
Interview questions
SCD Type 2 interview questions
Q1. Walk me through how you would implement SCD Type 2 for a customer dimension in SQL.
What they test: Live implementation. Watch for the transaction, the close-old-row step, and surrogate key generation.
Approach: Start with the schema: customer_key (surrogate, PK), customer_id (natural), business attributes, valid_from, valid_to, is_current. On a change: BEGIN a transaction, UPDATE the current row to set valid_to = now() and is_current = 0, INSERT the new row with the next surrogate key and valid_to NULL, COMMIT.
Q2. Given a fact table and a Type 2 dimension, write the join so each fact picks up the dimension version valid when the fact happened.
What they test: The point-in-time join is the single most common SCD question. Candidates who stumble here have not shipped a warehouse.
Approach: JOIN ON natural_key AND fact.ts >= dim.valid_from AND (fact.ts < dim.valid_to OR dim.valid_to IS NULL). Alternatively, if the fact-load process captured the surrogate key at write time, JOIN ON customer_key directly; it is faster and cleaner.
Q3. What's the trade-off between tracking an attribute as Type 2 vs Type 1?
What they test: Design judgment. Type 2 is not free; it multiplies row count. Know when history matters and when it doesn't.
Approach: Type 2 preserves history at the cost of row count and query complexity. Use it for business-meaningful attributes (plan tier, pricing, territory). Type 1 is fine for data-quality fixes (a typo in a name, a corrected address). Some dimensions use Type 6: Type 2 rows plus a Type 1 current-value column.
Q4. Your Type 2 dimension has grown to 500M rows for 10M customers. What went wrong?
What they test: Production debugging. Usually the cause is a high-churn attribute tracked as Type 2 when it should live on a fact table.
Approach: Audit which attributes change most often. If last_login_at is tracked as Type 2, every login creates a new dimension row. Move it to fact_session or drop it from the dimension entirely. Reserve Type 2 for attributes that change rarely and where history has business value.
Q5. How do you detect a data bug where a Type 2 dimension has two current rows for the same natural key?
What they test: Data quality thinking. The invariant is exactly one is_current = 1 row per natural key, always.
Approach: Query: SELECT natural_key, COUNT(*) FROM dim WHERE is_current = 1 GROUP BY natural_key HAVING COUNT(*) > 1. Enforce it with a unique constraint on (natural_key) WHERE is_current = 1 (a partial index in Postgres). Run it as a dbt test on every dbt run.
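Written out against an illustrative dim_customer table (Postgres syntax for the partial index; other engines enforce this differently, if at all):

```sql
-- Detect the violation: any natural key with more than one current row.
SELECT customer_id, COUNT(*) AS current_rows
FROM dim_customer
WHERE is_current = 1
GROUP BY customer_id
HAVING COUNT(*) > 1;

-- Prevent it at write time (Postgres partial unique index).
CREATE UNIQUE INDEX one_current_per_customer
    ON dim_customer (customer_id)
    WHERE is_current = 1;
```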
Q6. Why does the fact table reference customer_key (surrogate) instead of customer_id (natural)?
What they test: The single most important Type 2 invariant. Fact-to-surrogate references lock facts to specific dimension versions.
Approach: A fact created on 2025-03-01 should reference the dimension version valid on 2025-03-01 forever, even after the dimension updates. Storing customer_key captures that binding at write time. Storing customer_id would cause the fact's joined attributes to change silently whenever the dimension updates.
Common mistakes
What breaks in practice
Two current rows for the same natural key
Non-transactional updates or a bug in the close-old-row step can leave two is_current=1 rows. Every downstream current-state query returns duplicates.
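A hedged sketch of both versions, assuming a Postgres-style dim_customer; the customer ID and values are illustrative.

```sql
-- Wrong: two statements outside a transaction. A crash after the UPDATE
-- leaves the key with zero current rows; a blind retry can leave two.
UPDATE dim_customer SET valid_to = now(), is_current = 0
WHERE customer_id = 'C-1042' AND is_current = 1;
-- (a crash here leaves the dimension inconsistent)
INSERT INTO dim_customer (customer_id, plan_tier, valid_from, is_current)
VALUES ('C-1042', 'enterprise', now(), 1);

-- Right: the same two statements, atomically.
BEGIN;
UPDATE dim_customer SET valid_to = now(), is_current = 0
WHERE customer_id = 'C-1042' AND is_current = 1;
INSERT INTO dim_customer (customer_id, plan_tier, valid_from, is_current)
VALUES ('C-1042', 'enterprise', now(), 1);
COMMIT;
```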
Joining facts to the dimension on natural key without a time predicate
Without the valid_from/valid_to predicate, each fact joins to every historical version of the dimension, causing row explosion.
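A sketch of both joins; fact_order and its columns are illustrative names.

```sql
-- Wrong: each fact matches every historical version -> row explosion.
SELECT f.order_id, d.plan_tier
FROM fact_order f
JOIN dim_customer d ON d.customer_id = f.customer_id;

-- Right: restrict to the version valid at the fact timestamp.
SELECT f.order_id, d.plan_tier
FROM fact_order f
JOIN dim_customer d
  ON d.customer_id = f.customer_id
 AND f.order_ts >= d.valid_from
 AND (f.order_ts < d.valid_to OR d.valid_to IS NULL);
```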
Tracking high-churn attributes as Type 2
Attributes like last_login_at, session_count, or updated_at change on every event. Tracking them as Type 2 multiplies dimension row count into the billions.
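A sketch of the design difference; fact_login and the column sets are illustrative.

```sql
-- Wrong: last_login_at tracked as Type 2 -> a new dimension version per login.
--   dim_customer(customer_key, customer_id, plan_tier, last_login_at,
--                valid_from, valid_to, is_current)

-- Right: keep the dimension slow; put per-event timestamps on a fact table.
--   dim_customer(customer_key, customer_id, plan_tier,
--                valid_from, valid_to, is_current)
CREATE TABLE fact_login (
    customer_key BIGINT      NOT NULL,  -- surrogate captured at load time
    login_ts     TIMESTAMPTZ NOT NULL
);
```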
FAQ
Common questions
- What is SCD Type 2?
- SCD Type 2 is a dimensional modeling strategy that inserts a new row every time a tracked attribute changes, preserving full history. Each row carries a unique surrogate key plus valid_from, valid_to, and is_current columns.
- How do you query a Type 2 dimension as of a specific date?
- Use the temporal range predicate: WHERE <date> >= valid_from AND (<date> < valid_to OR valid_to IS NULL). This returns the version that was current on the given date.
- Should fact tables reference the natural key or surrogate key?
- The surrogate key, captured at fact-load time. This locks the fact to a specific dimension version and makes joins a simple equality rather than a range predicate.
- How does dbt implement SCD Type 2?
- dbt snapshots. Define a snapshot with unique_key and a strategy (timestamp or check). dbt runs the snapshot periodically and maintains dbt_valid_from and dbt_valid_to automatically.
- When should I use Type 2 vs Type 1?
- Type 2 when the business cares about history (plan tier changes, pricing updates, territory reassignments). Type 1 when the change is a data-quality fix and history is noise. Default to Type 2 for business attributes.
Continue your prep
Data Engineer Interview Prep: explore the full guide
50+ guides covering every round, company, role, and technology in the data engineer interview loop. Grounded in 2,817 verified interview reports across 929 companies, collected from real candidates.