SCD Type 2 is a dimensional modeling strategy that inserts a new row every time a tracked attribute changes, preserving full history. Each row carries a unique surrogate key plus valid_from, valid_to, and is_current columns.

How do you query a Type 2 dimension as of a specific date?

Use the temporal range predicate: WHERE >= valid_from AND ( < valid_to OR valid_to IS NULL). This returns the version that was current on the given date.

Should fact tables reference the natural key or surrogate key?

The surrogate key, captured at fact-load time. This locks the fact to a specific dimension version and makes joins a simple equality rather than a range predicate.

How does dbt implement SCD Type 2?

dbt snapshots. Define a snapshot with unique_key and a strategy (timestamp or check). dbt runs the snapshot periodically and maintains dbt_valid_from and dbt_valid_to automatically.

When should I use Type 2 vs Type 1?

Type 2 when business cares about history, plan tier changes, pricing updates, territory reassignments. Type 1 when the change is a data-quality fix and history is noise. Default to Type 2 for business attributes.

Data Modeling · Interview concept

SCD Type 2: Implementation and Point-in-Time Queries

Q: Should fact tables reference the natural key or surrogate key?

The surrogate key, captured at fact-load time. This locks the fact to a specific dimension version and makes joins a simple equality rather than a range predicate.

Q: How does dbt implement SCD Type 2?

dbt snapshots. Define a snapshot with unique_key and a strategy (timestamp or check). dbt runs the snapshot periodically and maintains dbt_valid_from and dbt_valid_to automatically.

Q: When should I use Type 2 vs Type 1?

Type 2 when business cares about history, plan tier changes, pricing updates, territory reassignments. Type 1 when the change is a data-quality fix and history is noise. Default to Type 2 for business attributes.

SCD Type 2 keeps a new row for every version of a dimension attribute, enabling point-in-time accuracy. It is the default senior-DE expectation and the most-asked data modeling interview topic.

Try itDesign the Type 2 dimension: drag columns, draw the FK

drag • connect • edit

fact_order should reference dim_customer on customer_key (surrogate), not customer_id (natural). Draw the FK connection on the correct column.

Loading canvas...

Overview

What SCD Type 2 is and why it matters

Type 2 inserts a new row every time a tracked attribute changes. The old row is closed by setting valid_to to the change timestamp and is_current to 0. The new row carries a fresh surrogate key, the updated values, valid_from set to the change timestamp, valid_to NULL, and is_current 1.

The payoff: every historical state of every dimension member is queryable. Facts join to the version that was current when the fact occurred, not the version current today.

Syntax

SCD Type 2 reference

Token	Meaning
customer_key	Surrogate key. Warehouse-generated, unique per version. Fact tables reference this.
customer_id	Natural key. Stable across versions. Used for lookups and upstream identification.
valid_from / valid_to	Half-open interval [from, to). valid_to NULL means currently valid.
is_current	Convenience flag. 1 on the current row, 0 on historical rows. Redundant with valid_to IS NULL but speeds up current-state queries.

Prepare for the interview

01 / Open invite

02min.

Know SCD Type 2 the way the interviewer who asks it knows it.

a SCD Type 2 query, the same shape a screen would give you.

The diff against expected. Where ties broke. What you missed.

sandbox

1fact_orders

2 order_id bigint PK

3 customer_sk bigint FK

4 order_date date SCD2

Execute your solution0.4s avg.

DatabricksInterview question

Solve a SCD Type 2 problem

Patterns

How SCD Type 2 shows up in practice

Close-and-insert pattern

The two-statement Type 2 update. Run in a transaction so the dimension never has two current rows for the same natural key.

	BEGIN ; UPDATE dim_customer SET valid_to = '2025-06-01', is_current = 0


	WHERE customer_id = 101 AND is_current = 1 ; INSERT INTO dim_customer(customer_key, customer_id, name, plan_tier, country, valid_from, valid_to, is_current) VALUES(2, 101, 'Alice', 'pro', 'US', '2025-06-01', NULL, 1) ; COMMIT ;

Interview noteInterviewers watch for the transaction. Without it, a crash between statements leaves the dimension in an inconsistent state with zero current rows.

MERGE-based Type 2 (Snowflake/BigQuery/Databricks)

Production Type 2 is usually implemented via MERGE. Snowflake supports a single-statement form; other engines need a close + insert split.

	MERGE INTO dim_customer tgt
	USING stg_customer src
	ON tgt.customer_id = src.customer_id AND tgt.is_current = 1 WHEN MATCHED AND(tgt.name <> src.name OR tgt.plan_tier <> src.plan_tier OR tgt.country <> src.country) THEN UPDATE SET valid_to = CURRENT_TIMESTAMP, is_current = 0 ; INSERT INTO dim_customer(...)

	SELECT
	...
	FROM stg_customer s
	LEFT JOIN dim_customer d
	ON d.customer_id = s.customer_id AND d.is_current = 1
	WHERE d.customer_id IS NULL OR d.name <> s.name OR d.plan_tier <> s.plan_tier OR d.country <> s.country ;

Point-in-time fact-to-dim join

The reason Type 2 exists. Join each fact to the dimension version that was valid when the fact happened.

	/* fact_order modeled by transactions (the fact), dim_customer by a */
	/* Type 2 view over users (account_status as the tracked attribute). */
	WITH dim_customer AS (
	SELECT
	user_id AS customer_id,
	account_status AS plan_tier,
	signup_date AS valid_from,
	NULL AS valid_to
	FROM users
	)

	SELECT
	d.plan_tier,
	SUM(f.total_amount) AS revenue
	FROM transactions AS f
	INNER JOIN dim_customer AS d
	ON f.user_id = d.customer_id
	AND f.transaction_date >= d.valid_from
	AND (
	f.transaction_date < d.valid_to
	OR d.valid_to IS NULL
	)
	GROUP BY d.plan_tier

Interview noteBest practice: capture the surrogate key at fact-load time so the fact carries customer_key directly. This avoids the range predicate and is far faster at scale.

Current-state query (the common case)

Most analytics queries just want the current state. The is_current flag makes this a simple equality predicate.

	/* dim_customer modeled as a Type 2 view over users; the current */
	/* version is the row whose valid_to is still open (is_current = 1). */
	WITH dim_customer AS (
	SELECT
	user_id AS customer_id,
	username AS name,
	account_status AS plan_tier,
	age_bucket AS country,
	1 AS is_current
	FROM users
	)

	SELECT
	customer_id,
	name,
	plan_tier,
	country
	FROM dim_customer
	WHERE is_current = 1

dbt snapshot implementation

dbt's snapshot feature is the standard Type 2 implementation in modern warehouses. dbt manages the close-and-insert logic automatically.

	-- snapshots/dim_customer.sql
	{% snapshot dim_customer %}
	{{
	config(
	target_schema='snapshots',
	unique_key='customer_id',
	strategy='check',
	check_cols=['name', 'plan_tier', 'country']
	)
	}}
	SELECT * FROM {{ source('crm', 'customers') }}
	{% endsnapshot %}

Interview notedbt uses dbt_valid_from / dbt_valid_to columns. Know the naming convention; interviewers will test whether you shipped dbt in production or only read about it.

Interview questions

SCD Type 2 interview questions

Q1.Walk me through how you would implement SCD Type 2 for a customer dimension in SQL.

What they testLive implementation. Watch for the transaction, the close-old-row step, and surrogate key generation.

ApproachStart with the schema: customer_key (surrogate, PK), customer_id (natural), business attributes, valid_from, valid_to, is_current. On a change: BEGIN transaction, UPDATE current row to set valid_to = now and is_current = 0, INSERT new row with next surrogate key and valid_to NULL. COMMIT.

Q2.Given a fact table and a Type 2 dimension, write the join so each fact picks up the dimension version valid when the fact happened.

What they testThe point-in-time join is the single most common SCD question. Candidates who stumble here have not shipped a warehouse.

ApproachJOIN ON natural_key AND fact.ts >= dim.valid_from AND (fact.ts < dim.valid_to OR dim.valid_to IS NULL). Alternatively, if the fact-load process captured the surrogate key at write time, JOIN ON customer_key directly, faster and cleaner.

Q3.What's the trade-off between tracking an attribute as Type 2 vs Type 1?

What they testDesign judgment. Type 2 is not free, it multiplies row count. Know when history matters and when it doesn't.

ApproachType 2 preserves history at the cost of row count and query complexity. Use it for business-meaningful attributes (plan tier, pricing, territory). Type 1 is fine for data-quality fixes (typo in a name, corrected address). Some dimensions use Type 6. Type 2 rows plus a Type 1 current-value column.

Q4.Your Type 2 dimension keeps growing to 500M rows for 10M customers. What went wrong?

What they testProduction debugging. Usually the cause is a high-churn attribute tracked as Type 2 when it should be on the fact.

ApproachAudit which attributes are changing most often. If last_login_at is tracked as Type 2, every login creates a new dim row. Move it to fact_session or drop it from the dimension entirely. Reserve Type 2 for attributes that change rarely and where history has business value.

Q5.How do you detect a data bug where a Type 2 dimension has two current rows for the same natural key?

What they testData quality thinking. The invariant is exactly one is_current=1 per natural key, always.

ApproachQuery: SELECT natural_key, COUNT(*) FROM dim WHERE is_current = 1 GROUP BY natural_key HAVING COUNT(*) > 1. Enforce it with a unique constraint on (natural_key) WHERE is_current = 1 (partial index in Postgres). Run as a dbt test on every dbt run.

Q6.Why does the fact table reference customer_key (surrogate) instead of customer_id (natural)?

What they testThe single most important Type 2 invariant. Fact-to-surrogate references lock facts to specific dimension versions.

ApproachA fact created on 2025-03-01 should reference the dimension version valid on 2025-03-01 forever, even after the dimension updates. Storing customer_key captures that binding at write time. Storing customer_id would cause the fact's joined attributes to silently change whenever the dimension updates.

Common mistakes

What breaks in practice

Two current rows for the same natural key

Non-transactional updates or a bug in the close-old-row step can leave two is_current=1 rows. Every downstream current-state query returns duplicates.

Wrong

INSERT INTO dim_customer(...) VALUES(...,  valid_from = now,  valid_to = NULL,  is_current = 1) ;

Right

	BEGIN ; UPDATE dim_customer SET valid_to = now, is_current = 0
	WHERE customer_id = ? AND is_current = 1 ; INSERT INTO dim_customer(...) VALUES(...) ; COMMIT ;

Joining facts to the dimension on natural key without a time predicate

Without the valid_from/valid_to predicate, each fact joins to every historical version of the dimension, row explosion.

Wrong

	/* Fan-out: 1 fact row × 3 dim versions = 3 rows */
	SELECT
	*
	FROM fact_order AS f
	INNER JOIN dim_customer AS d
	ON f.customer_id = d.customer_id

Right

	SELECT
	*
	FROM fact_order AS f
	INNER JOIN dim_customer AS d
	ON f.customer_id = d.customer_id
	AND f.order_ts >= d.valid_from
	AND (
	f.order_ts < d.valid_to
	OR d.valid_to IS NULL
	)

Tracking high-churn attributes as Type 2

Attributes like last_login_at, session_count, or updated_at change on every event. Tracking them as Type 2 multiplies dimension row count into the billions.

Wrong

	'CREATE TABLE dim_customer ( . . ., last_login_at TIMESTAMP, . . . )'
	/* Every login creates a new dim row */
	/* tracked as Type 2 */

Right

	'CREATE TABLE fact_session ( customer_key INT, session_start TIMESTAMP, . . . )'
	/* Move to a fact table */
	/* Dimension stays lean */

Prepare for the interview

03 / From the bank03 of many

03hand-picked.

Online Marketplace - Seller Payouts

Hard30 min

The buyer paid one number. The seller got a different one.

Open the sandbox

Pick your level

AirbnbL3A Number for the Seller UberL4Marketplace Sales Warehouse StripeL5Two Wallets

Pulled from debriefs where modeling rounds went sideways.

FAQ

Common questions

What is SCD Type 2?: SCD Type 2 is a dimensional modeling strategy that inserts a new row every time a tracked attribute changes, preserving full history. Each row carries a unique surrogate key plus valid_from, valid_to, and is_current columns.
How do you query a Type 2 dimension as of a specific date?: Use the temporal range predicate: WHERE <date> >= valid_from AND (<date> < valid_to OR valid_to IS NULL). This returns the version that was current on the given date.
Should fact tables reference the natural key or surrogate key?: The surrogate key, captured at fact-load time. This locks the fact to a specific dimension version and makes joins a simple equality rather than a range predicate.
How does dbt implement SCD Type 2?: dbt snapshots. Define a snapshot with unique_key and a strategy (timestamp or check). dbt runs the snapshot periodically and maintains dbt_valid_from and dbt_valid_to automatically.
When should I use Type 2 vs Type 1?: Type 2 when business cares about history, plan tier changes, pricing updates, territory reassignments. Type 1 when the change is a data-quality fix and history is noise. Default to Type 2 for business attributes.

Interview Prep

Data Engineer Interview Prep Guide→

50+ guides covering every round, company, and role

Interview Rounds

SQL Round Python Round System Design Data Modeling Behavioral Live Coding

By Company

Stripe Airbnb Netflix Uber Databricks Snowflake

By Role

Senior DE Staff DE Analytics Engineer ML Data Engineer Junior DE

Question Sets

Top 100 Questions FAANG Questions SQL Questions 50 Questions Take-Home Examples

Published 2026-04-20Updated 2026-04-20

What SCD Type 2 is and why it matters

SCD Type 2 reference

Know SCD Type 2 the way the interviewer who asks it knows it.

How SCD Type 2 shows up in practice

Close-and-insert pattern

MERGE-based Type 2 (Snowflake/BigQuery/Databricks)

Point-in-time fact-to-dim join

Current-state query (the common case)

dbt snapshot implementation

SCD Type 2 interview questions

Q1.Walk me through how you would implement SCD Type 2 for a customer dimension in SQL.

Q2.Given a fact table and a Type 2 dimension, write the join so each fact picks up the dimension version valid when the fact happened.

Q3.What's the trade-off between tracking an attribute as Type 2 vs Type 1?

Q4.Your Type 2 dimension keeps growing to 500M rows for 10M customers. What went wrong?

Q5.How do you detect a data bug where a Type 2 dimension has two current rows for the same natural key?

Q6.Why does the fact table reference customer_key (surrogate) instead of customer_id (natural)?

What breaks in practice

Two current rows for the same natural key

Joining facts to the dimension on natural key without a time predicate

Tracking high-churn attributes as Type 2

Related concepts

Online Marketplace - Seller Payouts

Common questions