Data Modeling

Slowly Changing Dimensions: SCD Type 1, 2, 3 Explained

Slowly changing dimensions test whether you understand how data warehouses track historical changes. Interviewers want to see trade-off reasoning: when do you preserve history, when do you overwrite, and why does it matter for downstream analytics?

SCD questions appear in roughly a third of data modeling interview segments. Type 2 is the most frequently tested, but knowing when Type 1 or Type 3 is the better choice separates strong candidates from those who memorized definitions.

What Are Slowly Changing Dimensions?

A slowly changing dimension is a dimension table in a data warehouse where attribute values change over time, but not with every transaction. Customer addresses change when people move. Employee job titles change with promotions. Product categories change when the catalog is reorganized. These changes are "slow" compared to fact data, which changes with every order, click, or event.

The challenge is deciding how to handle these changes in your warehouse. Do you overwrite the old value and lose history? Do you keep every version and accept the storage and query complexity? Do you track just the previous value? Each approach has a name (Type 0 through Type 6), specific implementation mechanics, and distinct trade-offs that interviewers expect you to articulate.

The concept comes from Ralph Kimball's dimensional modeling methodology and remains one of the most practical and frequently tested topics in data engineering interviews. Even teams using modern lakehouse patterns face the same fundamental question: when a dimension attribute changes, what do you do with the old value?

SCD Types Explained

Each SCD type makes a different trade-off between simplicity, storage cost, query complexity, and historical completeness. Interviewers expect you to explain the trade-off, not just describe the mechanism.

Type 0: Retain Original

Never changes

The attribute value is set at insert time and never updated, regardless of what happens in the source system. This is appropriate for attributes that represent a historical fact at a point in time, where the original value has permanent business meaning.

Example

A customer's original_signup_date or original_acquisition_channel. Even if the customer later changes their behavior, the original value represents how they first arrived and should never be overwritten.

Interview note

Rarely the focus of a question, but mentioning Type 0 when comparing SCD types signals that you understand the full spectrum. It shows you have thought about which attributes should be immutable.

Type 1: Overwrite

No history

The old value is replaced with the new value in place. No record of the previous value is kept. The dimension table always reflects the current state of the entity. This is the simplest approach and the cheapest in storage, but it rewrites history: any fact records joined to this dimension will now reflect the new attribute value, even for events that occurred before the change.

Example

A customer corrects a typo in their name. There is no analytical reason to preserve the misspelled version. Type 1 overwrites it, and all historical transactions now show the correct name. Another example: standardizing a phone number format from '(555) 123-4567' to '+15551234567'. The old format was never analytically meaningful.
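The overwrite mechanics are trivial by design. A minimal Python sketch of the idea, with dicts standing in for table rows (all names are illustrative, not from any particular warehouse):

```python
# Minimal SCD Type 1 sketch: overwrite in place, keep no history.
# Rows are plain dicts keyed by the natural key.

def apply_type1(dim, customer_id, **changes):
    """Overwrite attributes for one entity; the old values are gone."""
    dim[customer_id].update(changes)

dim_customer = {42: {"name": "Jon Smiht", "phone": "(555) 123-4567"}}

# Fix the typo and standardize the phone format.
apply_type1(dim_customer, 42, name="Jon Smith", phone="+15551234567")
```

In SQL this is just an `UPDATE ... WHERE customer_id = ...`; there is no versioning machinery to maintain.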

Interview note

Interviewers test whether you understand the trade-off, not just the mechanism. The right answer depends on whether the attribute has analytical significance in its historical state. If it does, Type 1 is wrong. If it does not, Type 2 is over-engineering.

Type 2: Add New Row

Full history

When an attribute changes, the current row is closed (effective_to set to yesterday, is_current set to false) and a new row is inserted with the updated value (effective_from set to today, is_current set to true). Each version of the entity gets its own surrogate key. The natural key remains the same across all versions, linking them together. This is the most common SCD type in production data warehouses.

Example

A customer moves from New York to Chicago on March 15. The existing row gets effective_to = '2026-03-14' and is_current = false. A new row is inserted with city = 'Chicago', effective_from = '2026-03-15', effective_to = '9999-12-31', is_current = true. A new surrogate key is assigned. Fact records from before the move still join to the New York version via the old surrogate key.
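The same March 15 move can be sketched in Python, with rows as dicts standing in for the UPDATE-then-INSERT the warehouse would run (column and function names are illustrative):

```python
from datetime import date, timedelta

FAR_FUTURE = date(9999, 12, 31)  # sentinel effective_to for the current row

def apply_type2(rows, customer_id, changes, change_date):
    """Close the current version and append a new one (SCD Type 2)."""
    current = next(r for r in rows
                   if r["customer_id"] == customer_id and r["is_current"])
    current["effective_to"] = change_date - timedelta(days=1)
    current["is_current"] = False
    rows.append({**current, **changes,
                 "customer_sk": max(r["customer_sk"] for r in rows) + 1,
                 "effective_from": change_date,
                 "effective_to": FAR_FUTURE,
                 "is_current": True})

rows = [{"customer_sk": 1, "customer_id": 42, "city": "New York",
         "effective_from": date(2024, 1, 1),
         "effective_to": FAR_FUTURE, "is_current": True}]

apply_type2(rows, 42, {"city": "Chicago"}, date(2026, 3, 15))
# rows[0] is the closed New York version; rows[1] is the current Chicago one.
```

Note that the natural key (customer_id = 42) appears on both rows, while each version gets its own surrogate key.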

Interview note

This is the SCD type interviewers ask about most. Be ready to explain: (1) why surrogate keys are necessary (natural keys are no longer unique), (2) how fact tables reference the correct version, (3) the effective_to = '9999-12-31' convention for the current row, and (4) how this affects query patterns (always filter on is_current for current-state queries).

Type 3: Add New Column

Limited history

Instead of adding a new row, you add a new column to store the previous value. The table maintains exactly one level of history: current and previous. If the attribute changes again, the oldest value is lost. The table grows wider (more columns) but not taller (more rows).

Example

A customer changes sales regions from 'West' to 'Central'. The table has columns current_region and previous_region. If they later move to 'East', current_region becomes 'East', previous_region becomes 'Central', and 'West' is lost. This works when the business only needs before/after comparison, not full history.
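The column-shift mechanics, as a small Python sketch (dicts standing in for a row; names are illustrative):

```python
def apply_type3(row, new_region):
    """Shift current into previous; only one level of history survives."""
    row["previous_region"] = row["current_region"]
    row["current_region"] = new_region

row = {"customer_id": 42, "current_region": "West", "previous_region": None}
apply_type3(row, "Central")   # West -> Central
apply_type3(row, "East")      # Central -> East; 'West' is now lost
```

The table never gains rows; each change simply shuffles values across the fixed pair of columns.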

Interview note

Less common in practice than Types 1 and 2. But knowing when Type 3 makes sense (simple before/after analysis, limited storage budget, stable attributes that change at most once) demonstrates that you select the right tool rather than always defaulting to Type 2.

Type 4: Mini-Dimension (History Table)

Separate history

The main dimension table always contains only the current state. A separate history table stores all previous versions with timestamps. This keeps the primary dimension table compact and fast for current-state queries while preserving full history for audit or analysis. Queries that need historical context join to the history table explicitly.

Example

The customer table has one row per customer with current attributes. A customer_history table has one row per change, with change_timestamp, changed_attribute, old_value, and new_value. Current-state dashboards query only the customer table. Compliance audits query the history table.
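A Python sketch of the two-table split described above (dicts and lists standing in for the current and history tables; names are illustrative):

```python
# SCD Type 4 sketch: the main table holds only current state; every
# change is appended to a separate history table.

def apply_type4(current, history, customer_id, attribute, new_value, ts):
    """Overwrite the main dimension and log the change to history."""
    old_value = current[customer_id][attribute]
    current[customer_id][attribute] = new_value
    history.append({
        "customer_id": customer_id,
        "change_timestamp": ts,
        "changed_attribute": attribute,
        "old_value": old_value,
        "new_value": new_value,
    })

customer = {42: {"city": "New York"}}
customer_history = []

apply_type4(customer, customer_history, 42, "city", "Chicago",
            "2026-03-15T00:00:00")
```

Current-state queries touch only `customer`; audits replay `customer_history`.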

Interview note

Type 4 is the most practical approach in many production systems, especially when the dimension is large and only a small percentage of queries need historical data. Mentioning it distinguishes you from candidates who only know Types 1, 2, and 3.

Type 6: Hybrid (1 + 2 + 3)

Combined approach

Combines Types 1, 2, and 3 in a single table. A new row is added for each change (Type 2). The new row includes a previous_value column (Type 3). A current_value column on ALL rows for that entity is overwritten to the latest value (Type 1). This gives consumers three options: query the current value from any row (Type 1 column), compare current to previous (Type 3 columns), or traverse the full history (Type 2 row versions).

Example

A customer moves from New York to Chicago. A new row is inserted (Type 2) with city = 'Chicago' and previous_city = 'New York' (Type 3). The current_city column on ALL rows for this customer, including the old New York row, is updated to 'Chicago' (Type 1). Any row for this customer now tells you both the historical value for that version and the current value.
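The three mechanisms acting together can be sketched in Python (rows as dicts; all names are illustrative):

```python
from datetime import date, timedelta

FAR_FUTURE = date(9999, 12, 31)

def apply_type6(rows, customer_id, new_city, change_date):
    entity_rows = [r for r in rows if r["customer_id"] == customer_id]
    current = next(r for r in entity_rows if r["is_current"])
    current["effective_to"] = change_date - timedelta(days=1)  # Type 2: close
    current["is_current"] = False
    rows.append({"customer_id": customer_id,
                 "city": new_city,                  # Type 2: new version row
                 "previous_city": current["city"],  # Type 3: one step back
                 "current_city": new_city,
                 "effective_from": change_date,
                 "effective_to": FAR_FUTURE,
                 "is_current": True})
    for r in entity_rows:       # Type 1: overwrite on every old version
        r["current_city"] = new_city

rows = [{"customer_id": 42, "city": "New York", "previous_city": None,
         "current_city": "New York", "effective_from": date(2024, 1, 1),
         "effective_to": FAR_FUTURE, "is_current": True}]

apply_type6(rows, 42, "Chicago", date(2026, 3, 15))
```

After the change, even the closed New York row carries current_city = 'Chicago', which is exactly what makes Type 6 queries flexible.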

Interview note

Advanced topic. Most interviewers will be impressed if you can explain the rationale: Type 6 lets different queries use the same table without trade-offs. Do not lead with it. Bring it up if the interviewer asks about hybrid approaches or if you are explaining trade-offs between types.

How Interviewers Test SCDs

Interviewers do not want you to recite definitions. They want to see that you can select the right SCD type for a given scenario and defend your choice. The strongest signal is trade-off reasoning: "I would use Type 1 here because the business does not need historical phone number data, and Type 2 would add row proliferation with no analytical value."

They also test implementation depth. Can you write the MERGE statement? Do you understand surrogate keys vs natural keys? Can you handle edge cases like late-arriving dimension changes or the initial load? Can you diagnose performance problems in a large Type 2 table?

The common pitfall: candidates who default to Type 2 for every attribute. In a real warehouse, a customer dimension might have 30 attributes. Tracking full history for all 30 creates enormous tables with minimal analytical value. Experienced engineers apply Type 2 selectively to the 3 or 4 attributes that matter for historical analysis and use Type 1 for the rest. This pragmatism is exactly what interviewers want to hear.

Interview Questions with Guidance

These are the SCD questions that come up most frequently. For each one, the guidance shows how a strong candidate structures the answer.

Implement SCD Type 2 for a customer dimension table.

Define the table with: customer_sk (surrogate key, auto-increment), customer_id (natural key), name, city, state, effective_from (date), effective_to (date, default '9999-12-31'), is_current (boolean, default true). The load process: (1) Compare incoming data against current dimension rows (WHERE is_current = true) on the natural key. (2) For changed rows, UPDATE the existing current row to set effective_to = yesterday and is_current = false. (3) INSERT a new row with the updated attributes, effective_from = today, effective_to = '9999-12-31', is_current = true. (4) For new customers, INSERT directly with is_current = true.

When would you choose Type 1 over Type 2?

When the attribute has no analytical significance in its historical state. Three categories: (1) Data corrections, like fixing a misspelled name or standardizing a phone format. The old value was wrong, so preserving it adds no value. (2) Attributes that are operationally useful but not analytically tracked, like a customer's login email. (3) Attributes where the business explicitly does not want historical reporting, like GDPR-driven updates where the old value must be removed. The key signal to the interviewer: you make this decision based on business requirements, not engineering convenience.

How do you handle late-arriving dimension changes?

A late-arriving dimension change means you learn about an attribute change after fact records for that period have already been loaded. For Type 2: you need to insert a backdated version of the dimension row with the correct effective dates, then potentially re-point fact records that were loaded during the affected period. For Type 1: the late arrival is handled by the normal overwrite. The complexity is in Type 2, because you must split an existing version into two periods and update fact table foreign keys. This is operationally expensive, which is why some teams choose Type 1 for attributes that frequently arrive late.
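The version-splitting step can be sketched in Python (rows as dicts; function and column names are illustrative, and the fact-table re-pointing is deliberately left out):

```python
from datetime import date, timedelta

FAR_FUTURE = date(9999, 12, 31)

def split_version(rows, customer_id, change_date, changes, next_sk):
    """Backdate a Type 2 change by splitting the version covering change_date.

    Fact rows loaded for the affected period still reference the old
    surrogate key and must be re-pointed separately (not shown).
    """
    hit = next(r for r in rows
               if r["customer_id"] == customer_id
               and r["effective_from"] <= change_date <= r["effective_to"])
    # The new version inherits the tail of the split period, including
    # is_current if the split row was the current one.
    new_row = {**hit, **changes, "customer_sk": next_sk,
               "effective_from": change_date}
    hit["effective_to"] = change_date - timedelta(days=1)
    hit["is_current"] = False
    rows.append(new_row)
    return new_row

rows = [{"customer_sk": 1, "customer_id": 42, "tier": "basic",
         "effective_from": date(2024, 1, 1),
         "effective_to": FAR_FUTURE, "is_current": True}]

# We learn in July that the customer upgraded back on June 1.
split_version(rows, 42, date(2025, 6, 1), {"tier": "premium"}, next_sk=2)
```

The re-pointing of fact foreign keys is what makes this operationally expensive in practice.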

Your Type 2 dimension has 500M rows and queries are slow. What do you do?

Five approaches, in order of impact: (1) Partition or cluster by is_current so current-state queries skip historical rows. This alone often solves the problem. (2) Create a view or materialized view filtered to is_current = true for downstream consumers. (3) Move to Type 4: separate current and historical data into two physical tables. (4) Archive very old versions beyond a retention window if the business does not need them. (5) Index the surrogate key and natural key combination for join performance. The interviewer wants to see that you diagnose the root cause (most queries only need current data) before jumping to solutions.

Design the ETL process that maintains a Type 2 dimension.

Step by step: (1) Stage incoming data in a temporary table. (2) LEFT JOIN staged data to the current dimension (WHERE is_current = true) on natural key. (3) Identify three categories: new rows (no match in dimension), unchanged rows (match with identical attributes), and changed rows (match with different attributes). (4) For changed rows, UPDATE the dimension to close the current version. (5) INSERT new versions for changed rows and entirely new rows. (6) Log metrics: rows inserted, rows updated, rows unchanged. The MERGE statement combines steps 3 through 5 into a single atomic operation in most warehouses.
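Step 3, the categorization, is where most of the logic lives. A Python sketch of it (lists of dicts standing in for the staged and current tables; names are illustrative):

```python
def categorize(staged, current, attrs=("name", "city", "state")):
    """Split staged rows into new / unchanged / changed vs. current rows."""
    current_by_key = {r["customer_id"]: r for r in current}
    new, unchanged, changed = [], [], []
    for s in staged:
        c = current_by_key.get(s["customer_id"])
        if c is None:
            new.append(s)
        elif all(s[a] == c[a] for a in attrs):
            unchanged.append(s)
        else:
            changed.append(s)
    return new, unchanged, changed

staged = [{"customer_id": 1, "name": "Ana", "city": "NYC", "state": "NY"},
          {"customer_id": 2, "name": "Bo", "city": "Austin", "state": "TX"},
          {"customer_id": 3, "name": "Cy", "city": "Reno", "state": "NV"}]
current = [{"customer_id": 1, "name": "Ana", "city": "NYC", "state": "NY"},
           {"customer_id": 2, "name": "Bo", "city": "Dallas", "state": "TX"}]

new, unchanged, changed = categorize(staged, current)
```

Customer 3 is new, customer 1 is unchanged, and customer 2 changed city, so it gets a closed old version plus a new row.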

What is the difference between a surrogate key and a natural key in the context of SCDs?

A natural key identifies the real-world entity (customer_id = 12345). A surrogate key identifies a specific version of that entity in the dimension table (customer_sk = 98765). In Type 2, one natural key maps to many surrogate keys (one per version). Fact tables must reference the surrogate key to join to the correct historical version. If fact tables reference the natural key, every fact row joins to every version of the dimension, producing incorrect results. This distinction is the most common source of bugs in Type 2 implementations.
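The fan-out bug is easy to demonstrate in a few lines of Python (lists of dicts standing in for the fact and dimension tables; values are illustrative):

```python
# Two versions of customer 42 in a Type 2 dimension.
dim = [{"customer_sk": 1, "customer_id": 42, "city": "New York"},
       {"customer_sk": 2, "customer_id": 42, "city": "Chicago"}]
# One $100 order, placed while the customer lived in New York.
facts = [{"customer_sk": 1, "customer_id": 42, "revenue": 100.0}]

# Wrong: natural-key join matches every version of the customer.
wrong = [(f, d) for f in facts for d in dim
         if f["customer_id"] == d["customer_id"]]
# Right: surrogate-key join matches exactly one version.
right = [(f, d) for f in facts for d in dim
         if f["customer_sk"] == d["customer_sk"]]

wrong_revenue = sum(f["revenue"] for f, d in wrong)  # 200.0 -- doubled
right_revenue = sum(f["revenue"] for f, d in right)  # 100.0 -- correct
```

One address change has silently doubled reported revenue, which is why this bug tends to surface as an inflated metric rather than an error.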

How do SCDs interact with dbt snapshots?

dbt snapshots implement SCD Type 2 declaratively. You configure the source table, unique key, and change detection strategy (timestamp-based or check-based). dbt automatically manages surrogate keys, effective dates (dbt_valid_from, dbt_valid_to), and current flags. Mentioning dbt snapshots shows practical experience. But also note the limitations: dbt snapshots run at scheduled intervals, so they capture changes at snapshot time, not at the exact moment the source changed. For real-time SCD tracking, you need CDC-based approaches outside of dbt.

SQL: SCD Type 2 MERGE Pattern

This is the standard MERGE pattern for maintaining an SCD Type 2 dimension. Most warehouses (Snowflake, BigQuery, Databricks) support MERGE syntax. As written, the pattern uses two passes: one to close existing versions and insert new customers, and a second to insert new versions for changed rows.

-- SCD Type 2 MERGE pattern (Snowflake syntax; in BigQuery, write
-- DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY) instead of CURRENT_DATE - 1)
MERGE INTO dim_customer AS target
USING staging_customer AS source
ON target.customer_id = source.customer_id
   AND target.is_current = TRUE

-- Close the existing version when attributes have changed
WHEN MATCHED
  AND (target.city <> source.city OR target.state <> source.state)
THEN UPDATE SET
  effective_to = CURRENT_DATE - 1,
  is_current = FALSE

-- Insert new customers that do not exist yet
WHEN NOT MATCHED
THEN INSERT (customer_id, name, city, state,
             effective_from, effective_to, is_current)
VALUES (source.customer_id, source.name, source.city,
        source.state, CURRENT_DATE, '9999-12-31', TRUE);

-- Second pass: insert new versions for changed rows
INSERT INTO dim_customer
  (customer_id, name, city, state,
   effective_from, effective_to, is_current)
SELECT s.customer_id, s.name, s.city, s.state,
       CURRENT_DATE, '9999-12-31', TRUE
FROM staging_customer s
JOIN dim_customer d
  ON s.customer_id = d.customer_id
  AND d.effective_to = CURRENT_DATE - 1
  AND d.is_current = FALSE;

Note: Some warehouses support multi-action MERGE (matching the same row for both UPDATE and INSERT). In those cases, you can combine this into a single statement. The two-pass approach shown here works everywhere.

Common Mistakes Interviewers Catch

Joining fact tables to Type 2 dimensions on the natural key

If your fact table joins to dim_customer on customer_id, every fact row matches every version of that customer. Revenue gets multiplied by the number of versions the customer has accumulated. You must join on the surrogate key (customer_sk) to get the correct version. This is the single most common SCD implementation bug.

Forgetting to set effective dates on the initial load

The first time you populate a Type 2 dimension, every row needs effective_from set to the earliest relevant date (often the warehouse start date), effective_to set to '9999-12-31', and is_current set to true. If you skip this, you have NULL date ranges that break every downstream query using BETWEEN logic on effective dates.

Applying Type 2 to every attribute indiscriminately

Not every attribute change deserves a new version. If a customer updates their phone number, do you need a new row? Probably not. Track only the attributes that have analytical significance for historical reporting. Use Type 1 for the rest. Otherwise, your dimension table grows explosively and most of the versions add no analytical value.

Not indexing is_current or effective date columns

Without an index or partition on is_current, every current-state query scans the full table, including millions of historical rows. In a table with 500M total rows and 10M current rows, this turns a sub-second query into a 30-second scan. Partition by is_current or create a filtered index.

Frequently Asked Questions

What is a slowly changing dimension?
A slowly changing dimension (SCD) is a dimension table in a data warehouse where attribute values change over time, but infrequently. The classic example is a customer's address: it might change once every few years, compared to transactional data that changes with every order. The 'slowly' part distinguishes these from rapidly changing attributes (like stock prices or sensor readings) that require different handling patterns such as mini-dimensions or real-time streaming.
Which SCD type is most common in production?
SCD Type 2 is the most widely implemented in production data warehouses. It preserves full history by adding new rows with effective date ranges. Many production systems use a hybrid approach: Type 2 for analytically significant attributes (address, job title, account tier) and Type 1 for attributes where history is irrelevant (phone format, typo corrections). This selective application of SCD types is what experienced engineers do in practice.
Do data engineering interviews always ask about SCDs?
SCDs appear in roughly a third of data modeling interview segments. They are a staple at companies that use data warehouses, especially those with Kimball-style dimensional models. If the job description mentions data warehousing, dimensional modeling, or analytics engineering, expect SCD questions. Even at companies using modern lakehouse patterns, the concept of tracking historical changes in dimension tables remains relevant.
How do SCDs work with dbt?
dbt has a built-in snapshot feature that implements SCD Type 2 automatically. You define the source table, the unique key, and the change detection strategy (timestamp or check). dbt manages the effective date columns (dbt_valid_from, dbt_valid_to) and tracks whether each row is current. Mentioning dbt snapshots in an interview shows practical experience. But also understand the limitation: dbt snapshots capture state at scheduled run times, not at the exact moment of change.
What is the '9999-12-31' convention in SCD Type 2?
The far-future date '9999-12-31' is used as the effective_to value for the current version of a dimension row. This convention makes date range queries simpler: you can write WHERE event_date BETWEEN effective_from AND effective_to without special-casing NULL values for the current row. Some teams use NULL instead, but the far-future date is more common because it avoids the need for COALESCE logic in every join.

Practice SCD Questions with Real SQL

DataDriven covers data modeling, SQL, and Python with hands-on challenges at interview difficulty. Build fluency in the concepts interviewers actually test.
