Fact table interview questions isolated from the data modeling catalog. Grain selection (one row per X), additive vs semi-additive vs non-additive measures, transaction fact versus periodic snapshot versus accumulating snapshot, degenerate dimensions, and the surrogate-key-versus-natural-key decision on facts.

Fact tables are where the numeric measures of a data engineer's warehouse live. Each fact table has a declared grain (one row per X), FKs to dimensions that apply at that grain, and one or more numeric measures. Three fact table types appear in 2026 data engineer interviews: transaction fact, periodic snapshot fact, accumulating snapshot fact. Each fits a different analytical workload.

Transaction fact: one row per event at the moment it happens. One row per order line item, one row per impression, one row per click, one row per trip. The most common fact type, used for activity tracking and event aggregation. Measures are usually additive (revenue, quantity, duration). FKs point to dim_customer, dim_product, dim_date_time, dim_store. The grain is atomic and immutable: once a row is written, it does not change (corrections become new rows in a slowly-changing fact pattern).

Periodic snapshot fact: one row per entity per period, capturing state at the end of the period. One row per account per day with the daily balance, one row per inventory item per week with the weekly count. Measures are usually semi-additive: sum across customer or product (additive across entity), do not sum across time (you take the latest or average; summing a daily balance across 30 days does not produce a monthly balance). The benefit is fast period-over-period analysis without aggregating the transaction fact. The cost is data redundancy (the same daily-balance row repeats unchanged from yesterday if nothing happens).

Accumulating snapshot fact: one row per process instance, updated as the process moves through stages. One row per order with timestamps for order_placed, paid, shipped, delivered. The row starts with order_placed_ts populated and the other timestamps NULL; it updates as each milestone happens. Measures are durations (paid_ts minus placed_ts, shipped_ts minus paid_ts) and the row is mutable until the process ends. Useful for funnel analysis, lifecycle tracking, and SLA monitoring. Less common than transaction fact but distinct enough to be tested in data engineer modeling rounds at L5+.

Fact measure additivity is a recurring question. Additive measures sum across all dimensions (revenue, quantity, cost). Semi-additive measures sum across some dimensions but not others (account balance sums across customers but the daily balance does not sum across dates: you take the latest balance, not the sum of 30 daily balances). Non-additive measures (ratios, percentages, distinct counts) must be computed at the desired aggregation level (do not pre-compute a customer-day-level conversion rate and expect it to roll up to month: aggregate the raw counts, then compute the ratio). Identifying additivity per measure is part of the data engineer modeling rubric.

Degenerate dimensions on the fact: a natural key stored on the fact without a separate dim table because there are no attributes beyond the key itself. Order_number, invoice_number, transaction_id. Storing them on the fact preserves traceability back to the source system without the overhead of a single-column dim table. Senior data engineer modeling rounds explicitly ask for degenerate dimensions to be identified; junior rounds often skip this nuance.

Fact Table Interview Questions

Fact table design problems for data engineer interview prep.

57 practice problems matching this filter. Difficulty: medium (32), easy (8), hard (17).

Data Modeling (57)

Common questions

What is the grain of a fact table?
The grain is the unit of analysis: one row per X. For an orders fact, the grain is usually one row per order line item (not one row per order, which loses line-item detail). For an impressions fact, one row per impression. State the grain in one sentence before drawing the fact table. Mixed-grain fact tables are the failure mode interviewers explicitly test.
What is the difference between a transaction fact and a periodic snapshot fact?
Transaction fact: one row per event at the moment it happens (one row per order line item). Immutable once written. Measures usually additive. Periodic snapshot fact: one row per entity per period capturing state at end of period (one row per account per day with daily balance). Mutable in the sense that yesterday's row exists alongside today's. Measures usually semi-additive.
What is an accumulating snapshot fact?
One row per process instance, updated as the process moves through stages. One row per order with timestamps for order_placed, paid, shipped, delivered. Starts with order_placed_ts populated and the other timestamps NULL; updates as each milestone happens. Useful for funnel analysis, lifecycle tracking, SLA monitoring.
What is an additive measure?
A measure that can be summed across all dimensions. Revenue is additive (sums across customer, product, date, region). Quantity is additive. Cost is additive. Most transaction-fact measures are additive at the chosen grain.
What is a semi-additive measure?
A measure that sums across some dimensions but not others. Account balance is semi-additive: sums across customers (total balance across all customers), does not sum across dates (you take the latest balance, not the sum of 30 daily balances). Inventory level is semi-additive similarly. Periodic snapshot facts usually contain semi-additive measures.
What is a degenerate dimension?
A natural key stored on the fact without a separate dim table because there are no attributes beyond the key itself. Order_number, invoice_number, transaction_id. Storing them on the fact preserves traceability to the source system without the overhead of a single-column dim table. Senior data engineer modeling rounds explicitly ask for degenerate dimensions to be identified.
How does a fact table FK to an SCD Type 2 dimension?
The fact FKs to the surrogate key of the dim version that was current at the time of the fact event. When customer 42 has surrogate_key 1001 from 2025-01-01 to 2025-06-30 and surrogate_key 1002 from 2025-06-30 onward, an order on 2025-05-15 links to surrogate_key 1001. This is what enables point-in-time correctness: querying the order's customer attributes joins to the version current at the time of the order, not the current version.
Should a fact table have a primary key?
Usually yes, even though most fact tables do not enforce one. The primary key is typically a composite of all FKs plus a degenerate dimension (order_number) plus a sequence number for line items. Some warehouses skip the explicit PK for ingest performance and rely on dedup logic in the pipeline. Modern table formats (Iceberg, Delta) often track row identity via internal sequence numbers regardless of declared PK.