Dimensional Modeling: Guide for Data Engineering Interviews
Dimensional modeling is the Kimball methodology: split everything into facts and dimensions, state the grain first, denormalize dimensions for query speed, and connect facts across business processes with conformed dimensions. This guide covers all the concepts that appear in data engineering modeling rounds.
What Is Dimensional Modeling
Dimensional modeling is Ralph Kimball's methodology for designing analytical databases. It splits the world into facts and dimensions. Facts are measurable events: sales, clicks, sensor readings. Dimensions are descriptive context: customers, products, dates, stores.
The output is a star schema: one central fact table joined to a ring of denormalized dimension tables. The design prioritizes query speed and analyst comprehension over normalization purity.
Almost every modern data warehouse gold layer uses dimensional modeling. Understanding grain, fact types, SCD patterns, and conformed dimensions is table stakes for any data engineering interview.
Every problem comes from a real interview report. Run code in your browser.
Core Concepts
Fact Table
The measurable events in a business process. A fact table stores numeric measures at a stated grain. Every row is one business event: a sale, a click, a call. Columns are foreign keys to dimensions plus additive measures. Facts should be additive across all dimensions wherever possible. Interview tip: Always ask: can this measure be summed across all dimension combinations? If no, it may belong in a dimension or need separate fact tables at different grains.
Dimension Table
The descriptive context around each event. Dimensions describe the who, what, where, and when of a business event. They are denormalized by design. A customer dimension might contain name, email, segment, city, state, and country. Denormalization eliminates runtime joins and makes the model understandable to business users. Interview tip: Interviewers probe whether you know why dimensions are denormalized. The answer: analytical engines are optimized for scans, not joins. A single-table scan beats a multi-table join chain at analytical scale.
Grain
The level of detail represented by one row in a fact table. Grain is the most important decision in dimensional modeling. It must be stated explicitly before any other design work begins. 'One row per order line item' is a grain. 'One row per order' is a different grain. Mixing grains in one fact table causes incorrect aggregations. Interview tip: State the grain in your first sentence when asked to design a fact table. 'I would model this at the order line level, one row per item per order.' Then derive everything else from that.
Slowly Changing Dimension
A dimension attribute that changes over time and requires history tracking. Type 1 overwrites the old value. Type 2 adds a new row with effective dates and an is_current flag. Type 3 adds a previous-value column. Type 2 is the most common because historical facts must join to the dimension state that existed at the time of the event. Interview tip: SCD Type 2 is almost always the correct answer unless the interviewer specifies that history does not matter. The merge pattern (match on natural key, compare attributes, insert new row when changed) is worth memorizing.
Conformed Dimension
A dimension shared across multiple fact tables, with identical structure and values. A conformed date dimension used by fact_orders, fact_returns, and fact_inventory allows analysts to join those three facts on the same date key and compare across business processes. Without conformance, the same concept is defined differently in each fact, making cross-fact analysis impossible. Interview tip: Interviewers test conformance by asking: 'Your e-commerce company now has a marketplace. How do you extend your model to support marketplace sellers without breaking existing queries?' The answer involves conformed customer and date dimensions.
Fact Table Types
Transaction, periodic snapshot, and accumulating snapshot. Transaction facts record one row per discrete event. Periodic snapshot facts record one row per time period per entity (daily inventory balance). Accumulating snapshot facts record one row per business process lifecycle, updated as the process advances (one row per order updated as it moves from placed to shipped to delivered). Interview tip: Accumulating snapshots are the most complex and least understood. They have multiple date foreign keys (order_date_fk, ship_date_fk, delivery_date_fk). Interviewers at senior levels often probe this type specifically.
Fact Table Types
Transaction
One row per discrete business event. Example: fact_orders: one row per order placed. When to use: When you need to analyze individual events: order history, click logs, payment transactions. Most common fact type.
Periodic Snapshot
One row per entity per time period, capturing state at regular intervals. Example: fact_inventory_daily: one row per product per day with quantity_on_hand. When to use: When you need to analyze state over time: inventory levels, account balances, active subscriptions. Useful when the underlying transactions are too granular.
Accumulating Snapshot
One row per business process lifecycle, updated as the process progresses. Example: fact_order_fulfillment: one row per order with columns for order_date, ship_date, deliver_date, each updated as the order advances. When to use: When you need to analyze pipeline velocity: how long does an order take to ship? How many orders are stuck in each stage? Complex to implement but powerful for operations analytics.
Dimensional Modeling Design Steps
- 01
State the grain
Declare the level of detail for one row in the fact table before anything else. This single decision constrains every subsequent design choice.
- 02
Identify the dimensions
List the entities that provide context for events at that grain: date, customer, product, store, promotion. Each becomes a dimension table.
- 03
Identify the facts
List the numeric measures that describe what happened: revenue, quantity, duration, cost. Prefer additive measures (can be summed across all dimensions) over semi-additive or non-additive.
- 04
Design dimension attributes
For each dimension, list the descriptive attributes analysts need. Denormalize. Include both current values and any attributes needed for SCD Type 2 history tracking.
Dimensional Modeling Interview Questions
What is the first step in dimensional modeling?
State the grain. Every design decision follows from the grain. Trying to identify facts or dimensions before declaring the grain leads to mixed-granularity fact tables that produce incorrect aggregations.
What is the difference between additive, semi-additive, and non-additive facts?
Additive facts can be summed across all dimensions: revenue, quantity, duration. Semi-additive facts can be summed across some dimensions but not others: account balance can be summed across customers but not across time (summing daily balances gives a meaningless number). Non-additive facts cannot be summed at all: ratios, percentages, and averages.
What are conformed dimensions and why do they matter?
Conformed dimensions have identical structure and values across multiple fact tables. A conformed date dimension lets you join fact_orders and fact_returns on the same date key. Without conformance, each fact defines 'date' differently and cross-fact analysis requires custom logic or is impossible.
Kimball vs Inmon
| Dimension | Kimball (Dimensional) | Inmon (3NF) |
|---|---|---|
| Structure | Star schema, denormalized dims | Normalized, 3NF tables |
| Query performance | Fast (few joins) | Slower (many joins) |
| Update performance | Slower (update many rows) | Faster (update one row) |
| Business user access | Direct, understandable | Requires transformation layer |
| Primary use | Analytics, reporting | Operational data store |
Dimensional Modeling FAQ
What is dimensional modeling?+
What is the Kimball methodology?+
How does dimensional modeling differ from normalization?+
Practice Dimensional Modeling
Draw schemas on the canvas. The validator checks grain, fact type, and dimension design.