Dimensional Modeling: Guide for Data Engineering Interviews

Dimensional modeling is the Kimball methodology: split everything into facts and dimensions, state the grain first, denormalize dimensions for query speed, and connect facts across business processes with conformed dimensions. This guide covers all the concepts that appear in data engineering modeling rounds.

What Is Dimensional Modeling

Dimensional modeling is Ralph Kimball's methodology for designing analytical databases. It splits the world into facts and dimensions. Facts are measurable events: sales, clicks, sensor readings. Dimensions are descriptive context: customers, products, dates, stores.

The output is a star schema: one central fact table joined to a ring of denormalized dimension tables. The design prioritizes query speed and analyst comprehension over normalization purity.

Almost every modern data warehouse gold layer uses dimensional modeling. Understanding grain, fact types, SCD patterns, and conformed dimensions is table stakes for any data engineering interview.

Free forever
Practice Dimensional Modeling problems from real interviews.

Every problem comes from a real interview report. Run code in your browser.

Core Concepts

Fact Table

The measurable events in a business process. A fact table stores numeric measures at a stated grain. Every row is one business event: a sale, a click, a call. Columns are foreign keys to dimensions plus additive measures. Facts should be additive across all dimensions wherever possible. Interview tip: Always ask: can this measure be summed across all dimension combinations? If no, it may belong in a dimension or need separate fact tables at different grains.

Dimension Table

The descriptive context around each event. Dimensions describe the who, what, where, and when of a business event. They are denormalized by design. A customer dimension might contain name, email, segment, city, state, and country. Denormalization eliminates runtime joins and makes the model understandable to business users. Interview tip: Interviewers probe whether you know why dimensions are denormalized. The answer: analytical engines are optimized for scans, not joins. A single-table scan beats a multi-table join chain at analytical scale.

Grain

The level of detail represented by one row in a fact table. Grain is the most important decision in dimensional modeling. It must be stated explicitly before any other design work begins. 'One row per order line item' is a grain. 'One row per order' is a different grain. Mixing grains in one fact table causes incorrect aggregations. Interview tip: State the grain in your first sentence when asked to design a fact table. 'I would model this at the order line level, one row per item per order.' Then derive everything else from that.

Slowly Changing Dimension

A dimension attribute that changes over time and requires history tracking. Type 1 overwrites the old value. Type 2 adds a new row with effective dates and an is_current flag. Type 3 adds a previous-value column. Type 2 is the most common because historical facts must join to the dimension state that existed at the time of the event. Interview tip: SCD Type 2 is almost always the correct answer unless the interviewer specifies that history does not matter. The merge pattern (match on natural key, compare attributes, insert new row when changed) is worth memorizing.

Conformed Dimension

A dimension shared across multiple fact tables, with identical structure and values. A conformed date dimension used by fact_orders, fact_returns, and fact_inventory allows analysts to join those three facts on the same date key and compare across business processes. Without conformance, the same concept is defined differently in each fact, making cross-fact analysis impossible. Interview tip: Interviewers test conformance by asking: 'Your e-commerce company now has a marketplace. How do you extend your model to support marketplace sellers without breaking existing queries?' The answer involves conformed customer and date dimensions.

Fact Table Types

Transaction, periodic snapshot, and accumulating snapshot. Transaction facts record one row per discrete event. Periodic snapshot facts record one row per time period per entity (daily inventory balance). Accumulating snapshot facts record one row per business process lifecycle, updated as the process advances (one row per order updated as it moves from placed to shipped to delivered). Interview tip: Accumulating snapshots are the most complex and least understood. They have multiple date foreign keys (order_date_fk, ship_date_fk, delivery_date_fk). Interviewers at senior levels often probe this type specifically.

Fact Table Types

Transaction

One row per discrete business event. Example: fact_orders: one row per order placed. When to use: When you need to analyze individual events: order history, click logs, payment transactions. Most common fact type.

Periodic Snapshot

One row per entity per time period, capturing state at regular intervals. Example: fact_inventory_daily: one row per product per day with quantity_on_hand. When to use: When you need to analyze state over time: inventory levels, account balances, active subscriptions. Useful when the underlying transactions are too granular.

Accumulating Snapshot

One row per business process lifecycle, updated as the process progresses. Example: fact_order_fulfillment: one row per order with columns for order_date, ship_date, deliver_date, each updated as the order advances. When to use: When you need to analyze pipeline velocity: how long does an order take to ship? How many orders are stuck in each stage? Complex to implement but powerful for operations analytics.

Dimensional Modeling Design Steps

  1. 01

    State the grain

    Declare the level of detail for one row in the fact table before anything else. This single decision constrains every subsequent design choice.

  2. 02

    Identify the dimensions

    List the entities that provide context for events at that grain: date, customer, product, store, promotion. Each becomes a dimension table.

  3. 03

    Identify the facts

    List the numeric measures that describe what happened: revenue, quantity, duration, cost. Prefer additive measures (can be summed across all dimensions) over semi-additive or non-additive.

  4. 04

    Design dimension attributes

    For each dimension, list the descriptive attributes analysts need. Denormalize. Include both current values and any attributes needed for SCD Type 2 history tracking.

Dimensional Modeling Interview Questions

What is the first step in dimensional modeling?

State the grain. Every design decision follows from the grain. Trying to identify facts or dimensions before declaring the grain leads to mixed-granularity fact tables that produce incorrect aggregations.

What is the difference between additive, semi-additive, and non-additive facts?

Additive facts can be summed across all dimensions: revenue, quantity, duration. Semi-additive facts can be summed across some dimensions but not others: account balance can be summed across customers but not across time (summing daily balances gives a meaningless number). Non-additive facts cannot be summed at all: ratios, percentages, and averages.

What are conformed dimensions and why do they matter?

Conformed dimensions have identical structure and values across multiple fact tables. A conformed date dimension lets you join fact_orders and fact_returns on the same date key. Without conformance, each fact defines 'date' differently and cross-fact analysis requires custom logic or is impossible.

Kimball vs Inmon

DimensionKimball (Dimensional)Inmon (3NF)
StructureStar schema, denormalized dimsNormalized, 3NF tables
Query performanceFast (few joins)Slower (many joins)
Update performanceSlower (update many rows)Faster (update one row)
Business user accessDirect, understandableRequires transformation layer
Primary useAnalytics, reportingOperational data store

Dimensional Modeling FAQ

What is dimensional modeling?+
Dimensional modeling is Ralph Kimball's methodology for designing analytical data warehouses. It organizes data into facts (measurable events) and dimensions (descriptive context), optimizing for query speed and human comprehension over normalization purity. Almost every modern data warehouse gold layer uses dimensional modeling.
What is the Kimball methodology?+
The Kimball methodology defines a bottom-up approach to data warehousing: build data marts for individual business processes first, then integrate them using conformed dimensions. Each data mart uses a star schema with a central fact table and surrounding dimension tables. The bus architecture connects marts through shared conformed dimensions.
How does dimensional modeling differ from normalization?+
Normalization eliminates redundancy to optimize for write performance and data integrity. Dimensional modeling deliberately denormalizes to optimize for read performance. A normalized customer record might span three tables. A dimensional customer record puts all attributes in one flat table, accepting redundancy in exchange for faster analytical queries.

Practice Dimensional Modeling

Draw schemas on the canvas. The validator checks grain, fact type, and dimension design.

Related Data Modeling Guides