Data Modeling

Dimensional Modeling for Data Engineering Interviews

Kimball dimensional modeling is the foundation of data warehouse design and the most frequently tested modeling methodology in DE interviews. Interviewers want you to define the grain, separate facts from dimensions, classify measures, and defend your trade-offs.

Data modeling questions appear in roughly a third of DE interview loops. Dimensional modeling is the single most common sub-topic. If the job involves a data warehouse, expect these questions.

Core Concepts

Every dimensional modeling interview comes back to these building blocks. Know them cold, with examples and trade-offs ready.

Fact Tables

Fact tables store measurable events: transactions, clicks, shipments, payments. Each row represents one event at a specific grain. Columns are either foreign keys pointing to dimension tables or numeric measures (revenue, quantity, duration). Fact tables are tall and narrow. They grow continuously as new events occur.

Measures fall into three categories. Additive measures (revenue, quantity) can be summed across any dimension. Semi-additive measures (account balance) can be summed across some dimensions but not time. Non-additive measures (unit price, ratio) cannot be meaningfully summed and must be averaged or weighted. Knowing these categories matters in interviews because the interviewer may ask you to classify a measure and explain how to aggregate it correctly.

Interview tip

When asked to design a fact table, always state the grain first. 'One row per order line item' or 'one row per daily user session.' If you skip the grain, the interviewer will push you on it. The grain is the single most important decision in dimensional modeling.

Dimension Tables

Dimension tables store descriptive context: who, what, where, when. A customer dimension has name, city, signup_date, acquisition_channel. A product dimension has name, category, brand, price_tier. Dimension tables are short and wide. They change slowly, which is why SCD types exist.

Good dimension tables are richly attributed. A thin dimension with only an ID and a name forces analysts to do filtering and grouping in the fact query, which is slower and harder to read. A rich dimension with 20 to 30 descriptive columns lets analysts slice data by any attribute without modifying the fact query. In interviews, err on the side of putting more attributes into the dimension rather than fewer.

Interview tip

Interviewers often ask you to classify an attribute as fact or dimension. The rule: if it is a measurement that you would aggregate (sum, average, count), it belongs in the fact table. If it is a descriptor that you would filter or group by, it belongs in a dimension. Price is tricky: unit_price at the time of sale is a fact; standard_list_price is a dimension attribute.

Grain

Grain is the level of detail in a fact table. 'One row per order' is a different grain than 'one row per order line item.' Choosing the wrong grain means you either lose detail you need later or create bloated tables with redundant data. The grain dictates which dimensions can meaningfully attach to the fact table and which measures make sense.

A common mistake is mixing grains in a single fact table. If some rows represent individual transactions and other rows represent daily summaries, the table is unusable: you cannot sum the measures without double-counting. Every row in a fact table must represent exactly the same thing at exactly the same level of detail.

Interview tip

This is the single most important concept in dimensional modeling interviews. State it early and clearly: 'One row represents X.' If you can articulate the grain, the rest of the design falls into place. If you cannot, the interviewer knows you are guessing. Practice stating the grain for different business scenarios: retail, SaaS, logistics, advertising.

Bus Matrix

A bus matrix maps business processes (rows) to dimensions (columns). An X in a cell means that dimension applies to that business process. The bus matrix is the architectural plan for the entire warehouse. It forces alignment so that when both the sales fact and the returns fact reference the customer dimension, they use the same definition of 'customer.'

Without a bus matrix, each team designs its own dimensions independently. Marketing defines 'active customer' as someone who logged in this month. Sales defines it as someone who purchased this quarter. The bus matrix forces alignment: there is one customer dimension, with one definition, shared across all fact tables. Inconsistent definitions across teams is one of the most expensive problems in data warehousing.

Interview tip

Mentioning the bus matrix in an interview shows you understand enterprise-scale dimensional modeling, not just single-table design. Even if you are not asked about it directly, referencing it when discussing conformed dimensions demonstrates strategic thinking about the warehouse as a whole.

Conformed Dimensions

A conformed dimension is shared identically across multiple fact tables. The date dimension is used by every fact table. The customer dimension is used by sales, support, and marketing facts. Without conformed dimensions, you get inconsistent metrics: 'active customer' means one thing in one report and another thing in a different report.

Conformed dimensions are the mechanism that makes a data warehouse more valuable than a collection of independent data marts. They enable cross-process analysis: 'show me the customers who bought product X and then called support within 30 days.' This query only works if the sales fact and the support fact share the same customer dimension with the same surrogate keys.

Interview tip

If the interviewer asks about data consistency across teams or departments, conformed dimensions are the answer. This shows you think about the warehouse as a platform, not just a collection of tables. It is a Kimball concept that separates experienced engineers from those who have only built single-team solutions.

Degenerate Dimensions

A degenerate dimension is a dimension key that lives directly in the fact table without its own dimension table. The classic example is order_number: it is a descriptor (not a measure), but creating a separate table for it adds no value because the only attribute would be the order number itself.

Other examples include invoice_number, transaction_id, and confirmation_code. These values are used for grouping (all line items in the same order) or drill-down (look up a specific transaction), but they carry no additional attributes that would justify a separate dimension table.

Interview tip

Interviewers use degenerate dimensions to test whether you blindly follow rules or think critically. If you can explain why some dimension keys do not need their own table, you demonstrate practical judgment. If you try to create a dimension table for order_number, the interviewer knows you are applying the pattern mechanically.

Three Types of Fact Tables

Knowing all three types distinguishes you from candidates who only think of fact tables as transaction logs. Each type serves a different analytical purpose.

Transaction Fact

One row per discrete event. Each row captures a single occurrence: a purchase, a click, a shipment. This is the most common fact table type. It grows continuously and can become very large. There is no upper bound on the number of rows because new events keep arriving.

Example

An order_line_items fact with one row per product per order. Measures include quantity, unit_price, discount_amount, and line_total.

When to use

When you need to capture individual events with full detail. This is the default fact table type for most business processes.

Periodic Snapshot Fact

One row per entity per time period. Captures the state of something at regular intervals. The table has a predetermined number of rows based on the number of entities multiplied by the number of time periods. Rows are added predictably on schedule.

Example

A monthly_account_balance fact with one row per account per month. Measures include end_of_month_balance, deposits_this_month, withdrawals_this_month.

When to use

When you need to track cumulative or point-in-time metrics that do not have discrete events. Balances, inventory levels, and enrollment counts are classic periodic snapshot use cases.

Accumulating Snapshot Fact

One row per process instance, updated as milestones are reached. Unlike transaction facts, these rows are updated after initial insert. The row tracks the progress of a multi-step process from start to completion. Useful for workflows with defined stages.

Example

An order_fulfillment fact with one row per order. Columns include order_date, payment_date, ship_date, delivery_date. Each date is NULL until that milestone is reached, then filled in. Measures include days_to_ship and days_to_deliver.

When to use

When you need to analyze throughput, bottlenecks, and stage durations in a process. Manufacturing pipelines, loan origination, and order fulfillment are common use cases.

How to Design a Dimensional Model in an Interview

Follow this four-step process in every design interview. It gives you a structured approach that interviewers recognize as Kimball-aligned, and it prevents the most common mistake: jumping straight to column names without establishing the grain.

Step 1: Choose the Business Process

Start with a business process, not a department or a report. A business process generates events: orders, shipments, support tickets, ad impressions. Each process becomes its own fact table. Do not try to model the entire business in one fact table.

Interview advice

In an interview, the business process is usually given to you ('design a model for our e-commerce platform'). Clarify which specific process: are we modeling orders, returns, sessions, or all three? Each one is a separate fact table with its own grain.

Step 2: Declare the Grain

State what one row in the fact table represents. 'One row per order line item.' 'One row per ad impression.' 'One row per daily active user.' The grain determines everything else: which dimensions apply, which measures are possible, and how large the table will be.

Interview advice

Say the grain out loud before doing anything else. 'Before I start listing columns, let me define the grain: one row represents a single line item in a customer order.' This is the highest-signal move in a dimensional modeling interview. If you get the grain wrong, everything downstream is wrong too.

Step 3: Identify the Dimensions

List every descriptive context that applies at the declared grain. For an order line item, the dimensions are: date (when), customer (who), product (what), store or channel (where), promotion (why the discount). Each dimension becomes a foreign key in the fact table.

Interview advice

Use the journalistic questions (who, what, where, when, why, how) as a checklist. Walk through each one and identify the relevant dimension. If you miss a dimension, the interviewer may prompt you. Having a systematic approach shows maturity.

Step 4: Identify the Facts (Measures)

List the numeric, measurable values that occur at the declared grain. For an order line item: quantity, unit_price, discount_amount, line_total, tax_amount. Classify each as additive, semi-additive, or non-additive. This classification determines how the measure can be aggregated in queries.

Interview advice

Avoid putting derived measures in the fact table. Store quantity and unit_price; let the query compute line_total as quantity * unit_price. Exception: when the derivation involves business logic that should be applied once (like tax calculation), store the result. The interviewer may challenge you on whether a measure should be stored or computed.

Interview Questions with Guidance

These questions appear repeatedly in data modeling interview rounds. For each one, the guidance shows how a strong candidate structures their answer.

Design a dimensional model for an e-commerce platform. What is the grain of your fact table?

Start with the grain: 'one row per order line item.' Then list dimensions: date, customer, product, store/channel, promotion. The fact table measures are quantity, unit_price, discount_amount, line_total, shipping_cost_allocated. Stating the grain first is the most important thing you can do. Then walk through the dimensions systematically and justify each measure's classification (additive vs non-additive).

What is the difference between a fact table and a dimension table?

Fact tables store measurements (numeric values you aggregate). Dimension tables store descriptive context (text, categories, hierarchies you filter and group by). Facts answer 'how much' or 'how many.' Dimensions answer 'who, what, where, when.' Facts are tall and narrow, growing with every event. Dimensions are short and wide, changing slowly. In a star schema, the fact table sits at the center with foreign keys pointing outward to dimensions.

When would you use a snowflake schema instead of a star schema?

Snowflaking normalizes dimension tables to eliminate redundancy. A product dimension might split into product and category tables. The trade-off: storage savings vs query complexity (more joins). In modern cloud warehouses where storage is cheap and columnar compression handles redundancy efficiently, star schemas are almost always preferred because they are simpler to query and BI tools are optimized for them. Mention snowflaking as an option, but explain why you would default to star.

Explain the bus matrix and why it matters.

The bus matrix is a grid with business processes (fact tables) as rows and dimensions as columns. An X indicates a shared dimension. Its purpose is dimensional consistency across the enterprise. Without it, each team builds their own version of 'customer' or 'product' with different definitions. The bus matrix forces a single, conformed definition. Sketch one on the whiteboard: orders fact uses date, customer, product. Returns fact uses date, customer, product. Customer and product are conformed because they appear in both.

A product can belong to multiple categories. How do you model this in a star schema?

Three options: (1) Bridge table between the fact and a category dimension, enabling many-to-many relationships. This is the most flexible but adds a join. (2) Pick the primary category and store it in the product dimension. Simpler, but loses secondary categories. (3) Multi-valued dimension with a group key (one row per unique combination of categories). The right choice depends on the business requirement: does the end user need to filter by any category, or just the primary one? Ask the interviewer to clarify before committing to an approach.

What are the three types of fact tables? When would you use each?

Transaction (one row per event, grows continuously, no upper bound). Periodic snapshot (one row per entity per period, captures state at regular intervals, predictable growth). Accumulating snapshot (one row per process instance, updated as milestones are reached, tracks workflow progress). Most candidates only know transaction facts. Naming all three with concrete examples (orders, monthly balance, order fulfillment) demonstrates depth.

You are designing a warehouse for a SaaS company. What fact tables would you create?

Start with the business processes: user signups (transaction fact), monthly subscription snapshots (periodic snapshot), feature usage events (transaction fact), support tickets (transaction fact or accumulating snapshot if you track resolution stages). Then identify conformed dimensions that cross-cut: user, account/company, plan, date. The bus matrix for this scenario shows how the user dimension is shared across all fact tables, ensuring consistent user metrics across product, finance, and support analytics.

Common Mistakes in Interviews

These are the errors interviewers see most often. Avoiding them puts you ahead of the majority of candidates.

Designing for the report instead of the business process

A common interview anti-pattern is to look at a specific dashboard mockup and design a table that matches it exactly. This creates a table that serves one report and is useless for every other question. Dimensional modeling starts with the business process (orders, shipments, clicks), not the output format. The correct design supports any report about that process.

Mixing grains in a single fact table

If some rows represent individual transactions and others represent daily summaries, the table is broken. You cannot sum the measures without double-counting. Every row must represent the same thing at the same level of detail. If you need both transaction-level and daily-level data, create two separate fact tables.

Putting descriptive attributes in the fact table

Storing customer_name, product_category, or region directly in the fact table instead of joining to a dimension table. This creates data redundancy, update anomalies, and inconsistent descriptions. The only text values in a fact table should be degenerate dimensions (order_number) or status flags.

Forgetting to classify measures as additive, semi-additive, or non-additive

If you store unit_price in a fact table and a downstream report sums it across 1,000 line items, the result is meaningless. Knowing that unit_price is non-additive (it should be averaged or weighted, not summed) prevents this. Interviewers test this classification because it reveals whether you understand how measures will be consumed.

Dimensional Modeling vs Alternatives

Interviewers may ask how Kimball compares to other methodologies. The key is knowing which layer of the warehouse each approach targets.

vs 3NF (Third Normal Form)

3NF minimizes redundancy through normalization. It is optimized for transactional workloads (OLTP) where updates must be fast and consistent. Dimensional models are deliberately denormalized, optimized for analytical workloads (OLAP) where reads are far more frequent than writes. In an interview, explain that 3NF is for the source system; dimensional modeling is for the warehouse.

vs Data Vault

Data Vault (hubs, links, satellites) is designed for the integration layer of a warehouse. It excels when you have many source systems with conflicting definitions and need full auditability. Dimensional modeling is designed for the presentation layer. Many enterprises use Data Vault for integration and then build dimensional models (star schemas) on top for analytics consumption.

vs One Big Table (OBT)

OBT pre-joins everything into a single wide table. It trades storage and update complexity for query simplicity. OBT works for specific, well-defined use cases (a single dashboard, a feature store). It does not scale to a warehouse with dozens of business processes and hundreds of dimensions. Dimensional modeling provides structure that OBT does not.

Frequently Asked Questions

What is dimensional modeling?+
Dimensional modeling is a data warehouse design technique developed by Ralph Kimball. It organizes data into fact tables (measurable events) and dimension tables (descriptive context), connected in a star schema. The goal is to make data intuitive for business users, performant for analytical queries, and extensible as new business questions arise. It remains the most widely used methodology for analytics-focused data warehouses.
Is Kimball still relevant in 2026?+
Yes. While newer approaches like Data Vault and the medallion architecture exist, Kimball dimensional modeling remains the dominant methodology for the analytics layer. Most BI tools (Looker, Tableau, Power BI) are optimized for star schemas. Cloud warehouses (Snowflake, BigQuery, Redshift) perform best with denormalized, wide tables. In interviews, Kimball concepts are tested more frequently than any alternative approach.
What is the difference between Kimball and Inmon?+
Kimball builds the warehouse as a collection of star schemas organized by business process, delivering value incrementally (bottom-up). Inmon builds a normalized enterprise data warehouse first, then creates dimensional data marts from it (top-down). Kimball delivers faster time-to-value. Inmon provides more architectural consistency. Most modern teams use Kimball or a hybrid approach where the staging/integration layer is normalized but the presentation layer is dimensional.
Do I need to memorize the bus matrix for interviews?+
You do not need to memorize a specific bus matrix. But you should be able to explain what it is, why it exists (dimensional consistency), and sketch one on a whiteboard if asked. The ability to map business processes to shared dimensions is a practical skill that interviewers value. Practice by listing the fact tables and dimensions for a business you know well.
What is a star schema vs a snowflake schema?+
A star schema has dimension tables directly connected to the fact table, with no normalization of the dimensions. A snowflake schema normalizes dimension tables into sub-tables (e.g., product links to category links to department). Star schemas are simpler to query and faster in most warehouses. Snowflake schemas save storage by eliminating dimension redundancy. In practice and in interviews, star schemas are the default recommendation.

Build Data Modeling Fluency

DataDriven covers dimensional modeling, SQL, and system design with hands-on challenges at interview difficulty. Practice the concepts interviewers test, not textbook exercises.

Start Practicing