Dimensional modeling interview questions for data engineer roles. Kimball-style design across e-commerce, marketplace, payments, ad tech, and content platform domains. Grain selection, conformed dimensions across multiple facts, SCD Type 1 vs Type 2 vs Type 3, additive measures, bridge tables for many-to-many.
Dimensional modeling for data engineer interviews follows the Kimball methodology: identify the business process, declare the grain of the fact table, identify the dimensions that apply at that grain, identify the facts (measures). The Kimball four-step process is the framework most data engineer modeling rounds implicitly expect, even when the interviewer does not name it.
Step one: identify the business process. Sales, payments, ad impressions, content views, ride trips, support interactions. The business process is the source of the events that flow into the fact table. Conformed dimensions are designed to span multiple business processes (a single dim_customer used across the sales fact, the returns fact, and the support fact). Step two: declare the grain of the fact table. One row per X, where X is the atomic unit of the business process. For sales, that is usually one row per order line item, not one row per order. For trips, one row per trip. For ad impressions, one row per impression. Mixed grains in one fact table is the failure mode. Step three: identify the dimensions that apply at that grain. Each dimension is a context: who (customer, user), what (product, content), where (store, location, geography), when (date, time), how (channel, payment method). Some dimensions are degenerate (an order number with no attributes beyond the natural key), some are role-playing (the same dim_date used as order_date, ship_date, return_date with three FKs in the fact). Step four: identify the facts. Numeric measures the business cares about: revenue, quantity, duration, cost. Mark each as additive, semi-additive, or non-additive.
Dimensional modeling interview questions for data engineer roles test six recurring patterns. Conformed dimensions: same dim_customer in multiple facts, with one schema and one set of surrogate keys. Role-playing dimensions: dim_date referenced as order_date_key, ship_date_key, return_date_key in the same fact. Junk dimensions: low-cardinality flags (is_promotional, is_first_purchase, channel) consolidated into one dim_junk to avoid bloating the fact with multiple FKs. Degenerate dimensions: natural keys (order_number, invoice_number) stored on the fact without a separate dim table. Bridge tables: many-to-many product-category, patient-diagnosis, impression-conversion. SCD Type 2 mechanics: surrogate_key, natural_key, effective_from, effective_to, is_current; expire-and-insert pattern on change; fact joins on surrogate for point-in-time correctness.
Senior data engineer dimensional modeling rounds add platform concerns. Conformed dimensions across multiple data marts (a single dim_customer used by the sales mart, the marketing mart, and the support mart). Slowly changing fact tables for corrections (append-only with version column versus in-place with audit log). Late-arriving dimensions with placeholder rows. Hierarchies in dimensions (employee-to-manager, category-to-parent-category) modeled as parent-child columns versus a hierarchy bridge table. Multi-valued dimensions modeled as bridges with weighting factors for fractional attribution.
Dimensional Modeling Interview Questions
Kimball dimensional modeling problems for data engineer interview prep.
57 practice problems matching this filter. Difficulty: medium (32), easy (8), hard (17).
Data Modeling (57)
- A/B Experiment Assignment Schema - medium - One user, one experiment, one variant. No exceptions.
- Where They Used to Live - medium - They moved. The data stayed behind.
- Airline Flight Operations Schema - medium - Flights, passengers, and routes. Before you draw a single table, tell me the grain.
- A Number for the Seller - easy - They want a total. Give them the right schema first.
- B2B Invoicing Data Model - easy - Invoices go out, partial payments trickle in, and some customers are three months overdue.
- Clickstream and Session Schema - medium - Millions of clicks, mostly anonymous.
- Cloud File Storage Metadata Schema - hard - A file is also a folder. A folder is also a file.
- Content Engagement Data Model - hard - Post published. Now measure everything that happens next.
- Content Search and Discovery Schema - hard - Searchable from every angle. Design it so nothing gets lost.
- Customer Address History - easy - People move. Sometimes twice in a month. How do you remember where everyone was, and when?
Common questions
- What is the Kimball four-step dimensional modeling process?
- Step 1: identify the business process (sales, payments, ad impressions). Step 2: declare the grain of the fact table (one row per X). Step 3: identify the dimensions that apply at that grain (who, what, where, when, how). Step 4: identify the facts (numeric measures: revenue, quantity, duration). Most data engineer modeling rounds implicitly expect this framework even when not named.
- What is a role-playing dimension?
- A single dimension table referenced multiple times in the same fact with different roles. dim_date with FKs order_date_key, ship_date_key, return_date_key all pointing to the same dim_date table. Saves storage versus separate date dims. The query layer typically creates views (dim_order_date, dim_ship_date, dim_return_date) that alias the same underlying table.
- What is a junk dimension?
- A consolidated dimension for low-cardinality flags. Instead of separate FKs in the fact for is_promotional, is_first_purchase, channel, payment_type, build one dim_junk with all combinations of these flags as rows. One FK in the fact. Avoids bloating the fact with many low-cardinality FKs. Especially useful when the flags have few combinations relative to fact volume.
- What is a degenerate dimension?
- A natural key stored on the fact without a separate dim table. Order_number, invoice_number, transaction_id. There are no attributes beyond the key itself, so a dim table would be redundant. The key remains on the fact for traceability and ad-hoc joins back to the source system.
- How do conformed dimensions differ from regular dimensions?
- Conformed dimensions are designed to span multiple business processes. One dim_customer used by the sales fact, the returns fact, and the support fact, with identical schema and surrogate keys. Regular (non-conformed) dimensions live in only one fact. Senior data engineer rubrics weight conformed dimensions because they enable cross-fact analysis (a customer's lifetime value combining sales and returns) without explicit identifier translation.
- What is an additive fact and why does it matter?
- A fact measure that can be summed across all dimensions. Revenue is additive (sums across customer, product, date). Quantity is additive. Cost is additive. Semi-additive measures (account balance) sum across some dimensions but not time. Non-additive (ratios, percentages) must be computed at the desired aggregation level. Identifying additivity is part of the modeling rubric.
- How does dimensional modeling handle hierarchies?
- Two patterns. Parent-child columns: dim_employee with manager_id FK to itself (works for fixed-depth or limited-depth hierarchies). Hierarchy bridge: a separate bridge table that flattens all ancestor-descendant relationships (handles arbitrary depth, supports SQL queries for any level). Categories with parent-categories often use parent-child; org charts often use hierarchy bridge.
- When is a snowflake schema better than a star schema?
- When a specific dimension is genuinely too large to broadcast (millions of rows with high-cardinality attributes that the optimizer cannot fit in driver memory or shuffle efficiently). Snowflake that one dimension by normalizing its attribute group into a sub-dim. Do not snowflake the whole model; the join cost outweighs the storage saving for most dimensions in modern columnar warehouses.