Star schema interview questions isolated from the data modeling catalog. One row per X grain selection, conformed dimensions shared across multiple facts, additive versus semi-additive measures, and the trade-off arguments for star over snowflake or one-big-table in modern columnar warehouses.
Star schema is the default answer in most data engineer modeling rounds. The structure is one fact table surrounded by dimension tables, each dimension joined to the fact via a foreign key. The fact holds numeric measures (quantity, unit_price, total_amount, duration_seconds) plus FKs; the dimensions hold descriptive attributes (customer name, product category, store address, date attributes). In a columnar warehouse like Snowflake, BigQuery, Redshift, or Databricks Delta, the optimizer broadcasts the small dimensions and the join cost is negligible compared to the storage and clarity benefits.
Star schema interview questions for data engineer roles cluster into six sub-patterns. Grain selection: one row per order line item versus one row per order versus one row per customer-day. The grain choice determines what measures are additive and what dimensions are needed. Conformed dimensions: dim_customer used across the orders fact, the returns fact, the support ticket fact with one schema and one set of surrogate keys. The benefit is that analysts can join across facts without explicit translation; the cost is upfront design effort. Additive versus semi-additive versus non-additive measures: revenue is additive (sums across customer, product, date), account balance is semi-additive (sums across customer but averages across date), ratios and percentages are non-additive (must be computed from raw counts). Surrogate keys versus natural keys: surrogate when the natural key is unstable, when SCD Type 2 is needed, or when the natural key is composite. SCD type per dimension attribute: Type 1 for typo correction, Type 2 for history retention. The bridge table for many-to-many: a product-category bridge avoids the Cartesian explosion when joining the orders fact to a category dim.
Star schema versus snowflake schema is a recurring interview question. Snowflake normalizes dimensions into their own dimensional tables to reduce storage. The trade-off is more joins per query and harder analyst SQL. In a 2026 columnar warehouse, dimensions are usually small enough to broadcast in 5-50ms, so the join cost is negligible. Star wins unless one specific dimension is genuinely too large to broadcast (millions of rows with high-cardinality attributes), in which case the data engineer snowflakes that one dimension, not the whole model.
Star schema versus one-big-table (OBT) is the newer trade-off question in 2026 data engineer interviews. OBT flattens all dimensions into the fact table, producing a wide denormalized table. The benefit is zero joins at query time and simple SQL. The cost is data redundancy (one customer's name repeated on every order row, increasing storage 10x or more), no conformed dimensions across facts (each fact-OBT has its own embedded customer attributes), and slow ingestion (changing a customer name requires updating every fact row). OBT can win for a single analytical workload at a small company where simplicity outweighs cost; star wins everywhere else.
Star Schema Interview Questions
Star schema interview problems for data engineer interview prep.
57 practice problems matching this filter. Difficulty: medium (32), easy (8), hard (17).
Data Modeling (57)
- A/B Experiment Assignment Schema - medium - One user, one experiment, one variant. No exceptions.
- Where They Used to Live - medium - They moved. The data stayed behind.
- Airline Flight Operations Schema - medium - Flights, passengers, and routes. Before you draw a single table, tell me the grain.
- A Number for the Seller - easy - They want a total. Give them the right schema first.
- B2B Invoicing Data Model - easy - Invoices go out, partial payments trickle in, and some customers are three months overdue.
- Clickstream and Session Schema - medium - Millions of clicks, mostly anonymous.
- Cloud File Storage Metadata Schema - hard - A file is also a folder. A folder is also a file.
- Content Engagement Data Model - hard - Post published. Now measure everything that happens next.
- Content Search and Discovery Schema - hard - Searchable from every angle. Design it so nothing gets lost.
- Customer Address History - easy - People move. Sometimes twice in a month. How do you remember where everyone was, and when?
Common questions
- Why does a data engineer pick star schema over snowflake schema?
- On a 2026 columnar warehouse (Snowflake, BigQuery, Redshift, Databricks Delta), the optimizer broadcasts small dimension tables and the join cost is negligible. Snowflake's storage savings from normalization rarely outweigh the query complexity and join cost. Star wins unless a dimension is genuinely too large to broadcast (millions of rows with high-cardinality attributes), in which case snowflake that one dim, not the whole model.
- What is a conformed dimension?
- A conformed dimension is a dimension table with one schema and one set of surrogate keys, used by multiple fact tables. dim_customer with the same columns and identity in the orders fact, the returns fact, and the support ticket fact. The benefit is that analysts can join across facts without explicit translation. Senior data engineer modeling rubrics explicitly weight conformed dimensions; junior rubrics often skip the question.
- When does one-big-table (OBT) beat a star schema?
- Rarely, in 2026. OBT can win for a single analytical workload at a small company where query simplicity outweighs storage and update cost. Star wins everywhere there are multiple facts, multiple analytical workloads, or any need for conformed dimensions. OBT also breaks when a dimension attribute changes (changing a customer name requires updating every fact row). Mention OBT, defend why star is the better choice in the specific domain.
- What is an additive measure?
- A measure that can be summed across all dimensions. Revenue is additive (sums across customer, product, date, region). Quantity is additive. Cost is additive. Semi-additive measures (account balance, inventory level) can be summed across some dimensions but not others (you sum balances across customers but you do not sum a single customer's balance across dates; you take the latest). Non-additive measures (ratios, percentages, distinct counts) must be computed from raw counts at the desired aggregation level.
- Should I use a surrogate key or natural key on dimensions?
- Surrogate key when the natural key is unstable (customer emails change), when SCD Type 2 is needed (the natural key recurs across versions), or when the natural key is composite. Natural key when stable, simple, human-readable for joins, and SCD Type 1 is sufficient. dim_date almost always uses an int surrogate (20260527) that is both surrogate and natural; dim_customer almost always uses a separate surrogate from customer_id.
- How does star schema handle a many-to-many relationship?
- Through a bridge table. If a product can belong to multiple categories, do not join orders directly to a category dim (causes Cartesian explosion). Instead, build a product_category bridge with two FKs (product_id, category_id) and optionally a weighting factor. Join orders to dim_product, then to product_category bridge, then to dim_category. SUM across the bridge requires explicit deduplication or pre-aggregation.
- What is the grain of a star schema fact table?
- The grain is the unit of analysis: one row per X. For an orders star, the grain is usually one row per order line item (not one row per order, which loses line-item detail; not one row per customer, which loses individual orders). State the grain in one sentence before drawing the fact table. Mixed-grain fact tables (some rows at order-level, others at line-item-level) are the failure mode interviewers fish for.