Question 1

Why does a data engineer pick star schema over snowflake schema?

Accepted Answer

On a 2026 columnar warehouse (Snowflake, BigQuery, Redshift, Databricks Delta), the optimizer broadcasts small dimension tables, so the join cost of a star is negligible. The storage that a snowflake schema saves through normalization rarely outweighs the extra joins and query complexity it adds. Star wins unless a dimension is genuinely too large to broadcast (millions of rows with high-cardinality attributes); in that case, snowflake that one dimension, not the whole model.

Question 2

What is a conformed dimension?

Accepted Answer

A conformed dimension is a dimension table with one schema and one set of surrogate keys, shared by multiple fact tables. For example, dim_customer has the same columns and the same customer identity whether it is joined to the orders fact, the returns fact, or the support ticket fact. The benefit is that analysts can combine results across facts without translating between keys. Senior data engineer modeling rubrics explicitly weight conformed dimensions; junior rubrics often skip the question.
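
As a rough illustration of why the shared key matters, the sketch below builds a tiny dim_customer and two facts in an in-memory SQLite database and combines them through the one surrogate key. The table contents, the column names (customer_key, order_amount, return_amount), and the customer names are made up for the example; only dim_customer and the orders/returns facts come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE fact_orders  (customer_key INTEGER, order_amount REAL);
CREATE TABLE fact_returns (customer_key INTEGER, return_amount REAL);
INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO fact_orders  VALUES (1, 100.0), (1, 50.0), (2, 80.0);
INSERT INTO fact_returns VALUES (1, 30.0);
""")

# Aggregate each fact to the customer grain first, then join both results
# through the same customer_key. No key mapping between facts is needed.
drill_across = conn.execute("""
WITH o AS (SELECT customer_key, SUM(order_amount)  AS ordered
           FROM fact_orders  GROUP BY customer_key),
     r AS (SELECT customer_key, SUM(return_amount) AS returned
           FROM fact_returns GROUP BY customer_key)
SELECT c.customer_name, o.ordered, COALESCE(r.returned, 0.0)
FROM dim_customer c
JOIN o      ON o.customer_key = c.customer_key
LEFT JOIN r ON r.customer_key = c.customer_key
ORDER BY c.customer_name
""").fetchall()

print(drill_across)
```

Aggregating each fact before the join is a deliberate choice here: joining two facts row to row on the customer key would multiply orders by returns.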

Question 3

When does one-big-table (OBT) beat a star schema?

Accepted Answer

Rarely, in 2026. OBT can win for a single analytical workload at a small company where query simplicity outweighs storage and update cost. Star wins wherever there are multiple facts, multiple analytical workloads, or any need for conformed dimensions. OBT also breaks down when a dimension attribute changes: changing a customer name means updating every row that carries that customer, not a single dimension row. In an interview, mention OBT, then defend why star is the better choice in the specific domain.
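
The update cost can be seen in a small sketch. It assumes a hypothetical obt_orders table that repeats customer_name on every row, next to an equivalent star with a dim_customer table; all names and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One big table: the customer name is repeated on every order row.
CREATE TABLE obt_orders (order_id INTEGER, customer_id INTEGER,
                         customer_name TEXT, amount REAL);
INSERT INTO obt_orders VALUES
  (1, 42, 'Acme', 10.0), (2, 42, 'Acme', 20.0),
  (3, 42, 'Acme', 30.0), (4, 7, 'Globex', 40.0);

-- Star: the name lives once, in the dimension.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY,
                           customer_id INTEGER, customer_name TEXT);
CREATE TABLE fact_orders  (order_id INTEGER, customer_key INTEGER, amount REAL);
INSERT INTO dim_customer VALUES (1, 42, 'Acme'), (2, 7, 'Globex');
INSERT INTO fact_orders  VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 1, 30.0), (4, 2, 40.0);
""")

# Rename customer 42 in both models and count the rows each update touches.
obt_rows_touched = conn.execute(
    "UPDATE obt_orders SET customer_name = 'Acme Corp' WHERE customer_id = 42"
).rowcount
star_rows_touched = conn.execute(
    "UPDATE dim_customer SET customer_name = 'Acme Corp' WHERE customer_id = 42"
).rowcount

print(obt_rows_touched, star_rows_touched)
```

With three orders for the renamed customer, the OBT update rewrites three rows while the star update rewrites one; the gap grows with order volume.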

Question 4

What is an additive measure?

Accepted Answer

A measure that can be summed across all dimensions. Revenue is additive (it sums across customer, product, date, region). Quantity is additive. Cost is additive. Semi-additive measures (account balance, inventory level) can be summed across some dimensions but not others: you sum balances across customers, but you do not sum a single customer's balance across dates; you take the latest. Non-additive measures (ratios, percentages, distinct counts) cannot be summed at all; recompute them at the desired aggregation level from their additive components (numerator and denominator) or, for distinct counts, from the underlying rows.
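
A minimal sketch of the semi-additive and non-additive cases, using invented tables (fact_balance, fact_traffic) and invented numbers in an in-memory SQLite database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_balance (account_id TEXT, snapshot_date TEXT, balance REAL);
INSERT INTO fact_balance VALUES
  ('A', '2026-05-01', 100.0), ('A', '2026-05-02', 120.0),
  ('B', '2026-05-01',  50.0), ('B', '2026-05-02',  40.0);
CREATE TABLE fact_traffic (visit_date TEXT, visits INTEGER, orders INTEGER);
INSERT INTO fact_traffic VALUES ('2026-05-01', 100, 10), ('2026-05-02', 300, 60);
""")

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

# Semi-additive: summing balances across dates double counts money.
summed_over_dates = scalar("SELECT SUM(balance) FROM fact_balance")
# Sum across accounts, but only at the latest snapshot date.
latest_total = scalar("""
    SELECT SUM(balance) FROM fact_balance
    WHERE snapshot_date = (SELECT MAX(snapshot_date) FROM fact_balance)
""")

# Non-additive: rebuild the ratio from its additive components...
rate_from_components = scalar(
    "SELECT SUM(orders) * 1.0 / SUM(visits) FROM fact_traffic")
# ...rather than averaging (or summing) the per-day ratios.
avg_of_daily_rates = scalar(
    "SELECT AVG(orders * 1.0 / visits) FROM fact_traffic")

print(summed_over_dates, latest_total, rate_from_components, avg_of_daily_rates)
```

Here the latest-snapshot total is 160 while the sum over dates is 310, and the rate built from components (70 orders over 400 visits) differs from the average of the two daily rates because the days carry different traffic.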

Question 5

Should I use a surrogate key or natural key on dimensions?

Accepted Answer

Use a surrogate key when the natural key is unstable (customer emails change), when SCD Type 2 is needed (the natural key recurs across versions), or when the natural key is composite. Use a natural key when it is stable, simple, and human-readable for joins, and SCD Type 1 is sufficient. dim_date almost always uses an integer key in YYYYMMDD form (20260527) that serves as both surrogate and natural key; dim_customer almost always uses a surrogate key separate from customer_id.
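
The SCD Type 2 case can be sketched as follows. The column names (customer_key, valid_from, valid_to), the dates, and the email addresses are assumptions made for the example, not a prescribed layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- customer_key is the surrogate; customer_id is the natural key and
-- appears once per version of the customer.
CREATE TABLE dim_customer (
  customer_key INTEGER PRIMARY KEY,
  customer_id  INTEGER,
  email        TEXT,
  valid_from   TEXT,
  valid_to     TEXT
);
INSERT INTO dim_customer VALUES
  (1, 42, 'old@example.com', '2025-01-01', '2026-03-01'),
  (2, 42, 'new@example.com', '2026-03-01', '9999-12-31');

-- Each fact row points at the version that was current when it happened.
CREATE TABLE fact_orders (order_id INTEGER, customer_key INTEGER,
                          order_date TEXT, amount REAL);
INSERT INTO fact_orders VALUES
  (1001, 1, '2025-06-15', 20.0),
  (1002, 2, '2026-04-02', 35.0);
""")

# The natural key recurs, so it cannot identify a single dimension row.
versions_of_42 = conn.execute(
    "SELECT COUNT(*) FROM dim_customer WHERE customer_id = 42"
).fetchone()[0]

# Joining on the surrogate returns the attribute as it was at order time.
email_at_order = conn.execute("""
    SELECT f.order_id, d.email
    FROM fact_orders f
    JOIN dim_customer d ON d.customer_key = f.customer_key
    ORDER BY f.order_id
""").fetchall()

print(versions_of_42, email_at_order)
```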

Question 6

How does star schema handle a many-to-many relationship?

Accepted Answer

Through a bridge table. If a product can belong to multiple categories, do not join orders directly to a category dim: each order row fans out once per category and the measures are double counted. Instead, build a product_category bridge with two FKs (product_id, category_id) and optionally a weighting factor. Join orders to dim_product, then to the product_category bridge, then to dim_category. A SUM across the bridge still repeats each order once per category, so it requires the weighting factor, explicit deduplication, or pre-aggregation.
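
A small sketch of the bridge and its weighting factor, with invented products, categories, and amounts. The weight column name and the even 0.5/0.5 split are assumptions; how to allocate weights is a modeling decision the answer above leaves open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
-- Bridge: one row per (product, category); weights per product sum to 1.
CREATE TABLE product_category (product_id INTEGER, category_id INTEGER, weight REAL);
CREATE TABLE fact_orders (order_id INTEGER, product_id INTEGER, amount REAL);

INSERT INTO dim_product  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_category VALUES (10, 'Tools'), (20, 'Gifts');
INSERT INTO product_category VALUES (1, 10, 0.5), (1, 20, 0.5), (2, 10, 1.0);
INSERT INTO fact_orders VALUES (1, 1, 100.0), (2, 2, 40.0);
""")

true_total = conn.execute("SELECT SUM(amount) FROM fact_orders").fetchone()[0]

# Unweighted SUM across the bridge counts the Widget order twice.
naive_total = conn.execute("""
    SELECT SUM(f.amount)
    FROM fact_orders f
    JOIN product_category b ON b.product_id = f.product_id
""").fetchone()[0]

# Allocating each amount by the bridge weight keeps the grand total intact.
weighted_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.amount * b.weight)
    FROM fact_orders f
    JOIN dim_product p      ON p.product_id  = f.product_id
    JOIN product_category b ON b.product_id  = p.product_id
    JOIN dim_category c     ON c.category_id = b.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

print(true_total, naive_total, weighted_by_category)
```

The unweighted total comes out at 240 against a true total of 140, while the weighted per-category figures (50 for Gifts, 90 for Tools) add back up to 140.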

Question 7

What is the grain of a star schema fact table?

Accepted Answer

The grain is the unit of analysis: one row per X. For an orders star, the grain is usually one row per order line item (not one row per order, which loses line-item detail; not one row per customer, which loses individual orders). State the grain in one sentence before drawing the fact table. Mixed-grain fact tables (some rows at order-level, others at line-item-level) are the failure mode interviewers fish for.
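
One way to make a stated grain checkable is to write it down as a key and test the rows against it. The sketch below assumes a line-item grain of (order_id, line_number); the column names, the sample rows, and the grain_violations helper are hypothetical.

```python
from collections import Counter

# Declared grain: one row per order line item.
GRAIN = ("order_id", "line_number")

rows = [
    {"order_id": 1, "line_number": 1, "amount": 10.0},
    {"order_id": 1, "line_number": 2, "amount": 5.0},
    {"order_id": 2, "line_number": 1, "amount": 7.0},
    # An order-level total mixed into a line-item table: no line_number.
    {"order_id": 2, "line_number": None, "amount": 7.0},
]

def grain_violations(rows, grain):
    """Return (duplicate keys, keys with a missing grain column)."""
    keys = [tuple(row[col] for col in grain) for row in rows]
    duplicates = [key for key, count in Counter(keys).items() if count > 1]
    incomplete = [key for key in keys if any(part is None for part in key)]
    return duplicates, incomplete

duplicates, incomplete = grain_violations(rows, GRAIN)
print(duplicates, incomplete)
```

A duplicate key means two rows claim to be the same line item; a key with a missing part is a hint that rows of a coarser grain have been mixed in, which is the failure mode described above.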

Star Schema Interview Questions
