Data modeling interview prep for data engineer roles. Start with grain selection (one row per X), then dim_customer with Type 2 SCD, then conformed dimensions across multiple facts, then defend star over snowflake. Whiteboard practice problems with rubric-scored verdicts for the 55 percent of data engineer loops that include a modeling round.

Data modeling interview prep for data engineer roles centers on five skills. First, grain identification: pick the right unit of analysis (one row per order line item, one row per trip, one row per impression) and state it before drawing. Second, dimension design: choose Type 1, Type 2, or Type 3 SCD per attribute and justify each choice in one sentence. Third, conformed dimensions: build dim_customer with one schema shared across the orders fact, the returns fact, and the support ticket fact. Fourth, fact additivity: identify additive (revenue, quantity), semi-additive (account balance, inventory), and non-additive (ratio, percentage) measures, because the category determines which aggregations are valid. Fifth, bridge tables: when one fact row relates to many members of a dimension (a many-to-many relationship), resolve it with a bridge table instead of duplicating fact rows.
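The bridge table skill can be sketched in pandas. This is a minimal illustration, not a prescribed design: the table names (fact_fee, bridge_account_customer), the joint-account scenario, and the weight column are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical fact at grain "one row per account fee".
fact_fee = pd.DataFrame({
    "account_key": [1, 2],
    "fee_amount": [100.0, 50.0],
})

# Hypothetical bridge: account 1 is held jointly by customers 10 and 11.
# The weights for each account sum to 1.0 so allocated totals stay correct.
bridge_account_customer = pd.DataFrame({
    "account_key": [1, 1, 2],
    "customer_key": [10, 11, 12],
    "weight": [0.5, 0.5, 1.0],
})


def fees_by_customer(fact: pd.DataFrame, bridge: pd.DataFrame) -> pd.Series:
    """Allocate each fee across the customers on the account via the bridge."""
    joined = fact.merge(bridge, on="account_key", how="inner")
    joined["allocated_fee"] = joined["fee_amount"] * joined["weight"]
    return joined.groupby("customer_key")["allocated_fee"].sum()


by_customer = fees_by_customer(fact_fee, bridge_account_customer)

# Joining through the bridge without weights repeats the joint account's fee.
naive_total = fact_fee.merge(bridge_account_customer, on="account_key")["fee_amount"].sum()
```

In this sketch the weighted total matches the fact total, while the unweighted join overstates it. Whether to weight, or to report unweighted numbers with a stated caveat, is a design choice worth saying out loud in the interview.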

Drill order for a data engineer with four weeks until an onsite. Weeks 1-2: design 6 star schemas on real domains (e-commerce, marketplace, rideshare, payments ledger, ad tech, content platform). For each, state the grain in one sentence, draw the fact table with FKs and measures, add 3-4 dimensions with an SCD type per attribute, and defend star versus snowflake versus OBT (one big table). Week 3: drill SCD merge logic. The Type 2 merge requires identifying new and changed rows (anti-join from staging to the current dim rows on the business key plus the tracked attributes), expiring the current rows for the changed keys (set valid_to to now, is_current to false), and inserting the new versions (valid_from now, is_current true). Practice this in SQL and in pandas. Week 4: drill the failure modes interviewers test: mixed-grain fact tables, a missing conformed dimension, a forgotten bridge table on a many-to-many, and a late-arriving dimension without a placeholder strategy.
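The three Type 2 merge steps from week 3 can be practiced in pandas along these lines. This is a sketch under stated assumptions: the column names (customer_id, city, tier, valid_from, valid_to, is_current) and the TRACKED list are hypothetical, an open-ended valid_to is stored as NaT, and surrogate key assignment is left out.

```python
import pandas as pd

TRACKED = ["city", "tier"]  # hypothetical Type 2 attributes


def scd2_merge(dim: pd.DataFrame, staging: pd.DataFrame, now: pd.Timestamp) -> pd.DataFrame:
    """Return a new dimension with a Type 2 merge of staging applied."""
    current = dim[dim["is_current"]]
    keys = ["customer_id"] + TRACKED

    # Step 1: anti-join staging against current rows on key + tracked attributes.
    # Rows with no exact match are either brand-new keys or changed versions.
    probe = staging.merge(current[keys], on=keys, how="left", indicator=True)
    changed = probe[probe["_merge"] == "left_only"].drop(columns="_merge")
    if changed.empty:
        return dim.copy()  # nothing to do, so re-running the merge is a no-op

    # Step 2: expire the current rows for keys that have a new version.
    out = dim.copy()
    expire = out["is_current"] & out["customer_id"].isin(changed["customer_id"])
    out.loc[expire, "valid_to"] = now
    out.loc[expire, "is_current"] = False

    # Step 3: insert the new versions as the current rows.
    new_rows = changed.assign(valid_from=now, valid_to=pd.NaT, is_current=True)
    return pd.concat([out, new_rows], ignore_index=True)


dim_customer = pd.DataFrame({
    "customer_id": [1, 2],
    "city": ["Austin", "Boston"],
    "tier": ["free", "pro"],
    "valid_from": pd.to_datetime(["2024-01-01", "2024-01-01"]),
    "valid_to": pd.Series([pd.NaT, pd.NaT], dtype="datetime64[ns]"),
    "is_current": [True, True],
})

staging = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "city": ["Denver", "Boston", "Miami"],  # 1 moved, 2 unchanged, 3 is new
    "tier": ["free", "pro", "free"],
})

now = pd.Timestamp("2024-06-01")
result = scd2_merge(dim_customer, staging, now)
```

With this sample data, customer 1 ends up with an expired Austin row and a current Denver row, customer 2 is untouched, and customer 3 gets a first row. In SQL the same steps usually become a single MERGE or an UPDATE followed by an INSERT.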

Modeling rubrics in 2026 data engineer interviews score on five criteria. SLA match: does the proposed model support the analytical workload's freshness requirement (real-time, micro-batch, daily, weekly)? Grain correctness: is the fact grain stated explicitly as one row per X, and is it consistent across measures? Dimension design: are the SCD types correct, and are conformed dimensions used where appropriate? Trade-off articulation: does the candidate defend star versus snowflake versus OBT in the specific domain? Adapt-on-the-fly: when the interviewer changes a requirement (real-time freshness instead of daily, multi-region instead of single-region), does the candidate modify the existing model in place or throw it out and restart?

Companies whose data modeling rounds appear most often in interview reports: Stripe (financial-data SCD2 merchant dimension, idempotent aggregations), Netflix (Iceberg medallion with streaming feeding gold), Meta (ads attribution with multi-touch SCD2 advertiser), Amazon (Redshift DISTKEY/SORTKEY-aware star schemas), Airbnb (search-funnel star with SCD2 listing dimension), Uber (trip-grain fact with H3 location). A data engineer who has practiced three of these domains usually clears the modeling round at any of these companies.

Data Modeling Interview Prep

Prep guide for the data modeling round of a data engineer interview loop.

57 practice problems matching this filter. Difficulty: medium (32), easy (8), hard (17).


Common questions

What percentage of data engineer interviews include a data modeling round?
Roughly 55 percent of data engineer loops include an explicit data modeling whiteboard round. The share rises with seniority: nearly all L5+ data engineer loops include modeling, and about 80 percent of analytics engineer loops do. The format is usually 45 minutes on a whiteboard or canvas with a real-world domain (marketplace, rideshare, payments, ad tech).
How should a data engineer prep for the modeling round in 4 weeks?
Weeks 1-2: design 6 star schemas on real domains, state grain first, defend star vs snowflake vs OBT. Week 3: drill SCD Type 2 merge logic in SQL and pandas. Week 4: drill failure modes (mixed grain, missing conformed dimension, forgotten bridge table, late-arriving dim). Two timed mocks with someone in the final week.
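For the late-arriving dimension failure mode, one common placeholder strategy is to insert an inferred member so the fact row still has something to join to. The sketch below assumes hypothetical column names (customer_id, name, is_inferred) and an "UNKNOWN" default; a real pipeline would also overwrite the placeholder once the true dimension record arrives.

```python
import pandas as pd


def add_placeholders(dim: pd.DataFrame, fact: pd.DataFrame, key: str = "customer_id") -> pd.DataFrame:
    """Append one inferred-member row for each fact key missing from the dim."""
    missing = fact.loc[~fact[key].isin(dim[key]), key].drop_duplicates()
    if missing.empty:
        return dim.copy()
    placeholders = pd.DataFrame({
        key: missing.to_numpy(),
        "name": "UNKNOWN",      # assumed default until the real record arrives
        "is_inferred": True,    # flag so a later load knows to overwrite this row
    })
    return pd.concat([dim, placeholders], ignore_index=True)


dim_customer = pd.DataFrame({
    "customer_id": [1, 2],
    "name": ["Ann", "Bo"],
    "is_inferred": [False, False],
})

# Customer 3 shows up in the fact feed before the dimension feed delivers it.
fact_orders = pd.DataFrame({
    "customer_id": [1, 3, 3],
    "order_amount": [20.0, 15.0, 5.0],
})

dim_with_placeholders = add_placeholders(dim_customer, fact_orders)
joined = fact_orders.merge(dim_with_placeholders, on="customer_id", how="inner")
```

Without the placeholder, an inner join would silently drop the two orders for customer 3, which is the symptom interviewers tend to probe for.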
What is the most common failure mode in the modeling round?
Starting to draw before stating the grain. The candidate who jumps to dim_customer and dim_product before saying 'one row per order line item' almost always builds a fact table that mixes grains and then has to throw it out. The fix is mechanical: say the grain out loud before drawing.
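A stated grain is also something you can test against data. The check below is a rough sketch, assuming a hypothetical order line fact whose declared grain is (order_id, line_number); it flags rows that repeat the grain key and rows that lack a full key, which is a typical sign that a different grain (here an order-level shipping fee) was mixed in.

```python
import pandas as pd


def grain_report(fact: pd.DataFrame, grain_cols: list) -> dict:
    """Find rows that break the declared grain of a fact table."""
    # The same grain key appearing more than once means the grain is not unique.
    duplicates = fact[fact.duplicated(subset=grain_cols, keep=False)]
    # A missing grain column suggests a row recorded at a coarser grain.
    incomplete_key = fact[fact[grain_cols].isna().any(axis=1)]
    return {"duplicates": duplicates, "incomplete_key": incomplete_key}


# Declared grain: one row per order line item.
fact_order_line = pd.DataFrame({
    "order_id": [1, 1, 1, 2, 2],
    "line_number": [1, 2, 2, 1, None],  # last row is an order-level shipping fee
    "amount": [10.0, 5.0, 5.0, 7.0, 3.0],
})

report = grain_report(fact_order_line, ["order_id", "line_number"])
```

Here the report flags the repeated (1, 2) line and the shipping row with no line number. The usual repair is to move order-level measures to their own fact or allocate them down to the line grain.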
Do interviewers expect specific warehouse vendor knowledge?
It depends on the company. Amazon expects Redshift DISTKEY and SORTKEY decisions. Companies built on Snowflake or Databricks expect QUALIFY and MERGE INTO patterns. Google expects BigQuery partitioning and clustering. Most other companies stay vendor-neutral in the modeling round and test pure Kimball-style design. Mention vendor-specific optimizations when relevant; do not force them where the question is neutral.
What trade-offs does a senior data engineer modeling round expect?
Star versus snowflake versus OBT in the specific domain. SCD Type 2 versus Type 1 per attribute. Conformed dimensions across facts versus per-fact denormalization. Bridge table versus pre-aggregation. Real-time freshness versus daily batch trade-off in the design that surrounds the model. The L5 signal is naming two alternatives and defending the choice; the L4 signal is producing a working model.
How does grain selection affect the rest of the model?
The grain determines which dimensions are needed (only those that describe a row at the chosen grain), which measures are additive (sum to higher levels) versus semi-additive (account balance: sums over entities, averages over time) versus non-additive (ratios, percentages, distinct counts), and where SCDs come into play (only dimensions whose attributes change over time). A wrong grain at the start propagates errors through every subsequent decision.
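The additivity categories are easier to defend with numbers. The sketch below uses made-up data: a hypothetical daily balance snapshot for the semi-additive case and hypothetical daily sessions and orders for the non-additive case.

```python
import pandas as pd

# Semi-additive: grain is one row per account per day (balance snapshot).
snapshots = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "account_id": ["A", "B", "A", "B"],
    "balance": [100.0, 50.0, 120.0, 50.0],
})

# Valid: sum across accounts within a single day.
total_per_day = snapshots.groupby("date")["balance"].sum()
# Valid: average those daily totals over time.
avg_daily_total = total_per_day.mean()
# Not meaningful: summing balances across days counts the same money twice.
sum_across_days = snapshots["balance"].sum()

# Non-additive: a conversion rate cannot be averaged across days.
daily = pd.DataFrame({
    "sessions": [100, 300],
    "orders": [10, 15],
})
daily["rate"] = daily["orders"] / daily["sessions"]

mean_of_rates = daily["rate"].mean()                        # misleading
overall_rate = daily["orders"].sum() / daily["sessions"].sum()  # from additive parts
```

The general pattern this illustrates is to store the additive components (orders, sessions) in the fact table and compute ratios at query time from the summed components.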
What is a conformed dimension and why does it matter?
A conformed dimension is a dimension table with one schema and identity used across multiple fact tables. For example, dim_customer keeps the same columns, surrogate keys, and SCD semantics whether it is joined to the orders fact, the returns fact, or the support ticket fact. Without conformed dimensions, analysts cannot join across facts without explicit translation. Senior data engineer rubrics weight this; junior rubrics rarely test it.
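What a conformed dim_customer buys you can be shown with a small drill-across query. This is an illustrative sketch with invented data: the segment attribute and the measure names are assumptions. Each fact is aggregated separately to the shared dimension attribute and only then combined, which avoids joining two facts row to row.

```python
import pandas as pd

# One conformed dimension, shared by both facts through customer_key.
dim_customer = pd.DataFrame({
    "customer_key": [1, 2, 3],
    "segment": ["smb", "smb", "enterprise"],
})

fact_orders = pd.DataFrame({
    "customer_key": [1, 1, 2, 3],
    "order_amount": [20.0, 30.0, 10.0, 100.0],
})

fact_returns = pd.DataFrame({
    "customer_key": [1, 3],
    "return_amount": [5.0, 40.0],
})


def by_segment(fact: pd.DataFrame, measure: str) -> pd.Series:
    """Aggregate one fact to the conformed segment attribute."""
    joined = fact.merge(dim_customer, on="customer_key", how="inner")
    return joined.groupby("segment")[measure].sum()


# Drill across: aggregate each fact on its own, then align on segment.
report = pd.concat(
    [by_segment(fact_orders, "order_amount"), by_segment(fact_returns, "return_amount")],
    axis=1,
).fillna(0.0)
report["return_rate"] = report["return_amount"] / report["order_amount"]
```

If the two facts carried different customer keys or different segment definitions, the final alignment step would need a translation table, which is the cost the answer above describes.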