Data Modeling Practice for Data Engineering Interviews

Draw the Schema. The Canvas Scores It.

You do not answer multiple choice. You draw the schema. Tables, columns, foreign keys, SCD type, the works. The validator checks your grain first, then the structure, then the relationships, and surfaces what is off in plain language.

How the Modeling Canvas Works

A canvas that scores structure

Drag in tables, attach columns, draw foreign keys. The validator checks your table set, relationships, normalization level, and key choices against the reference solution. Multiple correct solutions are accepted because real modeling rarely has one right answer.

Grain checks first, everything else second

Almost every wrong fact table starts with a misstated grain. The validator surfaces the grain it inferred from your design and tells you whether it matches the requirement. If it does not, the rest of the feedback is held back so you fix the foundation first.
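One way to picture a grain check, as a minimal sketch and not the validator's actual logic: treat the stated grain as a set of columns and test whether those columns uniquely identify every fact row. The function name `grain_violations` and the sample `order_lines` data are hypothetical, introduced only for illustration.

```python
from collections import Counter

def grain_violations(rows, grain_columns):
    """Return grain keys that appear more than once in a fact table.

    rows: list of dicts, one per fact row (hypothetical sample data).
    grain_columns: columns that, per the stated grain, identify one row.
    """
    counts = Counter(tuple(row[col] for col in grain_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical order-line facts.
order_lines = [
    {"order_id": 1, "line_no": 1, "product_id": 10, "qty": 2},
    {"order_id": 1, "line_no": 2, "product_id": 11, "qty": 1},
    {"order_id": 2, "line_no": 1, "product_id": 10, "qty": 5},
]

# Stated grain "one row per order line": no key repeats.
line_grain = grain_violations(order_lines, ["order_id", "line_no"])

# Misstated grain "one row per order": order 1 appears twice.
order_grain = grain_violations(order_lines, ["order_id"])
```

An empty result means the data is consistent with the stated grain. A non-empty result lists the keys that collide, which is usually the fastest way to see that the table is finer-grained than claimed.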

Difficulty that escalates by structure, not size

Early problems are a single fact and three dimensions. Later problems add bridge tables, junk dimensions, multiple grains, SCD2 with effective-dating bugs, and conformed dimensions across two facts. Difficulty is not about adding more columns.

Filter by company and level

A modeling round at Amazon for L5 has a different shape than the same round at a 200-person Series C. The filters scope the bank to the patterns and depths those loops actually hit.

Structural feedback, not just pass or fail

When you submit, the validator names what is off: missing dimension, wrong cardinality on a join, an SCD column that should have been on the dim, a fact with mixed grain. Useful for actually learning rather than retrying blind.
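To make "wrong cardinality on a join" concrete, here is a small sketch under assumed data: if a dimension key is not unique, joining the fact to it fans out and inflates the measure. The helper `fanout_keys` and the `dim_store` and `facts` rows are hypothetical and do not describe how the validator itself works.

```python
from collections import Counter

def fanout_keys(dim_rows, key):
    """Return dimension keys that occur more than once (join fan-out risk)."""
    counts = Counter(row[key] for row in dim_rows)
    return sorted(k for k, n in counts.items() if n > 1)

# Hypothetical dimension where store_key 2 was loaded twice.
dim_store = [
    {"store_key": 1, "region": "EU"},
    {"store_key": 2, "region": "US"},
    {"store_key": 2, "region": "CA"},
]
facts = [
    {"store_key": 1, "amount": 10.0},
    {"store_key": 2, "amount": 20.0},
]

duplicated = fanout_keys(dim_store, "store_key")

# Total taken straight from the fact table.
true_total = sum(f["amount"] for f in facts)

# Total after a naive join: the store 2 fact matches two dimension rows.
joined_total = sum(
    f["amount"]
    for f in facts
    for d in dim_store
    if d["store_key"] == f["store_key"]
)
```

The gap between `true_total` and `joined_total` is the double counting that a many-to-many join introduces when the design assumed many-to-one.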

Readiness by topic

Star, snowflake, SCD types, junk dimensions, bridges, data vault, conformed dimensions. Each tracks separately so you can see which patterns you have actually internalized versus which ones you can recognize but not draw.

Data Modeling Topics

Star Schema (Medium)

Frequency: Very High (3,700/mo searches) | Count: Core

Snowflake Schema (Medium)

Frequency: High (1,400/mo searches) | Count: Multiple

Star Schema vs Snowflake Schema (Medium)

Frequency: High (800/mo searches) | Count: Comparison

Dimensional Modeling (Medium-Hard)

Frequency: High (600/mo searches) | Count: Core

Slowly Changing Dimensions (Medium-Hard)

Frequency: High | Count: Types 1-3

Data Vault Modeling (Hard)

Frequency: Medium (500/mo searches) | Count: Multiple

Grain Definition (Medium-Hard)

Frequency: Very High | Count: Every problem

Fact Table Types (Medium)

Frequency: Medium-High | Count: 3 types

Conformed Dimensions (Hard)

Frequency: Medium | Count: Cross-domain

Two Modes, Used for Different Parts of Prep

Problem mode

Clear requirements, no timer. Build the schema on the canvas and submit when you are satisfied. The validator returns structural feedback in seconds. Best when you are learning a new pattern, like the first time you build an SCD2 dim or a junk dimension.

Interview mode

A deliberately under-specified scenario, a timer, and an AI interviewer that pushes on trade-offs as you draw. Mid-design it may add a requirement that breaks your grain, the way a real interviewer would. You get a verdict at the end, with the specific design choices that decided it.

Data Modeling Practice FAQ

What is a star schema?
A central fact table joined to a ring of denormalized dimension tables. The fact table holds measurable events: orders, clicks, page views. The dimensions describe the context around those events: customer, product, date, store. It became the dominant warehouse model because it is cheap to query and easy for non-engineers to reason about, which is still why interviewers default to it as the expected design.
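As a minimal sketch of that shape, the following builds a tiny star in an in-memory SQLite database and runs one analytical query against it. All table names, column names, and rows are hypothetical. Note that SQLite only enforces the declared foreign keys when `PRAGMA foreign_keys = ON` is set, so here they serve as documentation of the relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, full_date TEXT, month TEXT);

-- Grain: one row per order line.
CREATE TABLE fact_order_line (
    order_id     INTEGER,
    line_no      INTEGER,
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    quantity     INTEGER,
    amount       REAL,
    PRIMARY KEY (order_id, line_no)
);

INSERT INTO dim_customer VALUES (1, 'Ada', 'EU'), (2, 'Grace', 'US');
INSERT INTO dim_product VALUES (10, 'Keyboard', 'Hardware'), (11, 'Editor license', 'Software');
INSERT INTO dim_date VALUES (20240105, '2024-01-05', '2024-01'), (20240210, '2024-02-10', '2024-02');
INSERT INTO fact_order_line VALUES
    (1, 1, 1, 10, 20240105, 2, 100.0),
    (1, 2, 1, 11, 20240105, 1, 50.0),
    (2, 1, 2, 10, 20240210, 1, 50.0);
""")

# One join from the fact to a denormalized dimension answers the question.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_order_line AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Because `category` sits directly on `dim_product`, the query needs a single join per dimension, which is the property that makes a star cheap to query and easy to read.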
How does a snowflake schema differ from a star?
A snowflake takes the dimensions of a star and normalizes them further, so a single dim becomes a chain of related tables. The win is less storage and consistent updates, since each attribute lives in one place. The cost is more joins on every analytical query, which hurts on engines that handle many small joins poorly. In practice, most warehouses denormalize back into a star and pay the storage cost.
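The trade-off can be sketched with a product dimension normalized into a product table and a category table. This is an illustrative example with hypothetical tables and rows in in-memory SQLite, not a recommendation for any particular warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category attribute is normalized out of the product dimension.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);

INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO dim_product VALUES (10, 'Keyboard', 1), (11, 'Mouse', 1), (12, 'Editor license', 2);
INSERT INTO fact_sales VALUES (10, 100.0), (11, 25.0), (12, 50.0);
""")

# The cost: reaching the category now takes two joins instead of one.
REVENUE_BY_CATEGORY = """
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_key  = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
"""
before_rename = conn.execute(REVENUE_BY_CATEGORY).fetchall()

# The win: renaming a category touches exactly one row.
conn.execute("UPDATE dim_category SET category_name = 'Devices' WHERE category_key = 1")
after_rename = conn.execute(REVENUE_BY_CATEGORY).fetchall()
```

In the star version the same rename would have to update every product row carrying that category, which is the consistency risk the snowflake removes at the price of the extra join.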
What is dimensional modeling?
The Ralph Kimball school: split the world into facts (things that happened) and dimensions (the context around them), then design for query speed and human comprehension instead of normalization purity. Almost every modern analytical warehouse builds gold-layer tables this way, which is why nearly every modeling interview leans on Kimball vocabulary.
What are slowly changing dimensions?
How you track a dimension attribute that changes over time. Type 1 overwrites the value, losing history. Type 2 closes out the current row and inserts a new one, using effective_from / effective_to / is_current columns, so historical facts still join to the value that was correct at the time. Type 3 adds a previous-value column to the current row. SCD2 is the one almost every interviewer wants to see you build, because the merge logic is non-trivial.
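A minimal sketch of the Type 2 merge, assuming the dimension is held as a list of dicts and that validity intervals are half-open (effective_from inclusive, effective_to exclusive). The helper names `apply_scd2` and `lookup_as_of`, the `OPEN_END` sentinel, and the customer columns are all hypothetical choices made for this illustration.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed sentinel meaning "still current"

def apply_scd2(dim_rows, customer_id, new_attrs, change_date):
    """Apply one incoming change to a Type 2 dimension (list of dicts)."""
    current = next(
        (r for r in dim_rows if r["customer_id"] == customer_id and r["is_current"]),
        None,
    )
    if current is not None:
        if all(current[k] == v for k, v in new_attrs.items()):
            return dim_rows  # nothing changed, so no new version
        # Close out the old version at the moment the new one starts.
        current["effective_to"] = change_date
        current["is_current"] = False
    dim_rows.append({
        "customer_key": len(dim_rows) + 1,  # surrogate key, one per version
        "customer_id": customer_id,         # business key, stable across versions
        **new_attrs,
        "effective_from": change_date,
        "effective_to": OPEN_END,
        "is_current": True,
    })
    return dim_rows

def lookup_as_of(dim_rows, customer_id, as_of):
    """Return the version that was valid on as_of, or None."""
    for r in dim_rows:
        if r["customer_id"] == customer_id and r["effective_from"] <= as_of < r["effective_to"]:
            return r
    return None

dim = []
apply_scd2(dim, "C1", {"city": "Berlin"}, date(2023, 1, 1))
apply_scd2(dim, "C1", {"city": "Lisbon"}, date(2024, 6, 1))
apply_scd2(dim, "C1", {"city": "Lisbon"}, date(2024, 7, 1))  # no-op: value unchanged
```

The parts that tend to go wrong are visible here: skipping the unchanged case creates duplicate versions, and mixing inclusive and exclusive end dates makes a fact dated exactly on the change date match two rows or none.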
Is this practice free?
Yes. There is no subscription and there is not going to be one. The site does not sell the practice product.

Why Practice

  1. Active recall beats re-reading by 50%

    Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom.

  2. 76% of hiring managers reject on the coding task, not the resume

    From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice.

  3. Five problem shapes cover 80% of data engineer loops

    Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition.
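
Two of those shapes, dedup and top-N-per-group, can be sketched in a few lines. This is an illustrative version over hypothetical event rows. The function names `dedup_latest` and `top_n_per_group` are made up for the example, and in an interview the same shapes are usually written as SQL window functions.

```python
from collections import defaultdict

def dedup_latest(rows, key, order_by):
    """Keep one row per key: the one with the largest order_by value."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    return list(latest.values())

def top_n_per_group(rows, group, order_by, n):
    """Return the n highest rows by order_by within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group]].append(row)
    return {
        g: sorted(members, key=lambda r: r[order_by], reverse=True)[:n]
        for g, members in groups.items()
    }

# Hypothetical event rows.
events = [
    {"user": "a", "ts": 1, "score": 5},
    {"user": "a", "ts": 3, "score": 2},
    {"user": "b", "ts": 2, "score": 9},
    {"user": "a", "ts": 2, "score": 7},
]

latest_per_user = dedup_latest(events, "user", "ts")
top_two_scores = top_n_per_group(events, "user", "score", 2)
```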