200+ data modeling questions across star schema, SCDs, data vault, and medallion architecture. The AI interviewer asks follow-up questions based on your specific answers, creating a discussion round that mirrors what you'll face at top tech companies.
Data modeling interviews are fundamentally different from coding interviews. There's no code editor, no test cases, no pass/fail. The interviewer gives you a business scenario and says: "Design the data model." Then they spend 35 minutes probing your design with follow-up questions that get harder as the conversation progresses.
The difficulty comes from three sources. First, there are multiple valid designs for any scenario, so you can't study "the answer." You have to study the reasoning. Second, the follow-up questions are unpredictable. They depend on your specific design choices, so memorizing answers doesn't work. Third, you need to explain your thinking verbally, in real time, which is a separate skill from knowing the right answer internally.
Most candidates prepare by reading about star schemas and SCDs. That's necessary but not sufficient. Reading teaches you the vocabulary. Interview performance requires you to apply that vocabulary to a novel scenario while defending your choices against probing questions. The gap between "I know what a fact table is" and "I can design a fact table for a food delivery app and explain why the grain is one row per order, not per item" is where most candidates fail.
SQL and Python have a clear feedback loop: your code either runs correctly or it doesn't. Data modeling has no equivalent. You draw a schema, and there's no automated way to know if it's good. Is the grain correct? Are the dimensions conformed? Will this schema support the query patterns the business needs? Without feedback, you can practice for weeks and reinforce bad habits. DataDriven's AI interviewer evaluates your modeling decisions against the criteria real interviewers use and tells you what's missing.
A data modeling interview isn't a single question with a single answer. It's a 35-minute conversation. You propose a design, and the interviewer probes it. 'What if the business adds a new product type?' 'How would you handle late-arriving dimension data?' 'What's the grain of that fact table?' These follow-ups test whether you actually understand your design or just drew something that looked reasonable. DataDriven's AI interviewer generates follow-up questions based on your specific answers, creating a conversation that feels like a real round.
You can read 'The Data Warehouse Toolkit' cover to cover and still stumble in a modeling interview. The book teaches star schema design principles. The interview tests whether you can apply those principles to a novel scenario under time pressure while articulating your reasoning out loud. Those are different skills. The only way to build the second skill is to practice it: propose a design, get questioned on it, revise it, and explain why you changed it.
The 'right' model for an e-commerce platform depends on query patterns, data volume, team size, update frequency, and compliance requirements. A model that's perfect for a 10-person startup is wrong for a 10,000-person enterprise. Interviewers test whether you ask about context before proposing a design. Candidates who draw a schema without asking questions about the business get lower scores. DataDriven's AI interviewer rewards you for asking clarifying questions before committing to a design.
Star schema and SCDs together account for 50% of modeling interview questions. If you're short on prep time, master those two before moving to data vault and medallion architecture.
Star schema is the foundation of analytical data modeling. Interviewers give you a business scenario and ask you to design the fact and dimension tables. They want to hear you think about grain (what does one row represent?), conformed dimensions (can this dimension join to multiple facts?), and degenerate dimensions (order number lives on the fact table, not in its own dimension).
Interviewers listen for whether you start with the grain. Candidates who jump straight to drawing tables without defining what one row represents almost always make mistakes later. They also check whether you understand the tradeoff between normalization (less storage, more joins) and denormalization (more storage, faster queries). There's no single right answer. The quality of your reasoning matters more than the specific schema you propose.
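To make the grain-first habit concrete, here is a minimal star-schema sketch in Python's built-in sqlite3. All table and column names (dim_customer, fact_orders, order_number) are illustrative assumptions, not a prescribed answer; the point is the shape: state the grain, keep the degenerate dimension on the fact, and group measures by dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: one row per customer. Conformed, so other fact
# tables (returns, support tickets) could reuse it unchanged.
cur.execute("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,   -- surrogate key
    customer_name TEXT,
    city TEXT
)""")

# Fact table: the grain is one row per order, stated up front.
cur.execute("""
CREATE TABLE fact_orders (
    order_key INTEGER PRIMARY KEY,
    order_number TEXT,      -- degenerate dimension: no separate table
    customer_key INTEGER REFERENCES dim_customer (customer_key),
    order_date TEXT,
    order_total REAL        -- additive measure at the order grain
)""")

cur.execute("INSERT INTO dim_customer VALUES (1, 'Ada', 'Berlin')")
cur.execute("INSERT INTO fact_orders VALUES (1, 'ORD-1001', 1, '2024-06-01', 42.5)")

# The classic star-schema query shape: measures from the fact,
# grouped by attributes from a dimension.
rows = cur.execute("""
    SELECT c.city, SUM(f.order_total)
    FROM fact_orders f
    JOIN dim_customer c USING (customer_key)
    GROUP BY c.city
""").fetchall()
print(rows)  # [('Berlin', 42.5)]
```

If the grain were one row per line item instead, order_total would no longer be additive per order, which is exactly the kind of mismatch interviewers probe for.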
SCDs test whether you understand how real-world data changes over time. A customer moves to a new city. A product changes categories. An employee gets promoted. How do you model these changes so that historical reports stay accurate? Practitioners catalog as many as eight SCD types (numbered 0 through 7), but interviewers focus on Types 1, 2, and 3.
The SCD Type 2 implementation details separate mid-level from senior candidates. Mid-level candidates know the concept. Senior candidates talk about effective_date and expiration_date columns, the is_current flag, surrogate keys vs natural keys, and what happens when a dimension change arrives out of order. They also discuss the ETL complexity: the merge logic for Type 2 is significantly harder than Type 1, and the interviewer wants to know you've actually built it.
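The senior-level mechanics can be sketched in a few lines. This is a simplified Type 2 change routine, assuming a hypothetical customer dimension (the column names effective_date, expiration_date, and is_current match the conventions discussed above; the '9999-12-31' open-ended sentinel and the apply_type2_change helper are illustrative choices, and out-of-order arrivals are deliberately not handled here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Type 2 dimension: a surrogate key per *version*; the natural key repeats.
cur.execute("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_id TEXT,        -- natural key from the source system
    city TEXT,
    effective_date TEXT,
    expiration_date TEXT,    -- '9999-12-31' means still open
    is_current INTEGER
)""")

cur.execute("""INSERT INTO dim_customer
    (customer_id, city, effective_date, expiration_date, is_current)
    VALUES ('C-1', 'Austin', '2023-01-01', '9999-12-31', 1)""")

def apply_type2_change(cur, customer_id, new_city, change_date):
    """Expire the current version, then insert the new version."""
    cur.execute("""UPDATE dim_customer
                   SET expiration_date = ?, is_current = 0
                   WHERE customer_id = ? AND is_current = 1""",
                (change_date, customer_id))
    cur.execute("""INSERT INTO dim_customer
                   (customer_id, city, effective_date, expiration_date, is_current)
                   VALUES (?, ?, ?, '9999-12-31', 1)""",
                (customer_id, new_city, change_date))

apply_type2_change(cur, "C-1", "Denver", "2024-03-15")

# Both versions survive, so historical facts can still join to the
# version that was current at the time of the event.
rows = cur.execute(
    "SELECT city, is_current FROM dim_customer ORDER BY customer_key"
).fetchall()
print(rows)  # [('Austin', 0), ('Denver', 1)]
```

A Type 1 change, by contrast, would be a single UPDATE that overwrites city in place, which is why its ETL is so much simpler.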
Normalization reduces redundancy. Denormalization improves read performance. Every modeling decision sits somewhere on this spectrum, and interviewers test whether you can reason about the tradeoff for a specific scenario. OLTP systems lean normalized. OLAP systems lean denormalized. The interesting questions live in the middle.
Strong candidates don't say 'always normalize' or 'always denormalize.' They ask about the use case. Who queries this table? How often does it change? What's the query pattern? A table queried by analysts 100 times a day with complex joins benefits from denormalization. A table updated 10,000 times a second needs normalization to avoid update anomalies. The interviewer is checking that you match the modeling approach to the workload.
Data vault is a modeling methodology designed for auditability, historization, and agility. It uses three entity types: hubs (business keys), links (relationships), and satellites (descriptive attributes with history). Interviewers test data vault less frequently than star schema, but it appears at companies that deal with regulatory compliance, frequent source system changes, or complex data integration.
Data vault questions reveal whether you've worked in environments with complex data integration. If your experience is single-source-system analytics, you'll struggle to explain why data vault exists. Interviewers look for you to articulate the problem it solves (multiple sources, changing schemas, audit requirements) before explaining the solution (hubs, links, satellites). If you've never used data vault in production, be honest about that, but show you understand when and why it's the right choice.
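The three entity types translate directly into table shapes. Here is a minimal sketch (hash keys, load metadata columns, and all names are illustrative assumptions following common data vault conventions, not a production design); note how the satellite historizes by load date while the hub and link never change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hub: just the business key plus load metadata.
cur.execute("""CREATE TABLE hub_customer (
    customer_hk TEXT PRIMARY KEY,   -- hash of the business key
    customer_id TEXT,               -- business key from the source
    load_date TEXT, record_source TEXT)""")

cur.execute("""CREATE TABLE hub_order (
    order_hk TEXT PRIMARY KEY,
    order_id TEXT,
    load_date TEXT, record_source TEXT)""")

# Link: the relationship between hubs, and nothing else.
cur.execute("""CREATE TABLE link_customer_order (
    link_hk TEXT PRIMARY KEY,
    customer_hk TEXT REFERENCES hub_customer (customer_hk),
    order_hk TEXT REFERENCES hub_order (order_hk),
    load_date TEXT, record_source TEXT)""")

# Satellite: descriptive attributes, historized by load_date.
cur.execute("""CREATE TABLE sat_customer_details (
    customer_hk TEXT REFERENCES hub_customer (customer_hk),
    load_date TEXT,
    name TEXT, city TEXT,
    PRIMARY KEY (customer_hk, load_date))""")

# A change never updates in place: a new satellite row is appended,
# which is what gives data vault its audit trail.
cur.executemany("INSERT INTO sat_customer_details VALUES (?, ?, ?, ?)", [
    ("hk1", "2024-01-01", "Ada", "Austin"),
    ("hk1", "2024-06-01", "Ada", "Denver"),
])
history = cur.execute("""SELECT city FROM sat_customer_details
                         WHERE customer_hk = 'hk1'
                         ORDER BY load_date""").fetchall()
print(history)  # [('Austin',), ('Denver',)]
```

Adding a new source system means adding satellites (and perhaps links), not altering existing tables, which is the agility argument interviewers want you to articulate.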
Bronze, silver, gold. Raw, cleaned, aggregated. The medallion pattern organizes a lakehouse into layers of increasing data quality. It's popular at companies using Databricks, Delta Lake, or similar platforms. Interviewers test whether you understand the purpose of each layer and can make decisions about where transformations belong.
The medallion architecture is straightforward in concept but subtle in practice. Interviewers probe the boundary between silver and gold: where does cleaning end and business logic begin? Strong candidates have opinions based on experience. They say things like 'We put deduplication in silver because it's source-level cleanup, but we put revenue attribution in gold because it's a business rule that changes.' Weak candidates recite the three-layer definition without engaging with the tradeoffs.
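The silver/gold boundary the strong candidate describes can be shown in a few lines of plain Python. This is a toy sketch, not a Delta Lake pipeline: the order records are invented, deduplication stands in for source-level cleanup (silver), and revenue-per-restaurant stands in for a business rule (gold).

```python
# Bronze: raw events exactly as ingested, duplicates and all.
bronze = [
    {"order_id": 1, "restaurant": "Pho 99", "amount": 20.0},
    {"order_id": 1, "restaurant": "Pho 99", "amount": 20.0},  # duplicate event
    {"order_id": 2, "restaurant": "Taco Sol", "amount": 12.5},
]

# Silver: source-level cleanup -- deduplicate on the business key.
seen, silver = set(), []
for row in bronze:
    if row["order_id"] not in seen:
        seen.add(row["order_id"])
        silver.append(row)

# Gold: business logic -- revenue per restaurant, ready for dashboards.
gold = {}
for row in silver:
    gold[row["restaurant"]] = gold.get(row["restaurant"], 0.0) + row["amount"]

print(gold)  # {'Pho 99': 20.0, 'Taco Sol': 12.5}
```

If the revenue rule changes (say, excluding refunded orders), only the gold step is rebuilt; silver stays stable, which is the maintainability argument for keeping business logic out of the cleanup layer.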
The AI interviewer describes a real-world business context. 'You're building the data warehouse for a food delivery app. The business wants to track orders, deliveries, restaurants, drivers, and customer ratings.' The scenario is specific enough to constrain your design but open enough to allow multiple valid approaches.
Before proposing a design, you ask about query patterns, data volume, update frequency, and stakeholder needs. The AI interviewer answers with realistic details: 'The analytics team runs daily cohort analysis. The operations team needs real-time driver utilization. Data volume is 2 million orders per day.' Your questions and the answers shape the optimal design.
You describe your fact and dimension tables, explain the grain, and justify your normalization decisions, in free-form text or bullet points. The AI interviewer reads your design and generates follow-up questions based on the specific choices you made.
Follow-up questions get progressively harder. 'Why did you make driver a separate dimension instead of a degenerate dimension?' 'How do you handle a delivery that spans midnight?' 'What SCD type do you use for restaurant ratings?' Each question targets a gap or ambiguity in your design. You revise and explain.
After the round, the AI scores you on: correctness of grain and schema structure, depth of tradeoff reasoning, quality of clarifying questions, handling of follow-up probes, and communication clarity. Each dimension gets specific feedback, not just a score.
The grain (what one row of the fact table represents) is the single most important decision in a star schema. If you get the grain wrong, everything else falls apart. Interviewers at Amazon and Meta specifically check whether candidates state the grain before drawing tables. If you don't, they'll ask, and your credibility drops.
Always start with: 'The grain of this fact table is one row per [event] per [entity] per [time period].' For the food delivery example: 'One row per order.' Not per delivery (an order might have multiple deliveries) and not per item (that's a different grain, a line-item fact table).
Candidates with a strong OLTP background instinctively normalize everything to 3NF. But an analytical data warehouse optimized for SELECT queries, not INSERT/UPDATE, benefits from denormalization. Joining 8 normalized tables for every dashboard query kills performance and frustrates analysts.
State the tradeoff explicitly: 'In an OLTP system I'd normalize this, but since our analytics warehouse prioritizes read performance, I'm denormalizing the customer dimension to include city, state, and region directly.' This shows the interviewer you know both approaches and can choose based on context.
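As a small illustration of that statement (the table and data are hypothetical), a denormalized customer dimension lets the region rollup run with no joins at all; in 3NF the same query would chain customer to city to state to region lookups on every dashboard refresh.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized analytics dimension: geography attributes stored inline,
# trading some redundancy (state/region repeat) for read performance.
cur.execute("""CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    customer_name TEXT,
    city TEXT, state TEXT, region TEXT)""")

cur.executemany("INSERT INTO dim_customer VALUES (?, ?, ?, ?, ?)", [
    (1, "Ada", "Austin", "TX", "South"),
    (2, "Grace", "Portland", "OR", "West"),
])

# Zero-join rollup; in a normalized design this needs three lookup joins.
rows = cur.execute(
    "SELECT region, COUNT(*) FROM dim_customer GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('South', 1), ('West', 1)]
```

The cost is the update side: if a region is renamed, every customer row carrying it must change, which is exactly the anomaly normalization exists to prevent.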
Every dimension changes over time. Customers move. Products get recategorized. Employees get promoted. Candidates who don't address SCD strategy leave a gap that interviewers will probe. 'What happens when a customer changes their address? Does your historical reporting break?'
Proactively mention your SCD strategy for each dimension. 'For customer address, I'd use Type 2 because we need to report revenue by the customer's location at the time of the order. For customer name, Type 1 is fine since name changes don't affect analytical reports.'
The interviewer will ask: 'What if the business launches in a new country? What if they add a subscription model?' Your model should accommodate foreseeable changes without a full redesign. Candidates who build rigid schemas that break on the first business change score lower.
Use conformed dimensions that can serve multiple fact tables. Design your schema so adding a new fact table (subscriptions, promotions) doesn't require changing the dimension tables. Mention this flexibility in your explanation.
The interviewer says: "Design the data model for a ride-sharing app." Here's what a strong candidate does in the first 5 minutes, before drawing any tables.
They ask clarifying questions. "What are the primary analytical use cases? Driver utilization? Rider retention? Pricing optimization?" The interviewer says rider retention and pricing. The candidate now knows the model needs to support cohort analysis and price elasticity queries.
They define the grain. "I'll start with a ride fact table where one row represents one completed ride. The grain is ride_id. This supports both rider retention analysis (rides per rider over time) and pricing analysis (fare per ride with distance and surge multiplier)."
They list the dimensions. "Rider dimension, driver dimension, time dimension (date plus hour), pickup location dimension, dropoff location dimension, and vehicle type dimension. I'm using a location dimension instead of raw lat/long because the analytics team needs to group by neighborhood and city." They explain the SCD strategy: "Rider and driver dimensions use Type 2 because a rider's home city and a driver's vehicle might change, and historical reports need to reflect the values at the time of the ride."
They address a tradeoff unprompted. "I considered making surge_multiplier a dimension, but since it's a continuous numeric value unique to each ride, it belongs on the fact table as a measure. A surge dimension would just be a lookup table with a row for every 0.01 increment, which adds complexity without analytical value."
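The design described above can be sketched as a schema. This is one plausible reading of the candidate's answer, not the official solution: names, keys, and sample rides are all invented for illustration, and most dimensions are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Grain: one row per completed ride.
cur.execute("""CREATE TABLE fact_ride (
    ride_id TEXT PRIMARY KEY,       -- degenerate dimension
    rider_key INTEGER, driver_key INTEGER,
    pickup_location_key INTEGER, dropoff_location_key INTEGER,
    date_key INTEGER, hour INTEGER, -- time dimension: date plus hour
    fare REAL, distance_km REAL,
    surge_multiplier REAL           -- on the fact; no surge dimension
)""")

# Location dimension, so analysts can group by neighborhood and city
# instead of raw lat/long.
cur.execute("""CREATE TABLE dim_location (
    location_key INTEGER PRIMARY KEY,
    neighborhood TEXT, city TEXT)""")

cur.executemany("INSERT INTO dim_location VALUES (?, ?, ?)", [
    (1, "Mission", "San Francisco"),
    (2, "SoMa", "San Francisco"),
])
cur.executemany("INSERT INTO fact_ride VALUES (?,?,?,?,?,?,?,?,?,?)", [
    ("R-1", 10, 20, 1, 2, 20240601, 18, 14.0, 3.2, 1.5),
    ("R-2", 11, 20, 2, 1, 20240601, 9, 9.0, 2.1, 1.0),
])

# Pricing analysis: average fare by pickup neighborhood.
rows = cur.execute("""
    SELECT l.neighborhood, AVG(f.fare)
    FROM fact_ride f
    JOIN dim_location l ON f.pickup_location_key = l.location_key
    GROUP BY l.neighborhood
    ORDER BY l.neighborhood""").fetchall()
print(rows)  # [('Mission', 14.0), ('SoMa', 9.0)]
```

Note that the same location dimension serves both pickup and dropoff roles (a role-playing dimension), which is the kind of detail a follow-up question is likely to probe.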
This answer takes 5 minutes. It demonstrates understanding of grain, conformed dimensions, SCD strategy, and denormalization tradeoffs. The remaining 30 minutes of the interview are spent on follow-up questions that go deeper into each decision. DataDriven's AI interviewer simulates exactly this conversation flow.
The AI presents a business scenario and lets you ask clarifying questions before proposing a design. After you submit your design, it generates follow-up questions based on your specific choices, not a scripted path. It probes gaps in your reasoning, asks about tradeoffs, and tests how your model handles edge cases. The conversation is iterative, just like a real modeling interview.
For most candidates, yes. SQL and Python rounds have clear right/wrong answers. Data modeling rounds have multiple valid approaches, and the score depends on the quality of your reasoning, not just your final schema. The discussion format also requires verbal articulation skills that coding rounds don't test.
Start with star schema fundamentals: fact tables, dimension tables, grain, conformed dimensions, and degenerate dimensions. Then study SCD Types 1, 2, and 3. After that, learn about data vault basics and medallion architecture. Star schema and SCDs together cover about 50% of modeling interview questions.
Over 200 questions across 45+ business scenarios. Topics include star schema design (30%), slowly changing dimensions (20%), normalization and denormalization (20%), data vault (15%), and medallion architecture (15%). Each scenario includes multiple follow-up question branches based on your answers.
Yes. DataDriven also has standalone modeling questions that present a scenario and ask you to describe your design. But the discussion mode is closer to what you'll experience in a real interview, so we recommend using it for at least half your practice sessions.
The AI interviewer asks follow-ups based on your answers. It's the closest thing to a real modeling round without hiring a coach.
The complete guide to mock interviews across all 5 data engineering domains.
Practice system design rounds: batch vs streaming, idempotency, and failure handling.
Standalone data modeling questions with worked solutions and common mistakes.