200+ data modeling questions across star schema, SCDs, data vault, and medallion architecture. The AI interviewer asks follow-up questions based on your specific answers, creating a discussion round that mirrors what you'll face at top tech companies.
Data modeling interviews are fundamentally different from coding interviews. There's no code editor, no test cases, no pass/fail. The interviewer gives you a business scenario and says: "Design the data model." Then they spend 35 minutes probing your design with follow-up questions that get harder as the conversation progresses.
The difficulty comes from three sources. First, there are multiple valid designs for any scenario, so you can't study "the answer." You have to study the reasoning. Second, the follow-up questions are unpredictable. They depend on your specific design choices, so memorizing answers doesn't work. Third, you need to explain your thinking verbally, in real time, which is a separate skill from knowing the right answer internally.
Most candidates prepare by reading about star schemas and SCDs. That's necessary but not sufficient. Reading teaches you the vocabulary. Interview performance requires you to apply that vocabulary to a novel scenario while defending your choices against probing questions. The gap between "I know what a fact table is" and "I can design a fact table for a food delivery app and explain why the grain is one row per order, not per item" is where most candidates fail.
SQL and Python have a clear feedback loop: your code either runs correctly or it doesn't. Data modeling has no equivalent. You draw a schema, and there's no automated way to know if it's good. Is the grain correct? Are the dimensions conformed? Will this schema support the query patterns the business needs? Without feedback, you can practice for weeks and reinforce bad habits. DataDriven's AI interviewer evaluates your modeling decisions against the criteria real interviewers use and tells you what's missing.
A data modeling interview isn't a single question with a single answer. It's a 35-minute conversation. You propose a design, and the interviewer probes it. 'What if the business adds a new product type?' 'How would you handle late-arriving dimension data?' 'What's the grain of that fact table?' These follow-ups test whether you actually understand your design or just drew something that looked reasonable. DataDriven's AI interviewer generates follow-up questions based on your specific answers, creating a conversation that feels like a real round.
You can read 'The Data Warehouse Toolkit' cover to cover and still stumble in a modeling interview. The book teaches star schema design principles. The interview tests whether you can apply those principles to a novel scenario under time pressure while articulating your reasoning out loud. Those are different skills. The only way to build the second skill is to practice it: propose a design, get questioned on it, revise it, and explain why you changed it.
The 'right' model for an e-commerce platform depends on query patterns, data volume, team size, update frequency, and compliance requirements. A model that's perfect for a 10-person startup is wrong for a 10,000-person enterprise. Interviewers test whether you ask about context before proposing a design. Candidates who draw a schema without asking questions about the business get lower scores. DataDriven's AI interviewer rewards you for asking clarifying questions before committing to a design.
Star schema and SCDs together account for 50% of modeling interview questions. If you're short on prep time, master those two before moving to data vault and medallion architecture.
Star schema is the foundation of analytical data modeling. Interviewers give you a business scenario and ask you to design the fact and dimension tables. They want to hear you think about grain (what does one row represent?), conformed dimensions (can this dimension join to multiple facts?), and degenerate dimensions (order number lives on the fact table, not in its own dimension).
Interviewers listen for whether you start with the grain. Candidates who jump straight to drawing tables without defining what one row represents almost always make mistakes later. They also check whether you understand the tradeoff between normalization (less storage, more joins) and denormalization (more storage, faster queries). There's no single right answer. The quality of your reasoning matters more than the specific schema you propose.
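To make the grain-first habit concrete, here is a minimal star-schema sketch in Python's built-in sqlite3. All table and column names (dim_customer, fact_orders, order_number) are illustrative assumptions, not a prescribed answer; the point is the shape: state the grain, keep the degenerate dimension on the fact, and group measures by dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: one row per customer. Conformed, so other fact
# tables (returns, support tickets) could reuse it unchanged.
cur.execute("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,   -- surrogate key
    customer_name TEXT,
    city TEXT
)""")

# Fact table: the grain is one row per order, stated up front.
cur.execute("""
CREATE TABLE fact_orders (
    order_key INTEGER PRIMARY KEY,
    order_number TEXT,      -- degenerate dimension: no separate table
    customer_key INTEGER REFERENCES dim_customer (customer_key),
    order_date TEXT,
    order_total REAL        -- additive measure at the order grain
)""")

cur.execute("INSERT INTO dim_customer VALUES (1, 'Ada', 'Berlin')")
cur.execute("INSERT INTO fact_orders VALUES (1, 'ORD-1001', 1, '2024-06-01', 42.5)")

# The classic star-schema query shape: measures from the fact,
# grouped by attributes from a dimension.
rows = cur.execute("""
    SELECT c.city, SUM(f.order_total)
    FROM fact_orders f
    JOIN dim_customer c USING (customer_key)
    GROUP BY c.city
""").fetchall()
print(rows)  # [('Berlin', 42.5)]
```

If the grain were one row per line item instead, order_total would no longer be additive per order, which is exactly the kind of mismatch interviewers probe for.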
SCDs test whether you understand how real-world data changes over time. A customer moves to a new city. A product changes categories. An employee gets promoted. How do you model these changes so that historical reports stay accurate? Practitioners catalog as many as eight SCD types (numbered 0 through 7), but interviewers focus on Types 1, 2, and 3.
The SCD Type 2 implementation details separate mid-level from senior candidates. Mid-level candidates know the concept. Senior candidates talk about effective_date and expiration_date columns, the is_current flag, surrogate keys vs natural keys, and what happens when a dimension change arrives out of order. They also discuss the ETL complexity: the merge logic for Type 2 is significantly harder than Type 1, and the interviewer wants to know you've actually built it.
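The senior-level mechanics can be sketched in a few lines. This is a simplified Type 2 change routine, assuming a hypothetical customer dimension (the column names effective_date, expiration_date, and is_current match the conventions discussed above; the '9999-12-31' open-ended sentinel and the apply_type2_change helper are illustrative choices, and out-of-order arrivals are deliberately not handled here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Type 2 dimension: a surrogate key per *version*; the natural key repeats.
cur.execute("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_id TEXT,        -- natural key from the source system
    city TEXT,
    effective_date TEXT,
    expiration_date TEXT,    -- '9999-12-31' means still open
    is_current INTEGER
)""")

cur.execute("""INSERT INTO dim_customer
    (customer_id, city, effective_date, expiration_date, is_current)
    VALUES ('C-1', 'Austin', '2023-01-01', '9999-12-31', 1)""")

def apply_type2_change(cur, customer_id, new_city, change_date):
    """Expire the current version, then insert the new version."""
    cur.execute("""UPDATE dim_customer
                   SET expiration_date = ?, is_current = 0
                   WHERE customer_id = ? AND is_current = 1""",
                (change_date, customer_id))
    cur.execute("""INSERT INTO dim_customer
                   (customer_id, city, effective_date, expiration_date, is_current)
                   VALUES (?, ?, ?, '9999-12-31', 1)""",
                (customer_id, new_city, change_date))

apply_type2_change(cur, "C-1", "Denver", "2024-03-15")

# Both versions survive, so historical facts can still join to the
# version that was current at the time of the event.
rows = cur.execute(
    "SELECT city, is_current FROM dim_customer ORDER BY customer_key"
).fetchall()
print(rows)  # [('Austin', 0), ('Denver', 1)]
```

A Type 1 change, by contrast, would be a single UPDATE that overwrites city in place, which is why its ETL is so much simpler.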
Normalization reduces redundancy. Denormalization improves read performance. Every modeling decision sits somewhere on this spectrum, and interviewers test whether you can reason about the tradeoff for a specific scenario. OLTP systems lean normalized. OLAP systems lean denormalized. The interesting questions live in the middle.
Strong candidates don't say 'always normalize' or 'always denormalize.' They ask about the use case. Who queries this table? How often does it change? What's the query pattern? A table queried by analysts 100 times a day with complex joins benefits from denormalization. A table updated 10,000 times a second needs normalization to avoid update anomalies. The interviewer is checking that you match the modeling approach to the workload.
Data vault is a modeling methodology designed for auditability, historization, and agility. It uses three entity types: hubs (business keys), links (relationships), and satellites (descriptive attributes with history). Interviewers test data vault less frequently than star schema, but it appears at companies that deal with regulatory compliance, frequent source system changes, or complex data integration.
Data vault questions reveal whether you've worked in environments with complex data integration. If your experience is single-source-system analytics, you'll struggle to explain why data vault exists. Interviewers look for you to articulate the problem it solves (multiple sources, changing schemas, audit requirements) before explaining the solution (hubs, links, satellites). If you've never used data vault in production, be honest about that, but show you understand when and why it's the right choice.
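The three entity types translate directly into table shapes. Here is a minimal sketch (hash keys, load metadata columns, and all names are illustrative assumptions following common data vault conventions, not a production design); note how the satellite historizes by load date while the hub and link never change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hub: just the business key plus load metadata.
cur.execute("""CREATE TABLE hub_customer (
    customer_hk TEXT PRIMARY KEY,   -- hash of the business key
    customer_id TEXT,               -- business key from the source
    load_date TEXT, record_source TEXT)""")

cur.execute("""CREATE TABLE hub_order (
    order_hk TEXT PRIMARY KEY,
    order_id TEXT,
    load_date TEXT, record_source TEXT)""")

# Link: the relationship between hubs, and nothing else.
cur.execute("""CREATE TABLE link_customer_order (
    link_hk TEXT PRIMARY KEY,
    customer_hk TEXT REFERENCES hub_customer (customer_hk),
    order_hk TEXT REFERENCES hub_order (order_hk),
    load_date TEXT, record_source TEXT)""")

# Satellite: descriptive attributes, historized by load_date.
cur.execute("""CREATE TABLE sat_customer_details (
    customer_hk TEXT REFERENCES hub_customer (customer_hk),
    load_date TEXT,
    name TEXT, city TEXT,
    PRIMARY KEY (customer_hk, load_date))""")

# A change never updates in place: a new satellite row is appended,
# which is what gives data vault its audit trail.
cur.executemany("INSERT INTO sat_customer_details VALUES (?, ?, ?, ?)", [
    ("hk1", "2024-01-01", "Ada", "Austin"),
    ("hk1", "2024-06-01", "Ada", "Denver"),
])
history = cur.execute("""SELECT city FROM sat_customer_details
                         WHERE customer_hk = 'hk1'
                         ORDER BY load_date""").fetchall()
print(history)  # [('Austin',), ('Denver',)]
```

Adding a new source system means adding satellites (and perhaps links), not altering existing tables, which is the agility argument interviewers want you to articulate.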
Bronze, silver, gold. Raw, cleaned, aggregated. The medallion pattern organizes a lakehouse into layers of increasing data quality. It's popular at companies using Databricks, Delta Lake, or similar platforms. Interviewers test whether you understand the purpose of each layer and can make decisions about where transformations belong.
The medallion architecture is straightforward in concept but subtle in practice. Interviewers probe the boundary between silver and gold: where does cleaning end and business logic begin? Strong candidates have opinions based on experience. They say things like 'We put deduplication in silver because it's source-level cleanup, but we put revenue attribution in gold because it's a business rule that changes.' Weak candidates recite the three-layer definition without engaging with the tradeoffs.
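The silver/gold boundary the strong candidate describes can be shown in a few lines of plain Python. This is a toy sketch, not a Delta Lake pipeline: the order records are invented, deduplication stands in for source-level cleanup (silver), and revenue-per-restaurant stands in for a business rule (gold).

```python
# Bronze: raw events exactly as ingested, duplicates and all.
bronze = [
    {"order_id": 1, "restaurant": "Pho 99", "amount": 20.0},
    {"order_id": 1, "restaurant": "Pho 99", "amount": 20.0},  # duplicate event
    {"order_id": 2, "restaurant": "Taco Sol", "amount": 12.5},
]

# Silver: source-level cleanup -- deduplicate on the business key.
seen, silver = set(), []
for row in bronze:
    if row["order_id"] not in seen:
        seen.add(row["order_id"])
        silver.append(row)

# Gold: business logic -- revenue per restaurant, ready for dashboards.
gold = {}
for row in silver:
    gold[row["restaurant"]] = gold.get(row["restaurant"], 0.0) + row["amount"]

print(gold)  # {'Pho 99': 20.0, 'Taco Sol': 12.5}
```

If the revenue rule changes (say, excluding refunded orders), only the gold step is rebuilt; silver stays stable, which is the maintainability argument for keeping business logic out of the cleanup layer.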
The AI interviewer describes a real-world business context. 'You're building the data warehouse for a food delivery app. The business wants to track orders, deliveries, restaurants, drivers, and customer ratings.' The scenario is specific enough to constrain your design but open enough to allow multiple valid approaches.
Before proposing a design, you ask about query patterns, data volume, update frequency, and stakeholder needs. The AI interviewer answers with realistic details: 'The analytics team runs daily cohort analysis. The operations team needs real-time driver utilization. Data volume is 2 million orders per day.' Your questions and the answers shape the optimal design.
You describe your fact and dimension tables, explain the grain, and justify your normalization decisions, in free-form text or bullet points. The AI interviewer reads your design and generates follow-up questions based on the specific choices you made.
Follow-up questions get progressively harder. 'Why did you make driver a separate dimension instead of a degenerate dimension?' 'How do you handle a delivery that spans midnight?' 'What SCD type do you use for restaurant ratings?' Each question targets a gap or ambiguity in your design. You revise and explain.
After the round, the AI scores you on: correctness of grain and schema structure, depth of tradeoff reasoning, quality of clarifying questions, handling of follow-up probes, and communication clarity. Each dimension gets specific feedback, not just a score.
The grain (what one row of the fact table represents) is the single most important decision in a star schema. If you get the grain wrong, everything else falls apart. Interviewers at Amazon and Meta specifically check whether candidates state the grain before drawing tables. If you don't, they'll ask, and your credibility drops.
Always start with: 'The grain of this fact table is one row per [event] per [entity] per [time period].' For the food delivery example: 'One row per order.' Not per delivery (an order might have multiple deliveries) and not per item (that's a different grain, a line-item fact table).
Candidates with a strong OLTP background instinctively normalize everything to 3NF. But an analytical data warehouse optimized for SELECT queries, not INSERT/UPDATE, benefits from denormalization. Joining 8 normalized tables for every dashboard query kills performance and frustrates analysts.
State the tradeoff explicitly: 'In an OLTP system I'd normalize this, but since our analytics warehouse prioritizes read performance, I'm denormalizing the customer dimension to include city, state, and region directly.' This shows the interviewer you know both approaches and can choose based on context.
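As a small illustration of that statement (the table and data are hypothetical), a denormalized customer dimension lets the region rollup run with no joins at all; in 3NF the same query would chain customer to city to state to region lookups on every dashboard refresh.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized analytics dimension: geography attributes stored inline,
# trading some redundancy (state/region repeat) for read performance.
cur.execute("""CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    customer_name TEXT,
    city TEXT, state TEXT, region TEXT)""")

cur.executemany("INSERT INTO dim_customer VALUES (?, ?, ?, ?, ?)", [
    (1, "Ada", "Austin", "TX", "South"),
    (2, "Grace", "Portland", "OR", "West"),
])

# Zero-join rollup; in a normalized design this needs three lookup joins.
rows = cur.execute(
    "SELECT region, COUNT(*) FROM dim_customer GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('South', 1), ('West', 1)]
```

The cost is the update side: if a region is renamed, every customer row carrying it must change, which is exactly the anomaly normalization exists to prevent.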
Every dimension changes over time. Customers move. Products get recategorized. Employees get promoted. Candidates who don't address SCD strategy leave a gap that interviewers will probe. 'What happens when a customer changes their address? Does your historical reporting break?'
Proactively mention your SCD strategy for each dimension. 'For customer address, I'd use Type 2 because we need to report revenue by the customer's location at the time of the order. For customer name, Type 1 is fine since name changes don't affect analytical reports.'
The interviewer will ask: 'What if the business launches in a new country? What if they add a subscription model?' Your model should accommodate foreseeable changes without a full redesign. Candidates who build rigid schemas that break on the first business change score lower.
Use conformed dimensions that can serve multiple fact tables. Design your schema so adding a new fact table (subscriptions, promotions) doesn't require changing the dimension tables. Mention this flexibility in your explanation.
The interviewer says: "Design the data model for a ride-sharing app." Here's what a strong candidate does in the first 5 minutes, before drawing any tables.
They ask clarifying questions. "What are the primary analytical use cases? Driver utilization? Rider retention? Pricing optimization?" The interviewer says rider retention and pricing. The candidate now knows the model needs to support cohort analysis and price elasticity queries.
They define the grain. "I'll start with a ride fact table where one row represents one completed ride. The grain is ride_id. This supports both rider retention analysis (rides per rider over time) and pricing analysis (fare per ride with distance and surge multiplier)."
They list the dimensions. "Rider dimension, driver dimension, time dimension (date plus hour), pickup location dimension, dropoff location dimension, and vehicle type dimension. I'm using a location dimension instead of raw lat/long because the analytics team needs to group by neighborhood and city." They explain the SCD strategy: "Rider and driver dimensions use Type 2 because a rider's home city and a driver's vehicle might change, and historical reports need to reflect the values at the time of the ride."
They address a tradeoff unprompted. "I considered making surge_multiplier a dimension, but since it's a continuous numeric value unique to each ride, it belongs on the fact table as a measure. A surge dimension would just be a lookup table with a row for every 0.01 increment, which adds complexity without analytical value."
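The design described above can be sketched as a schema. This is one plausible reading of the candidate's answer, not the official solution: names, keys, and sample rides are all invented for illustration, and most dimensions are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Grain: one row per completed ride.
cur.execute("""CREATE TABLE fact_ride (
    ride_id TEXT PRIMARY KEY,       -- degenerate dimension
    rider_key INTEGER, driver_key INTEGER,
    pickup_location_key INTEGER, dropoff_location_key INTEGER,
    date_key INTEGER, hour INTEGER, -- time dimension: date plus hour
    fare REAL, distance_km REAL,
    surge_multiplier REAL           -- on the fact; no surge dimension
)""")

# Location dimension, so analysts can group by neighborhood and city
# instead of raw lat/long.
cur.execute("""CREATE TABLE dim_location (
    location_key INTEGER PRIMARY KEY,
    neighborhood TEXT, city TEXT)""")

cur.executemany("INSERT INTO dim_location VALUES (?, ?, ?)", [
    (1, "Mission", "San Francisco"),
    (2, "SoMa", "San Francisco"),
])
cur.executemany("INSERT INTO fact_ride VALUES (?,?,?,?,?,?,?,?,?,?)", [
    ("R-1", 10, 20, 1, 2, 20240601, 18, 14.0, 3.2, 1.5),
    ("R-2", 11, 20, 2, 1, 20240601, 9, 9.0, 2.1, 1.0),
])

# Pricing analysis: average fare by pickup neighborhood.
rows = cur.execute("""
    SELECT l.neighborhood, AVG(f.fare)
    FROM fact_ride f
    JOIN dim_location l ON f.pickup_location_key = l.location_key
    GROUP BY l.neighborhood
    ORDER BY l.neighborhood""").fetchall()
print(rows)  # [('Mission', 14.0), ('SoMa', 9.0)]
```

Note that the same location dimension serves both pickup and dropoff roles (a role-playing dimension), which is the kind of detail a follow-up question is likely to probe.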
This answer takes 5 minutes. It demonstrates understanding of grain, conformed dimensions, SCD strategy, and denormalization tradeoffs. The remaining 30 minutes of the interview are spent on follow-up questions that go deeper into each decision. DataDriven's AI interviewer simulates exactly this conversation flow.
The AI presents a business scenario and lets you ask clarifying questions before proposing a design. After you submit your design, it generates follow-up questions based on your specific choices, not a scripted path. It probes gaps in your reasoning, asks about tradeoffs, and tests how your model handles edge cases. The conversation is iterative, just like a real modeling interview.
For most candidates, yes. SQL and Python rounds have clear right/wrong answers. Data modeling rounds have multiple valid approaches, and the score depends on the quality of your reasoning, not just your final schema. The discussion format also requires verbal articulation skills that coding rounds don't test.
Start with star schema fundamentals: fact tables, dimension tables, grain, conformed dimensions, and degenerate dimensions. Then study SCD Types 1, 2, and 3. After that, learn about data vault basics and medallion architecture. Star schema and SCDs together cover about 50% of modeling interview questions.
Over 200 questions across 45+ business scenarios. Topics include star schema design (30%), slowly changing dimensions (20%), normalization and denormalization (20%), data vault (15%), and medallion architecture (15%). Each scenario includes multiple follow-up question branches based on your answers.
Yes. DataDriven also has standalone modeling questions that present a scenario and ask you to describe your design. But the discussion mode is closer to what you'll experience in a real interview, so we recommend using it for at least half your practice sessions.
The AI interviewer asks follow-ups based on your answers. It's the closest thing to a real modeling round without hiring a coach.
The complete guide to mock interviews across all 5 data engineering domains.
Practice system design rounds: batch vs streaming, idempotency, and failure handling.
Standalone data modeling questions with worked solutions and common mistakes.