Data Modeling
Star schema is the most commonly tested dimensional modeling pattern in data engineering interviews, with medallion architecture close behind. Data modeling overall appears in roughly a third of DE interview loops.
You need to explain both schemas, draw both, and defend your choice. This page covers the trade-offs interviewers actually test.
Each dimension is one flat table. "city", "state", and "country" are all columns in dim_customer.
Geography is split into city, state, and country tables. Product category is its own table. More JOINs, less redundancy.
Denormalized dimension tables. Each dimension is a single flat table with redundant data.
Normalized dimension tables. Dimensions are split into sub-tables linked by foreign keys.
Simpler queries. Fewer JOINs because dimensions are flat. Analysts can write queries without knowing the full schema.
More JOINs required. Querying a single dimension may require joining 2-3 tables together.
Uses more storage due to repeated values in dimension tables. In practice, dimension tables are small relative to fact tables, so the difference is minor.
Uses less storage. Redundancy is eliminated through normalization. Matters more when dimension tables are very large.
Easier to maintain for analytics workloads. Updates to dimension attributes require changing one table.
More complex to maintain. Updates may propagate across multiple normalized tables. Better for enforcing data integrity.
Analytics, reporting, dashboards. When query speed and simplicity matter more than storage savings. This is the default choice for most data warehouses.
When dimension tables are very large, when storage cost matters, or when you need strict referential integrity. More common in traditional enterprise environments.
Data modeling appears in roughly a third of DE interviews. Interviewers don't want you to recite definitions. They want you to make a choice and defend it with trade-offs specific to the scenario.
In an interview, default to a star schema. It is simpler to explain, faster to query, and easier to draw on a whiteboard. Only switch to snowflake if the interviewer gives you a reason (large dimensions, storage constraints, integrity requirements).
Don't just say "I'd use a star schema." Say: "I'd use a star schema because the dimension tables are small and query simplicity matters more than storage here. If the customer dimension grew to hundreds of millions of rows, I'd consider normalizing it."
Practice sketching a fact table with 3-4 dimensions in star form, then normalizing one dimension into snowflake form. The interviewer wants to see that you understand what normalization actually changes.
Mention that modern columnar warehouses (which compress repeated values well) reduce the storage advantage of snowflake schemas. This shows you think about the actual execution environment, not just textbook rules.
DataDriven has interactive schema design challenges where you build and modify star and snowflake schemas. Practice the skill, not just the theory.