Star Schema vs Snowflake Schema (2026)
Star schema is the most commonly tested dimensional modeling pattern in DE interviews, with medallion architecture close behind. Data modeling overall appears in roughly a third of DE loops.
What this guide actually says
Star is the default in 2026 for analytics workloads. Snowflake makes sense in a narrow set of cases: deep, frequently-changing hierarchies; storage-constrained on-prem; strict referential integrity. The interview question is rarely 'which one' — it's 'why this one for this scenario.'
Star schema example
fact_orders (order_id, customer_id, product_id, date_id, amount, quantity)
dim_customer (customer_id, name, city, state, country, segment)
dim_product (product_id, name, category, subcategory, brand, price)
dim_date (date_id, date, month, quarter, year, is_weekend)Each dimension is one flat table. 'city', 'state', and 'country' are all columns in dim_customer.
Snowflake schema example: same data, normalized hierarchies
fact_orders (order_id, customer_id, product_id, date_id, amount, quantity)
dim_customer (customer_id, name, city_id, segment)
dim_city (city_id, city_name, state_id)
dim_state (state_id, state_name, country_id)
dim_country (country_id, country_name)
dim_product (product_id, name, subcategory_id, brand, price)
dim_subcategory (subcategory_id, subcategory_name, category_id)
dim_category (category_id, category_name)Geography split into city, state, country tables. Product category is its own table. More JOINs, less redundancy.
Side-by-side comparison
| Dimension | Star Schema | Snowflake Schema |
|---|---|---|
| Normalization | Denormalized dim tables. Each dim is a single flat table with redundant data. | Normalized dim tables. Dims split into sub-tables linked by foreign keys. |
| Query Complexity | Simpler queries. Fewer JOINs because dims are flat. Analysts can write queries without knowing the full schema. | More JOINs required. Querying a single dim may require joining 2-3 tables. |
| Storage | More storage due to repeated values in dim tables. In practice, dim tables are small relative to fact tables, so the difference is minor. | Less storage. Redundancy eliminated through normalization. Matters more when dim tables are very large. |
| Maintenance | Easier for analytics workloads. Updates to dim attributes require changing one table. | More complex. Updates may propagate across multiple normalized tables. Better for enforcing data integrity. |
| When to use | Analytics, reporting, dashboards. When query speed and simplicity matter more than storage savings. The default for most warehouses. | When dim tables are very large, storage cost matters, or you need strict referential integrity. More common in traditional enterprise environments. |
How to answer schema design questions in interviews
Data modeling appears in ~30% of DE interviews. Interviewers don't want you to recite definitions; they want you to make a choice and defend it with trade-offs specific to the scenario.
Start with star unless told otherwise
In an interview, default to star. Simpler to explain, faster to query, easier to draw on a whiteboard. Only switch to snowflake if the interviewer gives you a reason (large dimensions, storage constraints, integrity requirements).
Name the trade-off explicitly
Don't just say 'I'd use a star schema.' Say: 'I'd use star because the dimension tables are small and query simplicity matters more than storage here. If the customer dimension grew to hundreds of millions of rows, I'd consider normalizing it.'
Know how to draw both on a whiteboard
Practice sketching a fact with 3-4 dimensions in star form, then normalizing one dimension into snowflake form. The interviewer wants to see that you understand what normalization actually changes.
Connect it to real workloads
Mention that modern columnar warehouses (which compress repeated values well) reduce the storage advantage of snowflake. Shows you think about the actual execution environment, not just textbook rules.
Frequently asked questions
Which is better: star schema or snowflake schema?+
Can you use both in the same data warehouse?+
Do interviewers expect me to know both?+
How does this relate to normalization (1NF, 2NF, 3NF)?+
Build schemas, not just read about them
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
The round is won on tradeoffs, not on the diagram
Grain, star vs snowflake, SCD type, conformed dimensions, late-arriving data. Modeling under live pushback is what separates the bands, and it is the half almost nobody rehearses