Data Modeling

Star Schema vs Snowflake Schema

Star schema is the most commonly tested dimensional modeling pattern in data engineering interviews, with medallion architecture close behind. Data modeling overall appears in roughly a third of DE interview loops.

You need to explain both schemas, draw both, and defend your choice. This page covers the trade-offs interviewers actually test.

What Each Schema Looks Like

Star Schema

fact_orders (order_id, customer_id, product_id, date_id, amount, quantity)
dim_customer (customer_id, name, city, state, country, segment)
dim_product (product_id, name, category, subcategory, brand, price)
dim_date (date_id, date, month, quarter, year, is_weekend)

Each dimension is one flat table. "city", "state", and "country" are all columns in dim_customer.

Snowflake Schema

fact_orders (order_id, customer_id, product_id, date_id, amount, quantity)
dim_customer (customer_id, name, city_id, segment)
dim_city (city_id, city_name, state_id)
dim_state (state_id, state_name, country_id)
dim_country (country_id, country_name)
dim_product (product_id, name, subcategory_id, brand, price)
dim_subcategory (subcategory_id, subcategory_name, category_id)
dim_category (category_id, category_name)

Geography is split into city, state, and country tables. Product category is its own table. More JOINs, less redundancy.

Side-by-Side Comparison

Normalization

Star

Denormalized dimension tables. Each dimension is a single flat table with redundant data.

Snowflake

Normalized dimension tables. Dimensions are split into sub-tables linked by foreign keys.

Query Complexity

Star

Simpler queries. Fewer JOINs because dimensions are flat. Analysts can write queries without knowing the full schema.

Snowflake

More JOINs required. Querying a single dimension may require joining 2-3 tables together.

Storage

Star

Uses more storage due to repeated values in dimension tables. In practice, dimension tables are small relative to fact tables, so the difference is minor.

Snowflake

Uses less storage. Redundancy is eliminated through normalization. Matters more when dimension tables are very large.

Maintenance

Star

Easier to maintain for analytics workloads. Updates to dimension attributes require changing one table.

Snowflake

More complex to maintain. Updates may propagate across multiple normalized tables. Better for enforcing data integrity.

When to Use

Star

Analytics, reporting, dashboards. When query speed and simplicity matter more than storage savings. This is the default choice for most data warehouses.

Snowflake

When dimension tables are very large, when storage cost matters, or when you need strict referential integrity. More common in traditional enterprise environments.

How to Answer Schema Design Questions in Interviews

Data modeling appears in roughly a third of DE interviews. Interviewers don't want you to recite definitions. They want you to make a choice and defend it with trade-offs specific to the scenario.

Start with star unless told otherwise

In an interview, default to a star schema. It is simpler to explain, faster to query, and easier to draw on a whiteboard. Only switch to snowflake if the interviewer gives you a reason (large dimensions, storage constraints, integrity requirements).

Name the trade-off explicitly

Don't just say "I'd use a star schema." Say: "I'd use a star schema because the dimension tables are small and query simplicity matters more than storage here. If the customer dimension grew to hundreds of millions of rows, I'd consider normalizing it."

Know how to draw both on a whiteboard

Practice sketching a fact table with 3-4 dimensions in star form, then normalizing one dimension into snowflake form. The interviewer wants to see that you understand what normalization actually changes.

Connect it to real workloads

Mention that modern columnar warehouses (which compress repeated values well) reduce the storage advantage of snowflake schemas. This shows you think about the actual execution environment, not just textbook rules.

Frequently Asked Questions

Which is better: star schema or snowflake schema?+
Neither is universally better. Star schemas are simpler to query and perform well for analytics workloads. Snowflake schemas save storage and enforce stricter integrity. For most modern data warehouses, star schemas are the default choice because columnar storage reduces the storage penalty of denormalization.
Can you use both in the same data warehouse?+
Yes. Many warehouses mix both patterns. You might keep most dimensions denormalized (star) but normalize a very large dimension like a product catalog with deep hierarchies (snowflake). The schema should match the query patterns, not a single rule.
Do interviewers expect me to know both?+
Yes. Data modeling appears in roughly a third of DE interviews, and star schema is the single most tested dimensional modeling pattern. You should be able to explain both patterns, draw them, and articulate when you would choose one over the other. Simply naming them is not enough.
How does this relate to normalization (1NF, 2NF, 3NF)?+
A snowflake schema applies normalization principles (typically 3NF) to dimension tables. A star schema intentionally denormalizes dimensions for query performance. Both patterns assume the fact table is already normalized. Understanding normal forms helps you explain why snowflake dimensions have fewer redundant columns.

Build Schemas, Not Just Read About Them

DataDriven has interactive schema design challenges where you build and modify star and snowflake schemas. Practice the skill, not just the theory.