Data Modeling

Snowflake Schema vs Star Schema

Most candidates treat "star vs snowflake" as a balanced trade-off question. It isn't. In a columnar warehouse in 2026 there is a default answer and a narrow exception, and the interviewer is watching for how quickly you say so. The candidates who hedge lose the round. The candidates who rattle off storage-cost arguments from a 2003 textbook lose faster.

The real question hiding underneath is whether you've actually written queries against both and measured anything. Almost nobody has.

* 67% of modeling questions default to star schema
* 49 onsite modeling rounds
* 191 modeling questions analyzed
* 275 companies in dataset

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

Side-by-Side Comparison

Eight dimensions where the two patterns actually differ. Read carefully: every "advantage" listed for snowflake used to be true on a row-store database in 2003 and is mostly disinformation on a columnar engine today. Interviewers are listening for whether you know the difference.

Normalization

Star Schema

Denormalized. Each dimension is a single flat table. A product dimension contains category, subcategory, brand, and manufacturer all in one row. Data is duplicated across rows.

Snowflake Schema

Normalized to 2NF or 3NF. Dimensions are split into multiple related tables. Product links to category, which links to subcategory. No redundant data.

Number of tables

Star Schema

Fewer tables. A typical star schema has 1 fact table and 4 to 8 dimension tables. Queries involve 2 to 5 JOINs.

Snowflake Schema

More tables. The same schema might have 15 to 30 tables because each dimension hierarchy is decomposed. Queries involve 5 to 15 JOINs.

Query complexity

Star Schema

Simpler queries. Because dimensions are flat, you JOIN the fact table directly to each dimension with a single key. Analysts and BI tools can write queries without understanding the hierarchy structure.

Snowflake Schema

More complex queries. You often need multi-hop JOINs to traverse dimension hierarchies. A query that needs the category name for a product requires joining product to subcategory to category.

Query performance

Star Schema

Generally faster for analytical queries. Fewer JOINs means less work for the query optimizer. On columnar warehouses (Snowflake, BigQuery, Redshift), the denormalized approach aligns well with the storage model.

Snowflake Schema

Generally slower for analytical queries due to more JOINs. However, the performance gap narrows on modern columnar engines with optimized JOIN algorithms. For very large dimensions with high cardinality, normalized tables can be faster because they're smaller.

Storage efficiency

Star Schema

Higher storage usage. Repeating 'Electronics > Computers > Laptops' across every product row in the dimension table wastes space compared to storing that hierarchy once in normalized form.

Snowflake Schema

Lower storage usage. Each piece of data is stored once. For large dimensions with long text fields, the savings can be meaningful. On modern cloud warehouses, storage is cheap enough that this rarely matters.

Data integrity

Star Schema

Update anomalies are possible. If a category name changes, you must update every product row that carries that category name, and a partial update leaves the dimension internally inconsistent. If the same attribute also appears in multiple dimension tables, inconsistencies can creep in between them.

Snowflake Schema

Stronger data integrity. Each piece of data exists in one place. Updates are atomic. Foreign key constraints enforce referential integrity. This matters most in environments where dimension data changes frequently.

ETL complexity

Star Schema

Simpler to load. You build each dimension table independently and load the fact table with foreign keys. The denormalized structure means fewer transformation steps.

Snowflake Schema

More complex to load. You need to load tables in dependency order (category before subcategory before product). Foreign key constraints add validation steps. More transformation logic in the pipeline.
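The dependency-order constraint is just a topological sort over the foreign-key graph. A minimal sketch using Python's standard-library graphlib, with a hypothetical dependency map based on this article's example schema:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each table lists the tables it
# references via foreign keys, which must be loaded first.
dependencies = {
    "dim_category": set(),
    "dim_subcategory": {"dim_category"},
    "dim_product": {"dim_subcategory"},
    "dim_customer": set(),
    "dim_date": set(),
    "fact_orders": {"dim_product", "dim_customer", "dim_date"},
}

load_order = list(TopologicalSorter(dependencies).static_order())
print(load_order)
# dim_category is guaranteed to precede dim_subcategory, which
# precedes dim_product; fact_orders always loads last because it
# transitively depends on every dimension table.
```

In a star schema the map degenerates to "every dimension, then the fact table", which is why star loads need no ordering logic at all.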

BI tool compatibility

Star Schema

Excellent. BI tools like Looker, Tableau, Power BI, and Metabase are designed for star schemas. Auto-generated SQL assumes flat dimensions. Star schemas 'just work' with most BI tools.

Snowflake Schema

Requires more configuration. BI tools can work with snowflake schemas, but you often need to define explicit relationships or create views that flatten the hierarchy back into a star-like structure.
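That flattening trick can be sketched in a few lines. SQLite stands in for a real warehouse engine here, and the table, column, and view names are illustrative:

```python
import sqlite3

# Minimal sketch: a normalized product hierarchy, plus a view that
# presents it to BI tools as a flat, star-style dimension.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_category    (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_subcategory (subcategory_key INTEGER PRIMARY KEY, subcategory_name TEXT,
                              category_key INTEGER REFERENCES dim_category);
CREATE TABLE dim_product     (product_key INTEGER PRIMARY KEY, product_name TEXT,
                              subcategory_key INTEGER REFERENCES dim_subcategory);

INSERT INTO dim_category    VALUES (1, 'Electronics');
INSERT INTO dim_subcategory VALUES (10, 'Laptops', 1);
INSERT INTO dim_product     VALUES (100, 'ThinkPad X1', 10);

-- The view hides the multi-hop JOINs from query writers.
CREATE VIEW v_dim_product AS
SELECT p.product_key, p.product_name, s.subcategory_name, c.category_name
FROM dim_product p
JOIN dim_subcategory s ON s.subcategory_key = p.subcategory_key
JOIN dim_category    c ON c.category_key    = s.category_key;
""")

row = con.execute("SELECT product_name, category_name FROM v_dim_product").fetchone()
print(row)  # ('ThinkPad X1', 'Electronics')
```

The storage stays normalized, but analysts and auto-generated BI SQL see one flat dimension.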

When to Use Each

Use Star Schema When

* Your primary consumers are analysts and BI tools, not other data engineers
* Query speed matters more than storage efficiency (this is almost always true on modern cloud warehouses)
* Dimension tables are small to medium-sized (under 10 million rows)
* You want self-service analytics where non-engineers can write their own queries
* You're building a data warehouse or data mart for reporting and dashboards
* Your team values simplicity and maintainability over strict normalization

Use Snowflake Schema When

* Dimension hierarchies are deep and change frequently (e.g., a product catalog with 6 levels of categorization that's updated daily)
* Storage costs are a real constraint (rare with cloud warehouses, more relevant with on-premise systems)
* Data integrity is critical and you need foreign key enforcement at the database level
* You're building an operational data store (ODS) that serves both transactional and analytical workloads
* Your dimensions have very high cardinality with long text fields that would bloat if denormalized
* Regulatory requirements demand that certain reference data is stored exactly once for audit purposes

Schema Structure Example

An e-commerce data warehouse modeled both ways.

Star Schema

fact_orders
  order_id
  customer_key  --> dim_customer
  product_key   --> dim_product
  date_key      --> dim_date
  store_key     --> dim_store
  quantity
  revenue

dim_product (flat)
  product_key
  product_name
  category_name      -- duplicated
  subcategory_name   -- duplicated
  brand_name         -- duplicated
  manufacturer_name  -- duplicated

Snowflake Schema

fact_orders
  order_id
  customer_key  --> dim_customer
  product_key   --> dim_product
  date_key      --> dim_date
  store_key     --> dim_store
  quantity
  revenue

dim_product (normalized)
  product_key
  product_name
  subcategory_key --> dim_subcategory

dim_subcategory
  subcategory_key
  subcategory_name
  category_key --> dim_category

dim_category
  category_key
  category_name

Notice how the star schema has one dim_product table with all hierarchy levels flattened in. The snowflake schema splits the product hierarchy into 3 tables. To get the category name for a product in the snowflake schema, you need to JOIN product to subcategory to category.
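The join-depth difference is easy to demonstrate end to end. A sketch with SQLite standing in for a warehouse engine, using the schemas above (table names and sample rows are illustrative): the same revenue-by-category question takes one JOIN against the star dimension and three against the snowflake hierarchy, and both return the identical answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Star: one flat product dimension.
CREATE TABLE dim_product_star (product_key INTEGER PRIMARY KEY,
                               product_name TEXT, category_name TEXT);
-- Snowflake: the hierarchy split into three tables.
CREATE TABLE dim_category     (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_subcategory  (subcategory_key INTEGER PRIMARY KEY,
                               subcategory_name TEXT, category_key INTEGER);
CREATE TABLE dim_product_snow (product_key INTEGER PRIMARY KEY,
                               product_name TEXT, subcategory_key INTEGER);
CREATE TABLE fact_orders      (order_id INTEGER, product_key INTEGER, revenue REAL);

INSERT INTO dim_product_star  VALUES (100, 'ThinkPad X1', 'Electronics');
INSERT INTO dim_category      VALUES (1, 'Electronics');
INSERT INTO dim_subcategory   VALUES (10, 'Laptops', 1);
INSERT INTO dim_product_snow  VALUES (100, 'ThinkPad X1', 10);
INSERT INTO fact_orders       VALUES (1, 100, 1200.0), (2, 100, 800.0);
""")

# Star: one JOIN answers "revenue by category".
star = con.execute("""
    SELECT d.category_name, SUM(f.revenue)
    FROM fact_orders f JOIN dim_product_star d USING (product_key)
    GROUP BY d.category_name
""").fetchall()

# Snowflake: three JOINs to answer the identical question.
snow = con.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_orders f
    JOIN dim_product_snow p USING (product_key)
    JOIN dim_subcategory  s USING (subcategory_key)
    JOIN dim_category     c USING (category_key)
    GROUP BY c.category_name
""").fetchall()

assert star == snow  # same answer, different join depth
print(star)  # [('Electronics', 2000.0)]
```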

How to Answer This in Interviews

This question appears in nearly every DE system design round. Here's how to frame your answer.

The 'correct' answer in interviews

In most DE interviews, the expected answer is: 'I'd use a star schema for the analytical layer because it's simpler to query, performs better for aggregations, and works well with BI tools. I'd consider a snowflake schema if the dimension hierarchies are deep and change frequently, or if storage constraints require normalization.' This shows you understand both models and can make a pragmatic choice. Don't dogmatically say one is always better.

How interviewers test this

The question usually comes in system design rounds: 'Design the data model for [use case].' You choose star or snowflake based on the requirements. The interviewer probes your reasoning: 'Why not normalize those dimensions?' or 'What happens when the category hierarchy changes?' Having specific reasons for your choice (query patterns, consumer types, update frequency) is what separates a strong answer from a textbook recitation.

The hybrid view that impresses

In practice, most modern data warehouses use a hybrid approach. The core fact and dimension tables follow a star schema for simplicity, but some dimensions (like geographic hierarchies or product catalogs) are partially normalized where it makes sense. Mentioning this hybrid reality and explaining when you'd normalize vs. denormalize shows production experience. An answer like 'I'd start with a star schema and only normalize dimensions where the update frequency or hierarchy depth justifies the added complexity' is stronger than picking one model absolutely.

Common follow-up questions

Be prepared for: 'How would you handle slowly changing dimensions in this model?' (SCD types 1, 2, and 3), 'How would you partition the fact table?' (by date, usually), 'What if a new dimension needs to be added?' (add a new dimension table and a foreign key column to the fact table). These questions test whether you've actually built data models in production, not just studied the theory.
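For the SCD follow-up, it helps to have the Type 2 mechanics at your fingertips: expire the current row, append a new version with fresh validity dates. A minimal in-memory sketch, assuming a hypothetical customer dimension with validity dates and a current flag (the column and function names are illustrative, not a library API):

```python
from datetime import date

# SCD Type 2 sketch: each dimension row is a dated version.
dim_customer = [
    {"customer_key": 1, "customer_id": "C42", "city": "Austin",
     "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True},
]

def scd2_update(rows, customer_id, changes, as_of):
    """Expire the current row for customer_id and append a new version."""
    current = next(r for r in rows
                   if r["customer_id"] == customer_id and r["is_current"])
    current["valid_to"] = as_of        # close out the old version
    current["is_current"] = False
    new_row = {**current, **changes,   # copy, then apply the changes
               "customer_key": max(r["customer_key"] for r in rows) + 1,
               "valid_from": as_of, "valid_to": None, "is_current": True}
    rows.append(new_row)

scd2_update(dim_customer, "C42", {"city": "Denver"}, date(2024, 6, 1))
# History is preserved: the Austin row is closed out, the Denver row
# is current, and old fact rows still join to the Austin version via
# the original surrogate key.
```

Type 1 would overwrite the city in place (no history); Type 3 would keep a single `previous_city` column. Being able to state that contrast in one breath reads as production experience.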

Frequently Asked Questions

Which is better: star schema or snowflake schema?
For analytical workloads (data warehouses, BI dashboards, reporting), a star schema is better in the vast majority of cases. It's simpler to query, faster for aggregations, and works natively with BI tools. A snowflake schema makes sense when dimension hierarchies are deep, change frequently, and storage or data integrity constraints require normalization. On modern cloud warehouses (Snowflake the product, BigQuery, Redshift), storage is cheap and query engines handle denormalized data efficiently, which tilts the decision further toward star schemas.
Does the Snowflake data warehouse use snowflake schema?
No. Snowflake (the data warehouse product) doesn't require or default to snowflake schema (the data modeling pattern). The naming similarity is confusing but unrelated. You can build star schemas, snowflake schemas, or any other modeling pattern in Snowflake the product. In practice, most Snowflake users build star schemas for their analytical layer because it aligns with how the columnar engine processes queries.
How do star and snowflake schemas affect query performance?
Star schemas generally perform better for analytical queries because they require fewer JOINs. Each additional JOIN adds overhead: the query optimizer needs to determine join order, allocate memory for hash tables, and process intermediate results. On columnar warehouses, the performance gap can be significant for complex queries with many dimension lookups. However, for very large dimensions (hundreds of millions of rows with wide text columns), normalized snowflake dimensions can be faster because the intermediate tables are smaller and fit in memory more easily.
What is a galaxy schema?
A galaxy schema (also called a fact constellation) is a model with multiple fact tables that share some dimension tables. For example, an e-commerce warehouse might have a sales_fact and a returns_fact, both linked to the same customer, product, and date dimensions. This is common in real production warehouses where business processes share dimensions. It's not a separate schema type so much as a natural extension of the star schema pattern to multiple business processes.

Pick a Side in 15 Seconds, Defend It in 60

Practice the schema prompts on DataDriven until you can commit to a default without hedging and name the one edge case that changes the answer. That's the whole interview.
