Snowflake Schema: Guide for Data Engineering Interviews

A snowflake schema normalizes the dimension tables of a star schema into hierarchies of related tables. It reduces storage and update complexity at the cost of more joins per query. Most modern warehouses prefer star schema because columnar engines handle denormalized dimensions better than deep join chains. This guide explains when each is the right answer.

What Is a Snowflake Schema

A snowflake schema extends the star schema by normalizing dimension tables into hierarchies of related tables. Where a star schema has one flat product dimension containing category and subcategory columns, a snowflake schema splits those into separate tables: dim_product, dim_category, and dim_subcategory, each joined to the next.

The name comes from the shape of the entity-relationship diagram. Instead of a clean star with one level of dimension tables, the snowflake branches outward in multiple layers.

Snowflake schemas reduce storage and make dimension updates cheaper. They add join complexity and slow down analytical queries. Most modern warehouses choose star because storage is cheap and query performance is expensive.

Prepare for the interview
01 / Open invite
02min.

Know Snowflake Schema the way the interviewer who asks it knows it.

a Snowflake Schema query, the same shape a screen would give you.
The diff against expected. Where ties broke. What you missed.
sandbox
1fact_orders
2 order_id bigint PK
3 customer_sk bigint FK
4 order_date date SCD2
5
Execute your solution0.4s avg.
AirbnbInterview question
Solve a Snowflake Schema problem

Star Schema vs Snowflake Schema

DimensionStar SchemaSnowflake Schema
Dimension structureDenormalized (flat)Normalized (hierarchy of tables)
Join count per queryLow (1 join per dimension)High (multiple joins per dimension)
StorageMore (repeated values in dims)Less (values stored once)
Query performanceFaster on most enginesSlower due to join cost
Update complexityHigh (update many rows)Low (update one row)
Readability for analystsHighLower

When Snowflake Schema Makes Sense

Frequently updated dimension hierarchies

When a product category hierarchy changes often, updating one row in a normalized category table is cheaper than updating millions of denormalized product rows. Snowflake schema shines when write performance matters as much as read performance.

Very large dimension tables

If a customer dimension has 500 million rows and many of those customers share the same city, state, and country, storing geography once in a separate table instead of repeating it in every customer row can save significant storage.

Strict data governance requirements

Normalized dimensions have one authoritative copy of each value. If the legal name of a city changes, you update one row. In a denormalized star, you update millions. When audit trails and single-source-of-truth matter, normalization is easier to defend.

Snowflake Schema Interview Questions

How does a snowflake schema differ from a star schema?

A snowflake schema normalizes dimension tables into hierarchies. A product dimension in a star schema contains product name, category, and subcategory in one flat table. In a snowflake schema, category and subcategory live in separate tables joined back to product. This reduces storage and update complexity but adds join overhead at query time.

Why do most modern warehouses prefer star over snowflake?

Modern columnar warehouses like Snowflake, BigQuery, and Redshift are optimized for scanning large column ranges rather than resolving many joins. Denormalized dimensions produce fewer joins, which maps better to how these engines execute. Storage is cheap. Join overhead at petabyte scale is not.

Is a snowflake schema ever the right answer in an interview?

Yes, in specific scenarios: dimensions that change frequently, very large dimension tables with high cardinality shared attributes, or compliance environments where a single authoritative record per entity is required. The key is justifying the extra join cost with a concrete benefit.

What is an outrigger table in a snowflake schema?

An outrigger is a secondary dimension table that attaches to another dimension rather than directly to the fact table. In a snowflake schema, a dim_product table might have a product_category_fk pointing to a dim_product_category table. dim_product_category is the outrigger. Outriggers add query complexity and are generally avoided unless normalization is explicitly required.

Snowflake Schema FAQ

What is a snowflake schema?+
A snowflake schema is a data warehouse design where dimension tables are normalized into hierarchies of related tables. Instead of one flat customer dimension containing city, state, and country columns, you have a customer table, a city table, and a country table joined in sequence. The name comes from the snowflake-like diagram of branching tables.
How does a snowflake schema differ from a star schema?+
A star schema uses denormalized, flat dimension tables. A snowflake schema normalizes those dimensions into multiple related tables. Star schemas have fewer joins and are faster to query. Snowflake schemas use less storage and are easier to update when dimension hierarchies change.
When would you choose a snowflake schema over a star schema?+
Choose snowflake when dimension hierarchies change frequently (avoiding mass updates to denormalized rows), when dimension tables are very large with many repeated attribute combinations, or when storage costs are a primary concern. In most analytical workloads, star schema is preferred because query speed matters more than storage savings.
02 / Why practice

Practice Data Modeling

  1. 01

    Active recall beats re-reading by 50%

    Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom

  2. 02

    76% of hiring managers reject on the coding task, not the resume

    From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice

  3. 03

    Five problem shapes cover 80% of data engineer loops

    Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Related Data Modeling Guides