DataDriven
LearnPracticeInterviewDiscussDaily
HelpContactPrivacyTermsSecurityiOS App

© 2026 DataDriven

Loading lesson...

  1. Home
  2. Learn
  3. Star Schemas

Star Schemas

Stars, snowflakes, and facts between

Stars, snowflakes, and facts between

Category
Data Modeling
Duration
30 minutes
Challenges
12 hands-on challenges

Topics covered: The Star Schema, Types of Fact Tables, Types of Dimensions, Defining the Grain, Surrogate Keys

Lesson Sections

  1. The Star Schema (concepts: dmStarSchema, dmSnowflakeSchema)

    Facts in the Center, Dimensions Around Them A star schema has one table in the middle (the fact table) surrounded by several tables around it (dimension tables). The fact table stores measurable events: a sale happened, a click occurred, a payment was processed. Dimension tables store the descriptive context: who (customer), what (product), when (date), where (store). The fact table is tall and narrow: billions of rows, each with a few FK columns pointing to dimensions plus a few numeric measure

  2. Types of Fact Tables (concepts: dmFactTables)

    There are three fundamental types of fact tables. Each models a different kind of business process. Choosing the wrong type means your table cannot answer the questions it was built for. Transaction Facts One row per discrete business event: a sale, a click, a payment. Transaction facts are insert-only. Rows are never updated. They are the most common fact type and the right choice for any event-level data. Transaction facts cannot answer 'what is the current state?' questions. They record event

  3. Types of Dimensions (concepts: dmDimensionTables)

    Dimensions are the descriptive lookup tables that give meaning to fact rows. A fact row says 'customer_sk = 42, amount = $100.' The dimension tells you customer 42 is 'Alice Zhang from Seattle in the Enterprise segment.' Without dimensions, facts are just numbers. Conformed Dimensions A conformed dimension is shared across multiple fact tables. dim_customer is referenced by fact_sales, fact_returns, fact_support_tickets. The same customer_sk and the same attribute definitions. This is what makes

  4. Defining the Grain (concepts: dmGrainDefinition)

    The grain is the most important decision in dimensional modeling. It defines what one row in the fact table represents. 'One row per order line item.' 'One row per click event.' 'One row per daily account balance.' Everything else follows from this: which dimensions make sense, which measures are valid, which queries produce correct results. How to State the Grain A grain statement is precise enough that another engineer can write a uniqueness test against it. 'One row per order' is vague. 'One

  5. Surrogate Keys

    Surrogate keys are system-generated identifiers that replace natural keys in dimensional models. Every dimension row gets its own surrogate key (customer_sk, product_sk, date_sk). Fact tables reference dimensions using these surrogate keys, not the natural business keys. Why Not Just Use Natural Keys? Natural keys (email, SKU, employee_id) come from source systems. They have three problems in analytical models: they change (email updates), they get reused (SKU recycled for a new product), and th

Related

  • All Lessons
  • Practice Problems
  • Mock Interview Practice
  • Daily Challenges