Data Modeling

Fact Table vs Dimension Table: Key Differences for Interviews

Here's the rule you want in your head before the phone rings: if you'd aggregate the column, it's a fact. If you'd filter or group by it, it's a dimension attribute. That's the whole game. You're going to be asked some version of this on almost every modeling round, and the candidates who hesitate on simple examples lose ground they never get back.

Don't worry about memorizing definitions. You'll find the concepts stick once you've walked through a few real tables out loud. Try the orders example on this page and say the grain sentence before you look at the answer.

71%

Qs Test Fact/Dim Split

191

Modeling Qs Analyzed

18%

Of DE Interviews Are Modeling

49

Onsite Modeling Rounds

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

Fact Table vs Dimension Table at a Glance

This table captures the core differences. In an interview, being able to rattle off these distinctions in 30 seconds shows immediate command of the topic.

PropertyFact TableDimension Table
PurposeStores measurable events (transactions, clicks, shipments)Stores descriptive context (who, what, where, when)
Column TypesForeign keys + numeric measures (revenue, quantity, duration)Surrogate key + text/categorical attributes (name, city, category)
Row CountVery high. Millions to billions of rows. Grows continuously.Low to moderate. Thousands to millions. Grows slowly.
ShapeTall and narrow (few columns, many rows)Short and wide (many columns, fewer rows)
Update FrequencyAppend-only. New rows added as events occur. Rarely updated.Updated when attributes change (SCD Type 1, 2, or 3).
Primary KeyComposite key (combination of dimension foreign keys) or row-level surrogateSingle surrogate key (system-generated integer)
AggregationMeasures are summed, averaged, or counted in queriesAttributes are used for GROUP BY, WHERE, and filter clauses

Fact Tables Explained

A fact table records business events as they happen. Every sale, every click, every shipment becomes one row. The columns in a fact table fall into two categories: foreign keys that point to dimension tables and numeric measures that quantify the event.

Grain is the thing to learn first. It's just one sentence that says what a single row represents. "One row per order line item." "One row per page view." Say it out loud before you list any columns. If you can't say it cleanly, you don't understand the table yet, and interviewers will catch that in about four seconds. Once you have the sentence, every measure and every foreign key either fits the grain or it doesn't.

Measures in a fact table have three classifications. Additive measures (revenue, quantity) can be summed across all dimensions. Semi-additive measures (account_balance) can be summed across some dimensions but not time. Non-additive measures (unit_price, ratio) cannot be summed and must be averaged or weighted. Getting this classification wrong leads to dashboards that display incorrect numbers.

Fact tables are the largest tables in a warehouse. A company with 10 million daily transactions generates 300 million fact rows per month and 3.6 billion per year. Partitioning by date is standard practice to keep queries fast.

Fact Table Examples

fact_orders

Grain: One row per order line item

Measures: quantity, unit_price, discount_amount, line_total, tax_amount

Foreign keys: date_key, customer_key, product_key, store_key, promotion_key

Volume: 50M rows/month for a mid-size retailer

fact_page_views

Grain: One row per page view event

Measures: time_on_page_seconds, scroll_depth_pct

Foreign keys: date_key, user_key, page_key, device_key, referrer_key

Volume: 500M rows/month for a SaaS product with 100K daily active users

fact_support_tickets

Grain: One row per support ticket

Measures: resolution_time_hours, customer_satisfaction_score, num_replies

Foreign keys: created_date_key, resolved_date_key, customer_key, agent_key, category_key

Volume: 200K rows/month for a B2B company with 5,000 customers

Dimension Tables Explained

A dimension table provides the descriptive context for events recorded in the fact table. When an analyst asks "revenue by product category," the product category comes from the dimension table. When they filter to "customers in California," the state value comes from the customer dimension.

Good dimension tables are richly attributed. A customer dimension with only customer_id and customer_name forces analysts to join other tables for basic filtering. A customer dimension with 25 columns (name, email, city, state, country, signup_date, acquisition_channel, tier, lifetime_value_bucket, industry, company_size) lets analysts answer dozens of questions without any extra joins.

Every dimension table has a surrogate key as its primary key: a system-generated integer with no business meaning. The fact table references this surrogate key, not the natural business key. Surrogate keys enable slowly changing dimension tracking, insulate the warehouse from source system ID changes, and perform better in joins because integers are smaller and faster to compare than strings.

Dimension tables change slowly. A customer might change their city once every few years. A product might change categories during a catalog reorganization. These changes are handled through slowly changing dimension (SCD) patterns: Type 1 (overwrite), Type 2 (new row with new surrogate key), or Type 3 (add a "previous" column).

Dimension Table Examples

dim_customer

Attributes: customer_name, email, city, state, country, signup_date, acquisition_channel, customer_tier, lifetime_order_count

Row count: 2.5 million rows for a consumer e-commerce company

Update pattern: SCD Type 2 on city and customer_tier. Type 1 on email.

dim_product

Attributes: product_name, sku, category, subcategory, brand, color, size, weight_kg, price_tier, is_active

Row count: 150,000 rows for a retailer with a large catalog

Update pattern: SCD Type 1 on price_tier and is_active. Type 2 on category if reclassifications need historical tracking.

dim_date

Attributes: full_date, day_of_week, month_name, quarter, year, fiscal_year, is_holiday, is_weekend, week_of_year

Row count: 36,525 rows (100 years of dates, preloaded)

Update pattern: Static. Loaded once and never modified. The only dimension that does not change.

How Fact and Dimension Tables Connect

The connection between fact and dimension tables is a foreign key relationship. The fact table contains one foreign key column for each dimension it references. A fact_orders table with five dimensions has five foreign key columns: date_key, customer_key, product_key, store_key, and promotion_key.

Each foreign key in the fact table matches the surrogate key (primary key) of the corresponding dimension table. When an analyst writes a query, they join the fact table to one or more dimension tables on these keys. The query "total revenue by product category for Q1 2026" joins fact_orders to dim_product (for category) and dim_date (for Q1 2026), filters on date range, groups by category, and sums line_total.

This pattern has a critical property: every dimension table connects only to the fact table, never to another dimension table. That is what makes it a star schema rather than a snowflake schema. The single-hop join from fact to dimension keeps queries simple and performance predictable.

Interview note

When drawing a star schema on a whiteboard, always show the foreign key columns in the fact table and draw arrows from the fact table to each dimension. The arrow direction (fact to dimension) shows the join relationship. Interviewers notice whether you draw this correctly.

5 Interview Questions on Fact vs Dimension Tables

These questions test whether you understand the distinction at a practical level, not just the textbook definition.

What is the difference between a fact table and a dimension table?

Fact tables store measurements: numeric values you aggregate (sum, average, count). Dimension tables store descriptive context: text and categorical values you filter and group by. Facts answer 'how much' and 'how many.' Dimensions answer 'who, what, where, when.' Facts are tall and narrow, growing with every event. Dimensions are short and wide, changing slowly. In a star schema, the fact table sits at the center with foreign keys pointing outward to dimensions. This question is testing whether you understand the fundamental building blocks of dimensional modeling.

Can a column be both a fact and a dimension attribute?

Yes, depending on context. Price is the classic example. Unit_price at the time of a sale is a fact measure: it applies to that specific transaction and you might average it across transactions. Standard_list_price is a dimension attribute on the product table: it describes the product and you would filter by it ('show me products priced above $100'). The same real-world concept maps to different table placements based on whether it is a measurement captured at event time or a descriptor that exists independent of any event.

Why do fact tables use surrogate keys from dimension tables instead of natural keys?

Three reasons. First, surrogate keys support slowly changing dimensions: when a customer moves cities, a new dimension row with a new surrogate key preserves the old address for historical facts. Second, surrogate keys insulate the warehouse from source system changes: if the CRM changes its customer ID format, only the ETL mapping changes, not the fact table. Third, integer surrogate keys are smaller and faster to join on than string natural keys like email addresses or SKU codes. A join on two 4-byte integers is measurably faster than a join on two varchar(255) columns across billions of fact rows.

How do you decide whether a new attribute goes in the fact or dimension table?

Apply the aggregation test. If the value would be meaningfully summed, averaged, or counted across rows, it belongs in the fact table. If the value would be used to filter rows or group results, it belongs in a dimension table. Quantity is always a fact (you sum it). Customer_name is always a dimension (you group by it). Some values are ambiguous: employee_count at a company could be a periodic snapshot measure (fact) or a descriptor of the company (dimension). When ambiguous, ask what the analyst will do with it. That determines placement.

What is a factless fact table?

A factless fact table has no numeric measures. It records events that either happened or did not happen. The classic example is student attendance: each row represents a student attending a class on a date. The foreign keys are student_key, class_key, and date_key. There are no measures to sum or average. The value comes from counting rows and from identifying the absence of expected rows (which students were enrolled but did not attend?). Factless fact tables are a Kimball pattern that tests whether candidates understand that the 'fact' in a fact table refers to the event being recorded, not necessarily to numeric columns.

Frequently Asked Questions

What is a fact table in simple terms?+
A fact table records measurable business events. Think of it as a log of things that happened: sales, clicks, shipments, logins. Each row is one event. The table contains numeric measures (how much revenue, how many items, how long the session lasted) and foreign keys pointing to dimension tables that describe the context of each event. Fact tables are the largest tables in a data warehouse, often containing billions of rows.
What is a dimension table in simple terms?+
A dimension table describes the context around business events. It answers who was involved, what product was purchased, where the transaction happened, and when it occurred. Dimension tables contain descriptive text columns like customer_name, product_category, and store_city. They are much smaller than fact tables and change infrequently. Each dimension table has a surrogate key that the fact table references through a foreign key.
How are fact and dimension tables connected?+
They connect through foreign keys. The fact table contains a foreign key column for each dimension it references. For example, a fact_orders table has customer_key, product_key, date_key, and store_key columns. Each of these keys matches the primary key (surrogate key) of the corresponding dimension table. This structure creates the star schema pattern: the fact table at the center with dimension tables around it.
Can a table be both a fact table and a dimension table?+
In rare cases, yes. An accumulating snapshot fact table that tracks milestones (like order fulfillment with order_date, ship_date, delivery_date) can serve as a dimension for another fact table. For example, a fact_returns table might reference the fact_order_fulfillment table to get the original order context. But this is uncommon. In standard star schema design, fact tables and dimension tables serve distinct roles and are designed separately.

Say the Grain Out Loud. Then Ship the Design.

You'll get better faster if you practice on real prompts instead of flashcards. Pick any business event on DataDriven, declare the grain in one sentence, and design the table before the timer hits five minutes.

Start Practicing