Fact Table vs Dimension Table: Key Differences Explained
If you would aggregate the column, it is a fact. If you would filter or group by it, it is a dimension attribute. That is the whole game. Understand grain, measures, and surrogate keys, and you will answer this class of question cleanly on every modeling round.
Fact Tables vs Dimension Tables
The rule to internalize before any modeling round: if you would aggregate the column, it is a fact. If you would filter or group by it, it is a dimension attribute. That is the whole game. You will be asked some version of this on almost every modeling round, and the candidates who hesitate on simple examples lose ground they never get back.
Fact tables store measurable events at a stated grain. Dimension tables store descriptive context around those events. Facts are tall and narrow, growing with every transaction. Dimensions are short and wide, changing slowly over time.
Know Fact vs Dimension the way the interviewer who asks it knows it.
Fact Table vs Dimension Table at a Glance
| Property | Fact Table | Dimension Table |
|---|---|---|
| Purpose | Stores measurable events (transactions, clicks, shipments) | Stores descriptive context (who, what, where, when) |
| Column Types | Foreign keys + numeric measures (revenue, quantity, duration) | Surrogate key + text/categorical attributes (name, city, category) |
| Row Count | Very high. Millions to billions of rows. Grows continuously. | Low to moderate. Thousands to millions. Grows slowly. |
| Shape | Tall and narrow (few columns, many rows) | Short and wide (many columns, fewer rows) |
| Update Frequency | Append-only. New rows added as events occur. Rarely updated. | Updated when attributes change (SCD Type 1, 2, or 3). |
| Primary Key | Composite key (combination of dimension foreign keys) or row-level surrogate | Single surrogate key (system-generated integer) |
| Aggregation | Measures are summed, averaged, or counted in queries | Attributes are used for GROUP BY, WHERE, and filter clauses |
Fact Table Examples
fact_orders
Grain: One row per order line item Measures: quantity, unit_price, discount_amount, line_total, tax_amount Foreign keys: date_key, customer_key, product_key, store_key, promotion_key Volume: 50M rows/month for a mid-size retailer
fact_page_views
Grain: One row per page view event Measures: time_on_page_seconds, scroll_depth_pct Foreign keys: date_key, user_key, page_key, device_key, referrer_key Volume: 500M rows/month for a SaaS product with 100K daily active users
fact_support_tickets
Grain: One row per support ticket Measures: resolution_time_hours, customer_satisfaction_score, num_replies Foreign keys: created_date_key, resolved_date_key, customer_key, agent_key, category_key Volume: 200K rows/month for a B2B company with 5,000 customers
Dimension Table Examples
dim_customer
Attributes: customer_name, email, city, state, country, signup_date, acquisition_channel, customer_tier, lifetime_order_count Row count: 2.5 million rows for a consumer e-commerce company Update pattern: SCD Type 2 on city and customer_tier. Type 1 on email.
dim_product
Attributes: product_name, sku, category, subcategory, brand, color, size, weight_kg, price_tier, is_active Row count: 150,000 rows for a retailer with a large catalog Update pattern: SCD Type 1 on price_tier and is_active. Type 2 on category if reclassifications need historical tracking.
dim_date
Attributes: full_date, day_of_week, month_name, quarter, year, fiscal_year, is_holiday, is_weekend, week_of_year Row count: 36,525 rows (100 years of dates, preloaded) Update pattern: Static. Loaded once and never modified. The only dimension that does not change.
5 Interview Questions on Fact vs Dimension Tables
What is the difference between a fact table and a dimension table?
Fact tables store measurements: numeric values you aggregate (sum, average, count). Dimension tables store descriptive context: text and categorical values you filter and group by. Facts answer 'how much' and 'how many.' Dimensions answer 'who, what, where, when.' Facts are tall and narrow, growing with every event. Dimensions are short and wide, changing slowly. In a star schema, the fact table sits at the center with foreign keys pointing outward to dimensions.
Can a column be both a fact and a dimension attribute?
Yes, depending on context. Price is the classic example. unit_price at the time of a sale is a fact measure: it applies to that specific transaction and you might average it across transactions. standard_list_price is a dimension attribute on the product table: it describes the product and you would filter by it ('show me products priced above $100'). The same real-world concept maps to different table placements based on whether it is a measurement captured at event time or a descriptor that exists independent of any event.
Why do fact tables use surrogate keys from dimension tables instead of natural keys?
Three reasons. First, surrogate keys support slowly changing dimensions: when a customer moves cities, a new dimension row with a new surrogate key preserves the old address for historical facts. Second, surrogate keys insulate the warehouse from source system changes: if the CRM changes its customer ID format, only the ETL mapping changes, not the fact table. Third, integer surrogate keys are smaller and faster to join on than string natural keys like email addresses or SKU codes. A join on two 4-byte integers is measurably faster than a join on two varchar(255) columns across billions of fact rows.
How do you decide whether a new attribute goes in the fact or dimension table?
Apply the aggregation test. If the value would be meaningfully summed, averaged, or counted across rows, it belongs in the fact table. If the value would be used to filter rows or group results, it belongs in a dimension table. Quantity is always a fact (you sum it). customer_name is always a dimension (you group by it). Some values are ambiguous: employee_count at a company could be a periodic snapshot measure (fact) or a descriptor of the company (dimension). When ambiguous, ask what the analyst will do with it. That determines placement.
What is a factless fact table?
A factless fact table has no numeric measures. It records events that either happened or did not happen. The classic example is student attendance: each row represents a student attending a class on a date. The foreign keys are student_key, class_key, and date_key. There are no measures to sum or average. The value comes from counting rows and from identifying the absence of expected rows (which students were enrolled but did not attend?). Factless fact tables test whether candidates understand that the 'fact' in a fact table refers to the event being recorded, not necessarily to numeric columns.
Frequently Asked Questions
What is a fact table in simple terms?+
What is a dimension table in simple terms?+
How are fact and dimension tables connected?+
Can a table be both a fact table and a dimension table?+
Say the Grain Out Loud. Then Ship the Design.
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition