Here's the rule you want in your head before the phone rings: if you'd aggregate the column, it's a fact. If you'd filter or group by it, it's a dimension attribute. That's the whole game. You're going to be asked some version of this on almost every modeling round, and the candidates who hesitate on simple examples lose ground they never get back.
Don't worry about memorizing definitions. You'll find the concepts stick once you've walked through a few real tables out loud. Try the orders example on this page and say the grain sentence before you look at the answer.
Qs Test Fact/Dim Split
Modeling Qs Analyzed
Of DE Interviews Are Modeling
Onsite Modeling Rounds
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
This table captures the core differences. In an interview, being able to rattle off these distinctions in 30 seconds shows immediate command of the topic.
| Property | Fact Table | Dimension Table |
|---|---|---|
| Purpose | Stores measurable events (transactions, clicks, shipments) | Stores descriptive context (who, what, where, when) |
| Column Types | Foreign keys + numeric measures (revenue, quantity, duration) | Surrogate key + text/categorical attributes (name, city, category) |
| Row Count | Very high. Millions to billions of rows. Grows continuously. | Low to moderate. Thousands to millions. Grows slowly. |
| Shape | Tall and narrow (few columns, many rows) | Short and wide (many columns, fewer rows) |
| Update Frequency | Append-only. New rows added as events occur. Rarely updated. | Updated when attributes change (SCD Type 1, 2, or 3). |
| Primary Key | Composite key (combination of dimension foreign keys) or row-level surrogate | Single surrogate key (system-generated integer) |
| Aggregation | Measures are summed, averaged, or counted in queries | Attributes are used for GROUP BY, WHERE, and filter clauses |
A fact table records business events as they happen. Every sale, every click, every shipment becomes one row. The columns in a fact table fall into two categories: foreign keys that point to dimension tables and numeric measures that quantify the event.
Grain is the thing to learn first. It's just one sentence that says what a single row represents. "One row per order line item." "One row per page view." Say it out loud before you list any columns. If you can't say it cleanly, you don't understand the table yet, and interviewers will catch that in about four seconds. Once you have the sentence, every measure and every foreign key either fits the grain or it doesn't.
Measures in a fact table have three classifications. Additive measures (revenue, quantity) can be summed across all dimensions. Semi-additive measures (account_balance) can be summed across some dimensions but not time. Non-additive measures (unit_price, ratio) cannot be summed and must be averaged or weighted. Getting this classification wrong leads to dashboards that display incorrect numbers.
Fact tables are the largest tables in a warehouse. A company with 10 million daily transactions generates 300 million fact rows per month and 3.6 billion per year. Partitioning by date is standard practice to keep queries fast.
Grain: One row per order line item
Measures: quantity, unit_price, discount_amount, line_total, tax_amount
Foreign keys: date_key, customer_key, product_key, store_key, promotion_key
Volume: 50M rows/month for a mid-size retailer
Grain: One row per page view event
Measures: time_on_page_seconds, scroll_depth_pct
Foreign keys: date_key, user_key, page_key, device_key, referrer_key
Volume: 500M rows/month for a SaaS product with 100K daily active users
Grain: One row per support ticket
Measures: resolution_time_hours, customer_satisfaction_score, num_replies
Foreign keys: created_date_key, resolved_date_key, customer_key, agent_key, category_key
Volume: 200K rows/month for a B2B company with 5,000 customers
A dimension table provides the descriptive context for events recorded in the fact table. When an analyst asks "revenue by product category," the product category comes from the dimension table. When they filter to "customers in California," the state value comes from the customer dimension.
Good dimension tables are richly attributed. A customer dimension with only customer_id and customer_name forces analysts to join other tables for basic filtering. A customer dimension with 25 columns (name, email, city, state, country, signup_date, acquisition_channel, tier, lifetime_value_bucket, industry, company_size) lets analysts answer dozens of questions without any extra joins.
Every dimension table has a surrogate key as its primary key: a system-generated integer with no business meaning. The fact table references this surrogate key, not the natural business key. Surrogate keys enable slowly changing dimension tracking, insulate the warehouse from source system ID changes, and perform better in joins because integers are smaller and faster to compare than strings.
Dimension tables change slowly. A customer might change their city once every few years. A product might change categories during a catalog reorganization. These changes are handled through slowly changing dimension (SCD) patterns: Type 1 (overwrite), Type 2 (new row with new surrogate key), or Type 3 (add a "previous" column).
Attributes: customer_name, email, city, state, country, signup_date, acquisition_channel, customer_tier, lifetime_order_count
Row count: 2.5 million rows for a consumer e-commerce company
Update pattern: SCD Type 2 on city and customer_tier. Type 1 on email.
Attributes: product_name, sku, category, subcategory, brand, color, size, weight_kg, price_tier, is_active
Row count: 150,000 rows for a retailer with a large catalog
Update pattern: SCD Type 1 on price_tier and is_active. Type 2 on category if reclassifications need historical tracking.
Attributes: full_date, day_of_week, month_name, quarter, year, fiscal_year, is_holiday, is_weekend, week_of_year
Row count: 36,525 rows (100 years of dates, preloaded)
Update pattern: Static. Loaded once and never modified. The only dimension that does not change.
The connection between fact and dimension tables is a foreign key relationship. The fact table contains one foreign key column for each dimension it references. A fact_orders table with five dimensions has five foreign key columns: date_key, customer_key, product_key, store_key, and promotion_key.
Each foreign key in the fact table matches the surrogate key (primary key) of the corresponding dimension table. When an analyst writes a query, they join the fact table to one or more dimension tables on these keys. The query "total revenue by product category for Q1 2026" joins fact_orders to dim_product (for category) and dim_date (for Q1 2026), filters on date range, groups by category, and sums line_total.
This pattern has a critical property: every dimension table connects only to the fact table, never to another dimension table. That is what makes it a star schema rather than a snowflake schema. The single-hop join from fact to dimension keeps queries simple and performance predictable.
When drawing a star schema on a whiteboard, always show the foreign key columns in the fact table and draw arrows from the fact table to each dimension. The arrow direction (fact to dimension) shows the join relationship. Interviewers notice whether you draw this correctly.
These questions test whether you understand the distinction at a practical level, not just the textbook definition.
Fact tables store measurements: numeric values you aggregate (sum, average, count). Dimension tables store descriptive context: text and categorical values you filter and group by. Facts answer 'how much' and 'how many.' Dimensions answer 'who, what, where, when.' Facts are tall and narrow, growing with every event. Dimensions are short and wide, changing slowly. In a star schema, the fact table sits at the center with foreign keys pointing outward to dimensions. This question is testing whether you understand the fundamental building blocks of dimensional modeling.
Yes, depending on context. Price is the classic example. Unit_price at the time of a sale is a fact measure: it applies to that specific transaction and you might average it across transactions. Standard_list_price is a dimension attribute on the product table: it describes the product and you would filter by it ('show me products priced above $100'). The same real-world concept maps to different table placements based on whether it is a measurement captured at event time or a descriptor that exists independent of any event.
Three reasons. First, surrogate keys support slowly changing dimensions: when a customer moves cities, a new dimension row with a new surrogate key preserves the old address for historical facts. Second, surrogate keys insulate the warehouse from source system changes: if the CRM changes its customer ID format, only the ETL mapping changes, not the fact table. Third, integer surrogate keys are smaller and faster to join on than string natural keys like email addresses or SKU codes. A join on two 4-byte integers is measurably faster than a join on two varchar(255) columns across billions of fact rows.
Apply the aggregation test. If the value would be meaningfully summed, averaged, or counted across rows, it belongs in the fact table. If the value would be used to filter rows or group results, it belongs in a dimension table. Quantity is always a fact (you sum it). Customer_name is always a dimension (you group by it). Some values are ambiguous: employee_count at a company could be a periodic snapshot measure (fact) or a descriptor of the company (dimension). When ambiguous, ask what the analyst will do with it. That determines placement.
A factless fact table has no numeric measures. It records events that either happened or did not happen. The classic example is student attendance: each row represents a student attending a class on a date. The foreign keys are student_key, class_key, and date_key. There are no measures to sum or average. The value comes from counting rows and from identifying the absence of expected rows (which students were enrolled but did not attend?). Factless fact tables are a Kimball pattern that tests whether candidates understand that the 'fact' in a fact table refers to the event being recorded, not necessarily to numeric columns.
You'll get better faster if you practice on real prompts instead of flashcards. Pick any business event on DataDriven, declare the grain in one sentence, and design the table before the timer hits five minutes.
Start Practicing