DataDriven
LearnPracticeInterviewDiscussDaily
HelpContactPrivacyTermsSecurityiOS App

© 2026 DataDriven

Loading lesson...

  1. Home
  2. Learn
  3. Pre-Aggregation

Pre-Aggregation

Pre-computing answers before anyone asks

Pre-computing answers before anyone asks

Category
Data Modeling
Duration
23 minutes
Challenges
12 hands-on challenges

Topics covered: Why Pre-Aggregate?, Metric Types and Additivity, OLAP Cubes & Rollups, Granularity Design, Refresh Strategies & Materialized Views

Lesson Sections

  1. Why Pre-Aggregate? (concepts: dmPreAggregation)

    The Performance Problem A fact table with 500 million rows. A dashboard with 10 tiles. Each tile runs a GROUP BY query that scans the entire table. Ten full-table scans every time someone opens the dashboard. At $5 per TB scanned (BigQuery pricing), that is real money. At 30 seconds per query, that is a terrible user experience. Pre-aggregation solves this by computing the answer ahead of time. Instead of scanning 500 million rows to get daily revenue, create a daily_revenue summary table with 3

  2. Metric Types and Additivity (concepts: dmMetricAdditivity)

    Not Every Metric Can Be Pre-Aggregated This is the most important concept in pre-aggregation. Some metrics can be safely SUMmed across any dimension. Others cannot. If you pre-aggregate a non-additive metric, the numbers will be mathematically wrong and nobody will notice until someone reconciles against the source. The Average-of-Averages Trap You pre-compute average order value (AOV) per day: Monday = $50, Tuesday = $100. What is the weekly AOV? NOT ($50 + $100) / 2 = $75. If Monday had 1,000

  3. OLAP Cubes & Rollups (concepts: dmOlapCubes)

    Pre-Computing Every Dimension Combination An OLAP cube is a summary table that pre-computes aggregates for every combination of dimensions. Revenue by product. Revenue by region. Revenue by product AND region. Revenue by product AND region AND month. If you have 3 dimensions with 10 values each, the cube has 10 x 10 x 10 = 1,000 cells. Each cell stores a pre-computed SUM. Modern cloud warehouses do not use literal cubes (multidimensional arrays). They use flat summary tables with GROUP BY ROLLUP

  4. Granularity Design

    Choosing the Right Summary Level The granularity of your summary table determines both its size and its usefulness. A daily summary has 365 rows per year. An hourly summary has 8,760. A per-minute summary has 525,600. The finer the grain, the more flexible the queries, but the larger the table and the more it costs to refresh. The key question: what is the finest grain that any dashboard or report needs? If no dashboard filters below daily, a daily summary is sufficient. If the operations team n

  5. Refresh Strategies & Materialized Views

    A summary table is only useful if it stays current. The refresh strategy determines how often the summary is updated, how long it takes, and what happens when it fails. This is the operationally consequential part of pre-aggregation. Full Refresh vs Incremental The production pattern: incremental refresh daily, full refresh weekly as a reconciliation check. If the full refresh produces different numbers than the incremental, you have a bug in the incremental logic. The weekly full refresh catche

Related

  • All Lessons
  • Practice Problems
  • Mock Interview Practice
  • Daily Challenges