Why Pre-Aggregate?
Concepts covered: dmPreAggregation
The Performance Problem
A fact table with 500 million rows. A dashboard with 10 tiles. Each tile runs a GROUP BY query that scans the entire table. Ten full-table scans every time someone opens the dashboard. At $5 per TB scanned (BigQuery pricing), that is real money. At 30 seconds per query, that is a terrible user experience.
Pre-aggregation solves this by computing the answer ahead of time. Instead of scanning 500 million rows to get daily revenue, create a daily_revenue summary table with 365 rows per year. The dashboard reads 365 rows instead of 500 million. Load time drops from 30 seconds to milliseconds.
The Tradeoff
Pre-aggregation is a deliberate tradeoff between freshness and performance. The summary table is only as current as its last refresh. If the daily_revenue table is ref
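The idea and its tradeoff can be sketched in a few lines. The example below is a minimal illustration, not a production pattern: it uses Python's built-in sqlite3 module with an in-memory database, a hypothetical fact table named orders (the text does not name the fact table), and a full rebuild as the refresh strategy. It builds daily_revenue with a single GROUP BY, then shows that a row added to the fact table afterwards does not appear in the summary until the next refresh.

```python
import sqlite3

# In-memory database standing in for the warehouse (assumption for the sketch).
conn = sqlite3.connect(":memory:")

# Hypothetical fact table: one row per order.
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("2024-01-01", 10.0),
        ("2024-01-01", 15.0),
        ("2024-01-02", 20.0),
    ],
)


def refresh_daily_revenue(conn):
    """Rebuild the summary table from the fact table (a full refresh)."""
    conn.execute("DROP TABLE IF EXISTS daily_revenue")
    conn.execute(
        """
        CREATE TABLE daily_revenue AS
        SELECT order_date, SUM(amount) AS revenue
        FROM orders
        GROUP BY order_date
        """
    )


def read_daily_revenue(conn):
    """What a dashboard tile would run: a small read, no GROUP BY on the facts."""
    return conn.execute(
        "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"
    ).fetchall()


# Compute the answer ahead of time: one row per day.
refresh_daily_revenue(conn)
summary = read_daily_revenue(conn)

# A new order arrives after the refresh; the summary does not see it yet.
conn.execute("INSERT INTO orders VALUES ('2024-01-02', 5.0)")
stale = read_daily_revenue(conn)

# Only the next refresh brings the summary up to date.
refresh_daily_revenue(conn)
fresh = read_daily_revenue(conn)

print("after first refresh:", summary)
print("after new order:    ", stale)
print("after second refresh:", fresh)
```

The dashboard query touches only daily_revenue, so its cost depends on the number of days rather than the number of orders. The price is visible in the middle read: between refreshes the summary reports the old total for 2024-01-02.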
About This Interactive Section
This section is part of the Pre-Aggregation lesson on DataDriven, a free data engineering interview prep platform. Each section includes explanations, worked examples, and hands-on code challenges that execute in real time. SQL queries run against a live PostgreSQL database. Python runs in a sandboxed Docker container. Data modeling problems validate against interactive schema canvases. All content is framed around what data engineering interviewers actually test at companies like Meta, Google, Amazon, Netflix, Stripe, and Databricks.