DataDriven
LearnPracticeInterviewDiscussDaily
HelpContactPrivacyTermsSecurityiOS App

© 2026 DataDriven

Loading lesson...

  1. Home
  2. Learn
  3. Aggregating: Intermediate

Aggregating: Intermediate

AVG lied to you. Meet the percentiles.

AVG lied to you. Meet the percentiles.

Category
SQL
Difficulty
intermediate
Duration
33 minutes
Challenges
0 hands-on challenges

Topics covered: Row-level vs aggregate behavior, Understanding cardinality, APPROX_DISTINCT (HyperLogLog), ORDER BY with aggregates, APPROX_PERCENTILE and ARBITRARY

Lesson Sections

  1. Row-level vs aggregate behavior

    Understanding Differences Row-level and aggregate operations follow different execution models. Grasping this distinction is key to writing correct SQL. Two Worlds of SQL SQL has two fundamentally different kinds of operations: How They Work Together Here's the key insight: row-level operations happen first, for each row individually. Then aggregate functions collect those results and summarize them. Consider this query: Here's the execution order: CASE in Aggregates Patterns and Pitfalls Unders

  2. Understanding cardinality

    Cardinality means the number of unique values in a column. It's a fundamental concept for understanding your data and optimizing queries. Understanding cardinality helps you: Try counting distinct customers per region and sorting to find the busiest regions first.

  3. APPROX_DISTINCT (HyperLogLog) (concepts: sqlApproxAgg)

    The Scaling Problem Exact distinct counts become expensive as data grows. Understanding the cost helps you choose the right approach. COUNT(DISTINCT) Problem To count distinct values exactly, the database must track every unique value it has seen. This requires building a hash set in memory that grows with cardinality. For a table with 1 billion rows and 100 million unique values, that hash set can consume gigabytes of memory. How APPROX_DISTINCT Works The approximate counts are within 0.3% of e

  4. ORDER BY with aggregates

    Basic Sorting Patterns Sorting by Aggregates The most common use case: sort groups by their aggregated metric to find the top or bottom performers. Using Column Aliases Top-N Queries LIMIT with ORDER BY This shows your top 5 best-selling products by units sold. Perfect for dashboards, reports, and quick insights. Execution Order

  5. APPROX_PERCENTILE and ARBITRARY

    Averages can be misleading. If one customer spends $100,000 and ninety-nine customers spend $100 each, the average is $1,089.90, which is far higher than what a typical customer spends. Percentiles tell you what typical really looks like. A percentile tells you what value a certain percentage of data falls below. The 50th percentile (median) is where half the values are below and half are above. The 95th percentile tells you the value that 95% of data falls below, which is useful for understandi

Related

  • All Lessons
  • Practice Problems
  • Mock Interview Practice
  • Daily Challenges