Aggregating: Beginner

Amazon lists over 350 million products, and every seller dashboard metric you see, from units sold to average star rating to revenue by category, is produced by aggregate SQL queries running across billions of order records. When Amazon's retail team identifies that a product category is underperforming, they are looking at COUNT, SUM, and AVG applied across millions of rows grouped by category, marketplace, and time window. The same functions that power those eight-figure business decisions are the ones you are about to learn: COUNT, SUM, AVG, MIN, MAX, and GROUP BY.

GROUP BY for categorizing data

Group rows into meaningful categories

GROUP BY collapses individual rows into summary groups. Without it, you can count every order in your database but not how many belong to each customer. That gap is what GROUP BY closes.

THE CASE OF THE COLLAPSED RECORDS

Lisa Park flagged a summary report. Twelve orders in the system. The report shows four rows. Nobody deleted anything. But eight orders have vanished from the page. She has a board meeting in two hours.

orders

order_id	category	customer	amount
ORD-101	Electronics	Chen	$1,200
ORD-102	Clothing	Patel	$650
ORD-103	Books	Kim	$150
ORD-104	Electronics	Ruiz	$2,500
ORD-105	Home	Osei	$1,800
ORD-106	Clothing	Tanaka	$800
ORD-107	Books	Chen	$200
ORD-108	Electronics	Kim	$3,200
ORD-109	Home	Patel	$1,300
ORD-110	Clothing	Osei	$500
ORD-111	Books	Ruiz	$100
ORD-112	Electronics	Tanaka	$1,300

GROUP BY is the organizing principle behind aggregation. It tells SQL: "Put all rows with the same value into the same bucket, then calculate something for each bucket."

Without GROUP BY, an aggregate function operates on the entire table and returns a single row. With GROUP BY, you get one row per unique value in the grouping column.
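One way to see the difference is to run both forms against the twelve orders listed above. The sketch below uses Python's built-in sqlite3 module with an in-memory database; storing the amounts as plain integers and ordering the output by category are choices made here for illustration. Whether Lisa's report grouped by category is an assumption, but it would explain twelve orders turning into four rows.

```python
import sqlite3

# The twelve orders from the table above (amounts stored as plain integers).
orders = [
    ("ORD-101", "Electronics", "Chen", 1200),
    ("ORD-102", "Clothing", "Patel", 650),
    ("ORD-103", "Books", "Kim", 150),
    ("ORD-104", "Electronics", "Ruiz", 2500),
    ("ORD-105", "Home", "Osei", 1800),
    ("ORD-106", "Clothing", "Tanaka", 800),
    ("ORD-107", "Books", "Chen", 200),
    ("ORD-108", "Electronics", "Kim", 3200),
    ("ORD-109", "Home", "Patel", 1300),
    ("ORD-110", "Clothing", "Osei", 500),
    ("ORD-111", "Books", "Ruiz", 100),
    ("ORD-112", "Electronics", "Tanaka", 1300),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT, category TEXT, customer TEXT, amount INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", orders)

# No GROUP BY: the whole table is one group, so a single row comes back.
whole_table = conn.execute("SELECT COUNT(*) FROM orders").fetchall()

# GROUP BY category: one row per distinct category.
per_category = conn.execute(
    "SELECT category, COUNT(*) AS order_count "
    "FROM orders GROUP BY category ORDER BY category"
).fetchall()

print(whole_table)   # [(12,)]
print(per_category)  # [('Books', 3), ('Clothing', 3), ('Electronics', 4), ('Home', 2)]
```

The four category counts add up to twelve, so no order is lost; each one is simply folded into its category's summary row.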

Understanding GROUP BY

GROUP BY transforms your data by organizing rows into buckets based on shared values, then summarizing each bucket independently.

How GROUP BY Works

Consider this sales data:

sale_id	region	amount
1	North	150
2	South	200
3	North	175
4	South	225
5	North	190
6	East	160

Watch how rows with the same region value are gathered into groups, then collapsed into a single summary row per group.

	SELECT
	region,
	COUNT(*) AS sale_count
	FROM transactions
	GROUP BY region

Walkthrough (step 1 of 8): with GROUP BY region, every row lands in exactly one bucket, and COUNT(*) updates as its group receives rows.

When you GROUP BY region, SQL creates three buckets: one for "North" (3 rows), one for "South" (2 rows), and one for "East" (1 row). Each bucket is then summarized independently.

	SELECT
	region,
	COUNT(*) AS sale_count
	FROM transactions
	GROUP BY region

Result

region	sale_count
North	3
South	2
East	1

The result has one row per region. Each row summarizes all the sales in that region. The original 6 rows have been collapsed into 3 summary rows.

Rules and Patterns

GROUP BY has strict rules about what you can select. Understanding these rules prevents common errors and helps you write correct queries.

GROUP BY Golden Rule

When you use GROUP BY, every column in your SELECT must either:

1. Be in the GROUP BY clause, OR
2. Be inside an aggregate function (COUNT, SUM, AVG, etc.)

This rule exists because SQL needs to know how to handle columns that have multiple values per group. If you group by region, what should SQL show for sale_id? There are multiple sale_ids in each region.

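Standard SQL refuses to pick one arbitrarily, so you must tell it what to do: count them, sum them, take the max, etc.

In most databases (for example PostgreSQL, SQL Server, and MySQL in its default ONLY_FULL_GROUP_BY mode), this query fails because sale_id is neither aggregated nor in GROUP BY. SQLite is a notable exception: it accepts the query and returns a sale_id from an arbitrary row in each group.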

	SELECT
	region,
	sale_id,
	COUNT(*)
	FROM transactions
	GROUP BY region

TIP

If your query fails with an error about columns not being in GROUP BY, check that every non-aggregated column appears in your GROUP BY clause.
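The two ways of satisfying the rule can be sketched on the six sample sales from earlier. This example uses Python's sqlite3 module; the alias first_sale_id and the ORDER BY clauses are illustrative additions. Because SQLite tolerates the invalid form, only the two valid rewrites are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (sale_id INTEGER, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "North", 150), (2, "South", 200), (3, "North", 175),
     (4, "South", 225), (5, "North", 190), (6, "East", 160)],
)

# Fix 1: aggregate the extra column, so each region still yields one row.
aggregated = conn.execute(
    "SELECT region, MIN(sale_id) AS first_sale_id, COUNT(*) AS sale_count "
    "FROM transactions GROUP BY region ORDER BY region"
).fetchall()

# Fix 2: add the column to GROUP BY, which turns every
# (region, sale_id) pair into its own group.
regrouped = conn.execute(
    "SELECT region, sale_id, COUNT(*) AS sale_count "
    "FROM transactions GROUP BY region, sale_id ORDER BY sale_id"
).fetchall()

print(aggregated)  # [('East', 6, 1), ('North', 1, 3), ('South', 2, 2)]
print(len(regrouped))  # 6
```

The second fix is valid SQL but rarely what you want here: sale_id is unique, so every group contains one row and nothing is summarized.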

Multiple Column Grouping

You can group by multiple columns to create more specific buckets. Each unique combination of values becomes its own group.

	SELECT
	region,
	product_type,
	COUNT(*) AS product_count
	FROM transactions
	GROUP BY region, product_type

Result

region	product_type	product_count
North	Electronics	3
North	Clothing	5
South	Electronics	4
South	Clothing	2
East	Electronics	3

Now you see sales broken down by both region AND product type. "North Electronics" is a separate bucket from "North Clothing".

Database Execution

Understanding the execution sequence helps you write efficient GROUP BY queries.

How GROUP BY Executes

GROUP BY executes after WHERE but before SELECT
Rows are sorted into buckets by grouping column values
Aggregate functions calculate one result per bucket
The result has one row per unique group

> Complete this query to find how many products exist in each category.

SELECT
  ___,
  ___(*) AS num_products
FROM products
___ category

Options: category, COUNT, GROUP BY, SUM, ORDER BY

GROUP BY is one of the most important clauses in analytical SQL. Most dashboards, reports, and summary tables rely on it to collapse raw rows into meaningful categories.

The golden rule is simple: every column in SELECT must either be in GROUP BY or wrapped in an aggregate function. Violating this rule produces an error in most databases, and rightly so, because SQL needs to know how to handle multi-valued columns.

TIP

When a GROUP BY query returns more rows than expected, you are probably grouping on a higher-cardinality column than intended. Check whether you need to combine columns or use a coarser grouping.

COUNT variations (*,col,DISTINCT)

Count rows, values, or unique entries

COUNT() is the simplest and most frequently used aggregate function. It counts things. Depending on how you use it, you can count rows, count non-empty values, or count unique values.

COUNT Variations

COUNT has three forms, each answering a different question about your data. Choosing the right one depends on what you need to measure.

COUNT(*) - Count All Rows

COUNT(*) counts every row in the group, regardless of what values are in those rows. It counts rows where columns are empty and rows where columns have values.

	SELECT
	COUNT(*) AS total_orders
	FROM orders

Result

total_orders
1847

With GROUP BY, COUNT(*) tells you how many rows are in each group:

	SELECT
	status,
	COUNT(*) AS order_count
	FROM orders
	GROUP BY status

Result

status	order_count
completed	1203
pending	412
cancelled	232

COUNT(column)

COUNT(column_name) counts only rows where that specific column is not empty (not NULL). This is useful when some rows have missing data.

Consider a table where some customers have not provided an email:

customer_id	name	email
1	Alice	alice@email.com
2	Bob	NULL
3	Carol	carol@email.com
4	Dave	NULL
5	Eve	eve@email.com

	SELECT
	COUNT(*) AS total_customers,
	COUNT(email) AS customers_with_email
	FROM customers

Result

total_customers	customers_with_email
5	3

COUNT(*) returns 5 (all rows). COUNT(email) returns 3 (only rows with an email). The NULL entries represent missing values which COUNT(column) skips.

COUNT(DISTINCT)

Adding DISTINCT inside COUNT tells SQL to count only unique values, ignoring duplicates.

	SELECT
	COUNT(*) AS total_orders,
	COUNT(DISTINCT customer_id) AS unique_customers
	FROM orders

Result

total_orders	unique_customers
1847	892

1,847 orders came from 892 unique customers. This tells you that some customers placed multiple orders.

Practical Applications

Knowing when to use each COUNT variant helps you answer different business questions accurately.

Comparing COUNT Variants

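The three variants of COUNT answer different questions. Choose the right one for what you need to measure:

COUNT(column) counts only the rows where that column is not NULL. COUNT(DISTINCT column) counts each unique non-NULL value once. COUNT(*) counts every row in each group, including rows with NULL values. Use it when you want the total number of records, as in this query: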

	SELECT
	region,
	COUNT(*) AS total_rows
	FROM orders
	GROUP BY region
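To compare all three variants side by side, the sketch below runs them in one grouped query using Python's sqlite3 module. The orders schema and its five rows are hypothetical, invented for this example: coupon_code is NULL when no coupon was used, and customer C1 ordered twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: coupon_code is NULL when no coupon was used.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, region TEXT, customer_id TEXT, coupon_code TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "North", "C1", "SAVE10"),
        (2, "North", "C1", None),
        (3, "North", "C2", "SAVE10"),
        (4, "South", "C3", None),
        (5, "South", "C3", None),
    ],
)

count_rows = conn.execute(
    """
    SELECT
      region,
      COUNT(*)                    AS total_rows,        -- every row in the group
      COUNT(coupon_code)          AS rows_with_coupon,  -- non-NULL coupon_code only
      COUNT(DISTINCT customer_id) AS unique_customers   -- each customer once
    FROM orders
    GROUP BY region
    ORDER BY region
    """
).fetchall()

for row in count_rows:
    print(row)
# ('North', 3, 2, 2)
# ('South', 2, 0, 1)
```

South shows why the choice matters: two rows, no coupons at all, and a single customer behind both orders.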

Database Execution

Each COUNT variant has different performance characteristics based on the work required.

COUNT(*): usually the cheapest form, because it counts rows without checking any column value
COUNT(column): checks each value for NULL before counting
COUNT(DISTINCT column): builds a set of unique values, then counts the set
Return type: COUNT always returns an integer, never NULL (zero matching rows gives 0)

> Complete this query to count orders and find unique customers per region.

SELECT
  region,
  COUNT(*) AS total_orders,
  ___(___ customer_id) AS unique_customers
FROM orders
GROUP BY region

Options: COUNT, DISTINCT, SUM, AVG

COUNT(*), COUNT(column), and COUNT(DISTINCT column) each answer a different question. Choosing the right variant prevents subtle errors in data quality reports and business metrics.

A common real-world use case is audit reporting: COUNT(*) gives total records, COUNT(email) gives records with a value, and COUNT(DISTINCT customer_id) gives unique entities, all in one query.

TIP

When a COUNT result surprises you, ask whether NULLs or duplicates might explain it. COUNT(*) vs COUNT(column) vs COUNT(DISTINCT) each handle those cases differently.

SUM and AVG calculations

Calculate totals and averages per group

SUM() adds up all the values in a column. It is used for any question involving totals: total revenue, total quantity, total hours, total anything numeric.

SUM Functions

SUM adds up numeric values to produce totals. It works with or without GROUP BY, calculating grand totals or group-level totals.

Basic Example

The simplest SUM query adds up every value in a column across the entire table.

	SELECT
	SUM(amount) AS total_revenue
	FROM transactions

Result

total_revenue
2847593.50

Without GROUP BY, SUM adds up every row in the table, giving you a grand total.

SUM with GROUP BY

The real power of SUM appears when combined with GROUP BY. Now you can see totals broken down by category.

	SELECT
	region,
	SUM(amount) AS total_revenue
	FROM transactions
	GROUP BY region

Walkthrough (step 1 of 8): with GROUP BY region, every row lands in exactly one bucket, and SUM(amount) updates as its group receives rows.

Each group's amounts are added together into a single total. North's three sales (150 + 175 + 190) become 515, while East's lone sale stays at 160.

	SELECT
	region,
	SUM(amount) AS total_revenue,
	SUM(quantity) AS units_sold
	FROM transactions
	GROUP BY region

Result

region	total_revenue	units_sold
North	892450.00	4521
South	1123000.50	5892
East	532143.00	2876
West	300000.00	1543

Now you can instantly see which region generates the most revenue and sells the most units.

SUM with Expressions

You can sum calculated values, not just raw columns. This is common for computing totals from unit prices and quantities.

	SELECT
	category,
	SUM(price * quantity) AS total_value,
	SUM(price * quantity * 0.1) AS estimated_tax
	FROM order_items
	GROUP BY category

Result

category	total_value	estimated_tax
Electronics	245000.00	24500.00
Clothing	89000.00	8900.00
Home	156000.00	15600.00

The expression price * quantity is calculated for each row, then those products are summed together.

Grand vs Group Totals

SUM behaves differently depending on whether you include GROUP BY. With GROUP BY, it returns one total per group, as in the regional queries above.

Without GROUP BY, SUM adds every row in the table and returns a single grand total, because the entire table is treated as one group:

	SELECT
	SUM(amount) AS total_revenue
	FROM transactions

Handling Empty Data

SUM ignores NULL values. If a column has some NULL entries, they are simply skipped in the calculation. If ALL values are NULL, SUM returns NULL (not zero).
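The NULL behavior is easy to check on a tiny table. The sketch below uses Python's sqlite3 module with a hypothetical payments table; COALESCE is not covered in this lesson and appears here only as one common way to turn an all-NULL total into zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: North has one missing amount, South has only missing amounts.
conn.execute("CREATE TABLE payments (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [("North", 100), ("North", None), ("North", 50), ("South", None), ("South", None)],
)

totals = conn.execute(
    """
    SELECT
      region,
      SUM(amount)              AS raw_total,   -- NULL when every value is NULL
      COALESCE(SUM(amount), 0) AS safe_total   -- replaces that NULL with 0
    FROM payments
    GROUP BY region
    ORDER BY region
    """
).fetchall()

# With no matching rows at all, COUNT still returns 0 while SUM returns NULL.
no_rows = conn.execute(
    "SELECT COUNT(amount), SUM(amount) FROM payments WHERE region = 'West'"
).fetchone()

print(totals)   # [('North', 150, 150), ('South', None, 0)]
print(no_rows)  # (0, None)
```

Python shows SQL NULL as None. Whether a missing total should be reported as zero or left as NULL depends on the report; zero hides the fact that no data was recorded.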

Database Execution

SUM has specific behaviors around NULL values and data types that affect your results.

• SUM behavior

Iterates through each row in the group
NULL values are skipped (not treated as zero)
Intended for numeric columns

• Watch out for

Returns NULL if all values are NULL
Integer overflow on very large sums
Text columns cause errors in strict databases such as PostgreSQL; MySQL and SQLite coerce the text to numbers instead, which can hide bad data

AVG Functions

AVG calculates arithmetic means to find typical values. Understanding how it handles NULLs is crucial for accurate analysis.

AVG()

AVG() calculates the arithmetic mean: sum of all values divided by the count of values. It is essential for understanding typical values: average order size, average response time, average salary.

Basic Example

AVG without GROUP BY calculates the mean across all rows in the table.

	SELECT
	AVG(amount) AS average_order_value
	FROM orders

Result

average_order_value
127.43

The average order value across all orders is $127.43.

AVG with GROUP BY

When combined with GROUP BY, AVG calculates a separate mean for each group.

	SELECT
	department,
	AVG(salary) AS avg_salary
	FROM employee_metrics
	GROUP BY department

employee_metrics

department	salary
Engineering	120000
Sales	80000
Engineering	130000
Sales	90000
Engineering	125000
Marketing	95000

Walkthrough (step 1 of 8): with GROUP BY department, every row lands in exactly one bucket, and AVG(salary) updates as its group receives rows.

Each group's salaries are averaged independently. Engineering's three salaries produce a higher mean than Sales or Marketing, revealing the pay disparity across departments.

	SELECT
	department,
	AVG(salary) AS avg_salary,
	COUNT(*) AS employee_count
	FROM employee_metrics
	GROUP BY department

Result

department	avg_salary	employee_count
Engineering	125000.00	45
Sales	85000.00	32
Marketing	92000.00	18
Support	65000.00	28

This reveals that Engineering has the highest average salary and the most employees.

AVG vs Manual Calculation

AVG() is equivalent to SUM() / COUNT(), but there is a subtle difference with NULL values:

TIP

AVG() divides by the count of non-NULL values only. If you have 10 rows but 2 are NULL, it divides by 8.

These two queries are equivalent:

The built-in AVG() function handles this automatically. It sums all non-NULL values and divides by the count of non-NULL rows.

	SELECT
	AVG(score) AS avg_score
	FROM tests

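The manual calculation, SUM(score) / COUNT(score), handles NULLs the same way as AVG because COUNT(score) also skips NULL rows. Dividing by COUNT(*) instead would include the NULL rows in the denominator and pull the result down. In databases that use integer division, multiply by 1.0 or cast first so the quotient keeps its decimals.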
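A small check makes the denominator difference concrete. This sketch uses Python's sqlite3 module with five hypothetical test scores, two of them NULL; the multiplication by 1.0 is there only to avoid integer division in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (score INTEGER)")
# Five rows, two of them NULL: the non-NULL scores sum to 240.
conn.executemany("INSERT INTO tests VALUES (?)", [(80,), (90,), (None,), (70,), (None,)])

avg_builtin, avg_manual, avg_wrong = conn.execute(
    """
    SELECT
      AVG(score)                      AS avg_builtin,  -- 240 / 3 non-NULL rows
      SUM(score) * 1.0 / COUNT(score) AS avg_manual,   -- same denominator as AVG
      SUM(score) * 1.0 / COUNT(*)     AS avg_wrong     -- divides by all 5 rows
    FROM tests
    """
).fetchone()

print(avg_builtin, avg_manual, avg_wrong)  # 80.0 80.0 48.0
```

If missing scores should count as zero, the third figure is the one you want, but that is a business decision to make explicitly rather than by accident.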

Database Execution

AVG has important behaviors around NULL handling that can affect your calculations.

How AVG Executes

AVG = SUM / COUNT (of non-NULL values)
NULL values are excluded from both the sum and count
Returns NULL if all values are NULL
Result is typically a decimal, even if inputs are integers

Since AVG is sensitive to outliers, be cautious when interpreting results from skewed datasets.

> Complete this query to calculate the total and average cloud spend per region.

SELECT
  region,
  ___(amount) AS total_rev,
  ___(amount) AS avg_rev
FROM cloud_costs
GROUP BY region

Options: SUM, AVG, COUNT, MAX

SUM and AVG both ignore NULL values. If a column has NULLs, they are silently excluded from the calculation.

AVG can produce misleading results on skewed data. Consider using percentiles alongside averages for a fuller picture.

Combining SUM and AVG in the same GROUP BY query gives you both the total and typical value per group in one pass.

MIN and MAX for extremes

Find the highest and lowest values

MIN() and MAX() find the smallest and largest values in a column. They work with numbers, dates, and even text (alphabetical order).

Basic MIN/MAX Usage

MIN and MAX find the smallest and largest values in your data. They scan through groups to identify extremes.

Finding Extremes

MIN() scans through all values in each group, keeping track of the smallest one found so far:

	SELECT
	category,
	MIN(price) AS min_price
	FROM products
	GROUP BY category

Walkthrough (step 1 of 8): with GROUP BY category, every row lands in exactly one bucket, and MIN(price) updates as its group receives rows.

Within each category, the smallest price is selected. Electronics keeps 149 (ignoring 299 and 599), while Clothing keeps 29.

MAX() works the same way, but keeps track of the largest value in each group:

	SELECT
	category,
	MAX(price) AS max_price
	FROM products
	GROUP BY category

Walkthrough (step 1 of 8): with GROUP BY category, every row lands in exactly one bucket, and MAX(price) updates as its group receives rows.

Now the largest price per category is selected instead. Electronics surfaces 599 and Clothing surfaces 89, the opposite extremes from MIN.

Together with GROUP BY, they let you find price ranges within each category:

	SELECT
	category,
	MIN(price) AS cheapest,
	MAX(price) AS most_expensive,
	MAX(price) - MIN(price) AS price_range
	FROM products
	GROUP BY category

Result

category	cheapest	most_expensive	price_range
Electronics	149.00	599.00	450.00
Clothing	29.00	89.00	60.00

This tells you the price range within each category. Electronics spans $450 while Clothing only spans $60.

Advanced Applications

MIN and MAX work beyond numbers. They handle dates for time-based analysis and even text for alphabetical ordering.

MIN/MAX with Dates

MIN and MAX are extremely useful for date analysis: finding first and last events, date ranges, and activity spans.

	SELECT
	customer_id,
	MIN(order_date) AS first_order,
	MAX(order_date) AS last_order,
	COUNT(*) AS total_orders
	FROM orders
	GROUP BY customer_id

Result

customer_id	first_order	last_order	total_orders
C001	2024-01-15	2025-11-22	47
C002	2024-03-08	2025-10-30	23
C003	2024-06-21	2025-11-18	31

Now you can see a summary of each customer's order history at a glance: when they first ordered, their most recent order, and how many orders they have placed.

MIN/MAX with Text

When applied to text columns, MIN returns the alphabetically first value and MAX returns the alphabetically last value.

	SELECT
	MIN(product_name) AS first_alphabetically,
	MAX(product_name) AS last_alphabetically
	FROM products

Result

first_alphabetically	last_alphabetically
Adapter Cable	Wireless Router

On the same data, MIN returns the cheapest price per category; replacing MIN with MAX in this query returns the most expensive instead:

	SELECT
	category,
	MIN(price) AS cheapest
	FROM products
	GROUP BY category

Combining Aggregates

MIN and MAX are often used alongside other aggregate functions to give a complete picture:

	SELECT
	category,
	COUNT(*) AS product_count,
	MIN(price) AS min_price,
	AVG(price) AS avg_price,
	MAX(price) AS max_price
	FROM products
	GROUP BY category

Result

category	product_count	min_price	avg_price	max_price
Electronics	156	9.99	249.50	2499.99
Clothing	243	12.99	65.00	299.99
Home	89	4.99	89.75	599.99

This comprehensive summary shows price distribution across categories: how many products, the cheapest, average, and most expensive in each.

Database Execution

MIN and MAX have straightforward execution but work differently depending on data types.

How MIN/MAX Execute

MIN/MAX scan all values in the group to find extremes
NULL values are silently ignored
Work with numbers, dates, and text columns
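Text comparison follows the column's collation: often alphabetical and case-sensitive, although some databases (MySQL by default) compare text case-insensitively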
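The effect of case on text MIN and MAX can be sketched with three hypothetical product names. The example uses Python's sqlite3 module, so it shows SQLite's default behavior; other databases may order the same values differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)", [("apple",), ("Zebra",), ("mango",)]
)

# SQLite's default BINARY collation compares character codes,
# so uppercase 'Z' (90) sorts before lowercase 'a' (97).
case_sensitive = conn.execute(
    "SELECT MIN(product_name), MAX(product_name) FROM products"
).fetchone()

# Lowercasing first gives a case-insensitive comparison,
# at the cost of returning the lowercased text.
case_folded = conn.execute(
    "SELECT MIN(LOWER(product_name)), MAX(LOWER(product_name)) FROM products"
).fetchone()

print(case_sensitive)  # ('Zebra', 'mango')
print(case_folded)     # ('apple', 'zebra')
```

When a text MIN or MAX looks wrong, mixed capitalization is a likely cause worth ruling out first.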

> Complete this query to find the cheapest and most expensive product in each category.

SELECT
  category,
  ___(price) AS lowest,
  ___(price) AS highest
FROM products
GROUP BY category

Options: MIN, MAX, FIRST, LAST

MIN and MAX scan all values in the group to find extremes. NULL values are silently excluded from the comparison.

For text columns, MIN returns the alphabetically first value and MAX returns the last. This comparison is case-sensitive in many databases, though the exact order depends on the column's collation.

Combining MIN, MAX, and AVG in one query shows the full range and central tendency, giving a quick data distribution summary.

HAVING for filtered groups

Filter groups after aggregation

HAVING filters groups after aggregation, just as WHERE filters rows before aggregation. You need HAVING when your filter condition involves an aggregate function.

The key distinction: WHERE cannot reference aggregate functions because it runs before the aggregation happens. HAVING runs after aggregation, so it CAN reference aggregated values.

Understanding HAVING

HAVING exists because WHERE cannot filter on aggregate results. Understanding when to use each is essential for correct queries.

Why We Need HAVING

Consider this question: "Which regions have more than 100 orders?"

A first attempt might put the condition in WHERE. That query fails with an error, because WHERE runs before aggregation and cannot contain an aggregate function:

	SELECT
	region,
	COUNT(*) AS order_count
	FROM orders
	WHERE COUNT(*) > 100
	GROUP BY region

WHERE filters individual rows (before grouping). HAVING filters entire groups (after grouping).

	SELECT
	region,
	COUNT(*) AS order_count
	FROM orders
	GROUP BY region
	HAVING COUNT(*) > 3

orders

region	amount
North	150
South	200
North	175
South	225
North	190
East	160
North	180
South	210

Walkthrough (step 1 of 18): first, GROUP BY region counts every group.

Groups are formed first, then the count threshold is applied. Only North (4 orders) passes the filter: South (3 orders) is eliminated because 3 is not greater than 3, and East (1 order) falls well short.
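The boundary between > and >= is worth checking by hand. The sketch below loads the eight region and amount values from the walkthrough into SQLite through Python's sqlite3 module; the helper regions_having is a hypothetical convenience written for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("North", 150), ("South", 200), ("North", 175), ("South", 225),
     ("North", 190), ("East", 160), ("North", 180), ("South", 210)],
)

def regions_having(condition):
    # condition is a trusted literal such as "COUNT(*) > 3";
    # never build SQL from user input this way.
    sql = (
        "SELECT region, COUNT(*) AS order_count FROM orders "
        "GROUP BY region HAVING " + condition + " ORDER BY region"
    )
    return conn.execute(sql).fetchall()

strictly_more = regions_having("COUNT(*) > 3")
at_least = regions_having("COUNT(*) >= 3")

print(strictly_more)  # [('North', 4)]
print(at_least)       # [('North', 4), ('South', 3)]
```

A group sitting exactly on the threshold survives >= but not >, which is a frequent source of off-by-one surprises in reports.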

HAVING Examples

HAVING conditions can use any aggregate function, letting you filter on totals, averages, counts, or any calculated metric.

	SELECT
	customer_id,
	COUNT(*) AS order_count,
	SUM(amount) AS total_spent
	FROM orders
	GROUP BY customer_id
	HAVING SUM(amount) > 1000

Result

customer_id	order_count	total_spent
C001	23	2847.50
C015	18	1923.00
C042	31	4521.75
C089	12	1156.25

This shows only customers who have spent more than $1,000 total. Customers who spent less are filtered out by HAVING.

	SELECT
	product_id,
	AVG(rating) AS avg_rating,
	COUNT(*) AS review_count
	FROM reviews
	GROUP BY product_id
	HAVING COUNT(*) >= 10
	AND AVG(rating) >= 4

Result

product_id	avg_rating	review_count
P101	4.7	156
P205	4.2	89
P312	4.5	234

This finds highly-rated products with enough reviews to be statistically meaningful. Both conditions must be true: at least 10 reviews AND average rating of 4.0 or higher.

Practical Patterns

WHERE and HAVING can work together in the same query. Understanding the execution order helps you write efficient and correct queries.

WHERE vs HAVING

You can use both in the same query. WHERE filters rows first, then GROUP BY creates groups, then HAVING filters groups:

	SELECT
	region,
	COUNT(*) AS completed_orders
	FROM orders
	WHERE status = 'completed'
	GROUP BY region
	HAVING COUNT(*) > 2

orders

region	status
North	completed
South	pending
North	completed
South	completed
North	completed
East	completed
South	completed
North	pending
East	completed
South	completed
North	completed
West	completed

Walkthrough (step 1 of 35): WHERE runs first, on individual rows: only status = 'completed' survives to be grouped.

	SELECT
	region,
	COUNT(*) AS completed_orders
	FROM orders
	WHERE status = 'completed'
	GROUP BY region
	HAVING COUNT(*) > 50

Result

region	completed_orders
North	234
South	189
West	67

First, WHERE keeps only completed orders (filtering rows). Then, GROUP BY counts completed orders per region. Finally, HAVING keeps only regions with more than 50 completed orders (filtering groups).
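The same three-step order can be traced on a small table. This sketch uses Python's sqlite3 module and twelve region and status pairs taken from the walkthrough data, with the lower threshold of 2 so that the tiny sample still returns rows; the variable names are illustrative.

```python
import sqlite3

rows = [
    ("North", "completed"), ("South", "pending"), ("North", "completed"),
    ("South", "completed"), ("North", "completed"), ("East", "completed"),
    ("South", "completed"), ("North", "pending"), ("East", "completed"),
    ("South", "completed"), ("North", "completed"), ("West", "completed"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

# Step 1: WHERE removes the pending rows before any grouping happens.
after_where = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'completed'"
).fetchone()[0]

# Step 2: GROUP BY counts what is left, one row per region.
grouped = conn.execute(
    "SELECT region, COUNT(*) FROM orders WHERE status = 'completed' "
    "GROUP BY region ORDER BY region"
).fetchall()

# Step 3: HAVING drops the groups that miss the threshold.
filtered = conn.execute(
    "SELECT region, COUNT(*) AS completed_orders FROM orders "
    "WHERE status = 'completed' GROUP BY region "
    "HAVING COUNT(*) > 2 ORDER BY region"
).fetchall()

print(after_where)  # 10
print(grouped)      # [('East', 2), ('North', 4), ('South', 3), ('West', 1)]
print(filtered)     # [('North', 4), ('South', 3)]
```

Without the WHERE clause North would count 5 orders and South 4, so the row filter changes the numbers HAVING sees, not just the final list.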

These guidelines help you choose between WHERE and HAVING for optimal query performance.

✓Do

Use WHERE for filtering individual rows
Use HAVING for filtering based on aggregate values
Filter with WHERE first to reduce data before aggregation (better performance)
Remember: HAVING comes after GROUP BY

✗Don't

Don't try to use aggregate functions in WHERE
Don't use HAVING when WHERE would work, because it is often slower
Don't forget that HAVING filters entire groups, not rows

Execution Order

SQL clauses execute in a specific order. Understanding this sequence clarifies why HAVING can reference aggregates but WHERE cannot.

1. FROM: start with the table
2. WHERE: filter individual rows
3. GROUP BY: create groups from the remaining rows
4. HAVING: filter groups based on aggregate values
5. SELECT: calculate and return results

> Fill in the missing parts to find only departments where the average metric value exceeds 60,000.

SELECT
  department,
  AVG(metric_value) AS avg_metric
FROM employee_metrics
GROUP BY department
___ ___(metric_value) > ___

Options: HAVING, AVG, 60000, WHERE, 50000

HAVING is the clause that unlocks group-level filtering. Without it, you would need subqueries to accomplish what HAVING does in a single, readable step.

The WHERE and HAVING combination is extremely powerful: WHERE eliminates unwanted rows before any grouping occurs, reducing the data the database must aggregate. HAVING then eliminates groups whose summary metrics do not meet your criteria.

If your HAVING condition does not reference an aggregate function, move it to WHERE instead. Filtering rows before grouping is generally at least as fast as filtering groups after aggregation, and it makes the intent clearer.
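As a quick illustration, the two placements below return the same rows when the condition involves only the grouping column. The sketch uses Python's sqlite3 module and the six sample sales from the GROUP BY section; SQLite accepts both forms, and no timing is measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (sale_id INTEGER, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "North", 150), (2, "South", 200), (3, "North", 175),
     (4, "South", 225), (5, "North", 190), (6, "East", 160)],
)

# Preferred: the row-level condition lives in WHERE, so only North rows are grouped.
with_where = conn.execute(
    "SELECT region, COUNT(*) FROM transactions "
    "WHERE region = 'North' GROUP BY region"
).fetchall()

# Same result, but every region is grouped first and then two groups are discarded.
with_having = conn.execute(
    "SELECT region, COUNT(*) FROM transactions "
    "GROUP BY region HAVING region = 'North'"
).fetchall()

print(with_where)   # [('North', 3)]
print(with_having)  # [('North', 3)]
```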

PUTTING IT ALL TOGETHER

> You are a product analyst at Stripe building a feature usage dashboard that summarizes transaction activity across subscription tiers. The product team needs counts, totals, and averages per plan so they can identify which tiers drive the most volume.

GROUP BY groups every transaction row by subscription tier, producing one summary row per tier.

COUNT(*) tallies total transactions per tier while COUNT(DISTINCT ...) measures unique active users.

SUM computes total revenue per tier and AVG reveals the typical transaction size.

HAVING filters the grouped results to only tiers exceeding a minimum transaction threshold for the report.
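The four steps above can be combined into one query. The sketch below is only an illustration: the transactions table, its columns (tier, user_id, amount), the nine sample rows, and the threshold of three transactions are all assumptions made for this example, not a real Stripe schema. It runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data for the scenario above.
conn.execute("CREATE TABLE transactions (tier TEXT, user_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("starter", "u1", 10.0), ("starter", "u2", 20.0), ("starter", "u1", 30.0),
        ("pro", "u3", 100.0), ("pro", "u4", 200.0), ("pro", "u3", 150.0),
        ("pro", "u5", 50.0),
        ("enterprise", "u6", 1000.0), ("enterprise", "u6", 2000.0),
    ],
)

dashboard = conn.execute(
    """
    SELECT
      tier,
      COUNT(*)                AS txn_count,      -- all transactions in the tier
      COUNT(DISTINCT user_id) AS active_users,   -- each user counted once
      SUM(amount)             AS total_revenue,
      AVG(amount)             AS avg_txn_size
    FROM transactions
    GROUP BY tier
    HAVING COUNT(*) >= 3                         -- minimum volume for the report
    ORDER BY tier
    """
).fetchall()

for row in dashboard:
    print(row)
# ('pro', 4, 3, 500.0, 125.0)
# ('starter', 3, 2, 60.0, 20.0)
```

The enterprise tier has the largest revenue in this sample but only two transactions, so the HAVING threshold removes it. A volume filter like this is a reporting choice that should be stated alongside the dashboard.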

KEY TAKEAWAYS

GROUP BY buckets rows by column values; each group produces one output row

COUNT(*) counts all rows; COUNT(column) counts non-NULL values only

SUM and AVG work only on numeric columns; NULL values are ignored

MIN and MAX work on any comparable type: numbers, strings, dates

Every non-aggregated SELECT column must appear in GROUP BY

HAVING filters groups after aggregation; WHERE filters rows before

Use WHERE to exclude rows from calculation; use HAVING to exclude groups from results

Aggregation without GROUP BY treats the entire table as one group

A million rows walk into a SUM...

Category: SQL
Difficulty: beginner
Duration: 38 minutes
Challenges: 0 hands-on challenges

Topics covered: GROUP BY for categorizing data, COUNT variations (*,col,DISTINCT), SUM and AVG calculations, MIN and MAX for extremes, HAVING for filtered groups
