Aggregating: Beginner

Amazon lists over 350 million products, and every seller dashboard metric you see, from units sold to average star rating to revenue by category, is produced by aggregate SQL queries running across billions of order records. When Amazon's retail team identifies that a product category is underperforming, they are looking at COUNT, SUM, and AVG applied across millions of rows grouped by category, marketplace, and time window. The same functions that power those eight-figure business decisions are the ones you are about to learn: COUNT, SUM, AVG, MIN, MAX, and GROUP BY.

GROUP BY for categorizing data

Daily Life
Interviews

Group rows into meaningful categories

GROUP BY collapses individual rows into summary groups. Without it, you can count every order in your database but not how many belong to each customer. That gap is what GROUP BY closes.

THE CASE OF THE COLLAPSED RECORDS

Lisa Park flagged a summary report. Twelve orders in the system. The report shows four rows. Nobody deleted anything. But eight orders have vanished from the page. She has a board meeting in two hours.

One table. Twelve orders. Four categories. The report gives back four rows.

orders
order_idcategorycustomeramount
ORD-101ElectronicsChen$1,200
ORD-102ClothingPatel$650
ORD-103BooksKim$150
ORD-104ElectronicsRuiz$2,500
ORD-105HomeOsei$1,800
ORD-106ClothingTanaka$800
ORD-107BooksChen$200
ORD-108ElectronicsKim$3,200
ORD-109HomePatel$1,300
ORD-110ClothingOsei$500
ORD-111BooksRuiz$100
ORD-112ElectronicsTanaka$1,300

GROUP BY is the organizing principle behind aggregation. It tells SQL: "Put all rows with the same value into the same bucket, then calculate something for each bucket."

Without GROUP BY, an aggregate function operates on the entire table and returns a single row. With GROUP BY, you get one row per unique value in the grouping column.

Understanding GROUP BY

GROUP BY transforms your data by organizing rows into buckets based on shared values, then summarizing each bucket independently.

//

How GROUP BY Works

Consider this sales data:
sale_idregionamount
1North150
2South200
3North175
4South225
5North190
6East160
Watch how rows with the same region value are gathered into groups, then collapsed into a single summary row per group.
1SELECT
2 region,
3 COUNT(*) AS sale_count
4FROM transactions
5GROUP BY region
Group By
transactions
sale_idregionamount
1North150
2South200
3North175
4South225
5North190
6East160
result
regionsale_count
Step 1/6

When you GROUP BY region, SQL creates three buckets: one for "North" (3 rows), one for "South" (2 rows), and one for "East" (1 row). Each bucket is then summarized independently.

1SELECT
2 region,
3 COUNT(*) AS sale_count
4FROM transactions
5GROUP BY region
Result
regionsale_count
North3
South2
East1
The result has one row per region. Each row summarizes all the sales in that region. The original 6 rows have been collapsed into 3 summary rows.

Rules and Patterns

GROUP BY has strict rules about what you can select. Understanding these rules prevents common errors and helps you write correct queries.

//

GROUP BY Golden Rule

When you use GROUP BY, every column in your SELECT must either:

1. Be in the GROUP BY clause, OR 2. Be inside an aggregate function (COUNT, SUM, AVG, etc.)

This rule exists because SQL needs to know how to handle columns that have multiple values per group. If you group by region, what should SQL show for sale_id? There are multiple sale_ids in each region.
SQL cannot pick one arbitrarily, so you must tell it what to do: count them, sum them, take the max, etc.
This query will fail because sale_id is not aggregated and not in GROUP BY.
1SELECT
2 region,
3 sale_id,
4 COUNT(*)
5FROM transactions
6GROUP BY region
TIP
If your query fails with an error about columns not being in GROUP BY, check that every non-aggregated column appears in your GROUP BY clause.
//

Multiple Column Grouping

You can group by multiple columns to create more specific buckets. Each unique combination of values becomes its own group.
1SELECT
2 region,
3 product_type,
4 COUNT(*) AS product_count
5FROM transactions
6GROUP BY region, product_type
Result
regionproduct_typeproduct_count
NorthElectronics3
NorthClothing5
SouthElectronics4
SouthClothing2
EastElectronics3
Now you see sales broken down by both region AND product type. "North Electronics" is a separate bucket from "North Clothing".
//

Database Execution

Understanding the execution sequence helps you write efficient GROUP BY queries.

How GROUP BY Executes
  • GROUP BY executes after WHERE but before SELECT
  • Rows are sorted into buckets by grouping column values
  • Aggregate functions calculate one result per bucket
  • The result has one row per unique group

> Complete this query to find how many products exist in each category.

SELECT
  ___,
  ___(*) AS num_products
FROM products
___ category
COUNT
SUM
category
ORDER BY
GROUP BY

GROUP BY is the single most important clause in analytical SQL. Every dashboard, report, and summary table ultimately relies on it to collapse raw rows into meaningful categories.

The golden rule is simple: every column in SELECT must either be in GROUP BY or wrapped in an aggregate function. Violating this rule produces an error, and rightly so, because SQL needs to know how to handle multi-valued columns.

TIP
When a GROUP BY query returns more rows than expected, you are probably grouping on a higher-cardinality column than intended. Check whether you need to combine columns or use a coarser grouping.

COUNT variations (*,col,DISTINCT)

Daily Life
Interviews

Count rows, values, or unique entries

COUNT() is the simplest and most frequently used aggregate function. It counts things. Depending on how you use it, you can count rows, count non-empty values, or count unique values.

COUNT Variations

COUNT has three forms, each answering a different question about your data. Choosing the right one depends on what you need to measure.

//

COUNT(*) - Count All Rows

COUNT(*) counts every row in the group, regardless of what values are in those rows. It counts rows where columns are empty and rows where columns have values.

1SELECT
2 COUNT(*) AS total_orders
3FROM orders
Result
total_orders
1847

With GROUP BY, COUNT(*) tells you how many rows are in each group:

1SELECT
2 status,
3 COUNT(*) AS order_count
4FROM orders
5GROUP BY status
Result
statusorder_count
completed1203
pending412
cancelled232
//

COUNT(column)

COUNT(column_name) counts only rows where that specific column is not empty (not NULL). This is useful when some rows have missing data.

Consider a table where some customers have not provided an email:
customer_idnameemail
1Alicealice@email.com
2BobNULL
3Carolcarol@email.com
4DaveNULL
5Eveeve@email.com
1SELECT
2 COUNT(*) AS total_customers,
3 COUNT(email) AS customers_with_email
4FROM customers
Result
total_customerscustomers_with_email
53

COUNT(*) returns 5 (all rows). COUNT(email) returns 3 (only rows with an email). The NULL entries represent missing values which COUNT(column) skips.

//

COUNT(DISTINCT)

Adding DISTINCT inside COUNT tells SQL to count only unique values, ignoring duplicates.

1SELECT
2 COUNT(*) AS total_orders,
3 COUNT(DISTINCT customer_id) AS unique_customers
4FROM orders
Result
total_ordersunique_customers
1847892
1,847 orders came from 892 unique customers. This tells you that some customers placed multiple orders.

Practical Applications

Knowing when to use each COUNT variant helps you answer different business questions accurately.

//

Comparing COUNT Variants

The three variants of COUNT answer different questions. Choose the right one for what you need to measure:

COUNT(*) counts every row in each group, including rows with NULL values. Use this when you want the total number of records.
1SELECT
2 region,
3 COUNT(*) AS total_rows
4FROM orders
5GROUP BY region
//

Database Execution

Each COUNT variant has different performance characteristics based on the work required.

COUNT(*)
Extremely fast: only counts rows, no value checks needed
COUNT(column)
Checks each value for NULL before counting
COUNT(DISTINCT)
Builds a set of unique values, then counts the set
Return type
COUNT always returns an integer, never NULL

> Complete this query to count orders and find unique customers per region.

SELECT
  region,
  COUNT(*) AS total_orders,
  ___(___ customer_id) AS unique_customers
FROM orders
GROUP BY region
SUM
AVG
DISTINCT
COUNT

COUNT(*), COUNT(column), and COUNT(DISTINCT column) each answer a different question. Choosing the right variant prevents subtle errors in data quality reports and business metrics.

A common real-world use case is audit reporting: COUNT(*) gives total records, COUNT(email) gives records with a value, and COUNT(DISTINCT customer_id) gives unique entities, all in one query.

TIP
When a COUNT result surprises you, ask whether NULLs or duplicates might explain it. COUNT(*) vs COUNT(column) vs COUNT(DISTINCT) each handle those cases differently.

SUM and AVG calculations

Daily Life
Interviews

Calculate totals and averages per group

SUM() adds up all the values in a column. It is used for any question involving totals: total revenue, total quantity, total hours, total anything numeric.

SUM Functions

SUM adds up numeric values to produce totals. It works with or without GROUP BY, calculating grand totals or group-level totals.

//

Basic Example

The simplest SUM query adds up every value in a column across the entire table.

1SELECT
2 SUM(amount) AS total_revenue
3FROM transactions
Result
total_revenue
2847593.50

Without GROUP BY, SUM adds up every row in the table, giving you a grand total.

//

SUM with GROUP BY

The real power of SUM appears when combined with GROUP BY. Now you can see totals broken down by category.

1SELECT
2 region,
3 SUM(amount) AS total_revenue
4FROM transactions
5GROUP BY region
Sum
transactions
sale_idregionamount
1North150
2South200
3North175
4South225
5North190
6East160
result
regiontotal_amount
Step 1/6
Each group's amounts are added together into a single total. North's three sales (150 + 175 + 190) become 515, while East's lone sale stays at 160.
1SELECT
2 region,
3 SUM(amount) AS total_revenue,
4 SUM(quantity) AS units_sold
5FROM transactions
6GROUP BY region
Result
regiontotal_revenueunits_sold
North892450.004521
South1123000.505892
East532143.002876
West300000.001543
Now you can instantly see which region generates the most revenue and sells the most units.
//

SUM with Expressions

You can sum calculated values, not just raw columns. This is common for computing totals from unit prices and quantities.
1SELECT
2 category,
3 SUM(price * quantity) AS total_value,
4 SUM(price * quantity * 0.1) AS estimated_tax
5FROM order_items
6GROUP BY category
Result
categorytotal_valueestimated_tax
Electronics245000.0024500.00
Clothing89000.008900.00
Home156000.0015600.00
The expression price * quantity is calculated for each row, then those products are summed together.
//

Grand vs Group Totals

SUM behaves differently depending on whether you include GROUP BY. Compare the two approaches:

Without GROUP BY, SUM adds every row in the table and returns a single grand total. The entire table is treated as one group.
1SELECT
2 SUM(amount) AS total_revenue
3FROM transactions
//

Handling Empty Data

SUM ignores NULL values. If a column has some NULL entries, they are simply skipped in the calculation. If ALL values are NULL, SUM returns NULL (not zero).

//

Database Execution

SUM has specific behaviors around NULL values and data types that affect your results.

SUM behavior
  • Iterates through each row in the group
  • NULL values are skipped (not treated as zero)
  • Only works on numeric columns
Watch out for
  • Returns NULL if all values are NULL
  • Integer overflow on very large sums
  • Text columns cause errors

AVG Functions

AVG calculates arithmetic means to find typical values. Understanding how it handles NULLs is crucial for accurate analysis.

//

AVG()

AVG() calculates the arithmetic mean: sum of all values divided by the count of values. It is essential for understanding typical values: average order size, average response time, average salary.

//

Basic Example

AVG without GROUP BY calculates the mean across all rows in the table.

1SELECT
2 AVG(amount) AS average_order_value
3FROM orders
Result
average_order_value
127.43
The average order value across all orders is $127.43.
//

AVG with GROUP BY

When combined with GROUP BY, AVG calculates a separate mean for each group.

1SELECT
2 department,
3 AVG(salary) AS avg_salary
4FROM employee_metrics
5GROUP BY department
Avg
employee_metrics
emp_iddepartmentsalary
1Engineering120000
2Sales80000
3Engineering130000
4Sales90000
5Engineering125000
6Marketing95000
result
departmentavg_salary
Step 1/6
Each group's salaries are averaged independently. Engineering's three salaries produce a higher mean than Sales or Marketing, revealing the pay disparity across departments.
1SELECT
2 department,
3 AVG(salary) AS avg_salary,
4 COUNT(*) AS employee_count
5FROM employee_metrics
6GROUP BY department
Result
departmentavg_salaryemployee_count
Engineering125000.0045
Sales85000.0032
Marketing92000.0018
Support65000.0028
This reveals that Engineering has the highest average salary and the most employees.
//

AVG vs Manual Calculation

AVG() is equivalent to SUM() / COUNT(), but there is a subtle difference with NULL values:

TIP
AVG() divides by the count of non-NULL values only. If you have 10 rows but 2 are NULL, it divides by 8.
These two queries are equivalent:
The built-in AVG() function handles this automatically. It sums all non-NULL values and divides by the count of non-NULL rows.
1SELECT
2 AVG(score) AS avg_score
3FROM tests

The manual calculation using SUM/COUNT handles NULLs the same way as AVG.

//

Database Execution

AVG has important behaviors around NULL handling that can affect your calculations.

How AVG Executes
  • AVG = SUM / COUNT (of non-NULL values)
  • NULL values are excluded from both the sum and count
  • Returns NULL if all values are NULL
  • Result is typically a decimal, even if inputs are integers

Since AVG is sensitive to outliers, be cautious when interpreting results from skewed datasets.

> Complete this query to calculate the total and average cloud spend per region.

SELECT
  region,
  ___(amount) AS total_rev,
  ___(amount) AS avg_rev
FROM cloud_costs
GROUP BY region
AVG
SUM
MAX
COUNT

SUM and AVG both ignore NULL values. If a column has NULLs, they are silently excluded from the calculation.

AVG can produce misleading results on skewed data. Consider using percentiles alongside averages for a fuller picture.

Combining SUM and AVG in the same GROUP BY query gives you both the total and typical value per group in one pass.

MIN and MAX for extremes

Daily Life
Interviews

Find the highest and lowest values

MIN() and MAX() find the smallest and largest values in a column. They work with numbers, dates, and even text (alphabetical order).

Basic MIN/MAX Usage

MIN and MAX find the smallest and largest values in your data. They scan through groups to identify extremes.

//

Finding Extremes

MIN() scans through all values in each group, keeping track of the smallest one found so far:

1SELECT
2 category,
3 MIN(price) AS min_price
4FROM products
5GROUP BY category
Min
products
product_idcategoryprice
P001Electronics299
P002Clothing45
P003Electronics149
P004Clothing89
P005Electronics599
P006Clothing29
result
categorymin_price
Step 1/6
Within each category, the smallest price is selected. Electronics keeps 149 (ignoring 299 and 599), while Clothing keeps 29.

MAX() works the same way, but keeps track of the largest value in each group:

1SELECT
2 category,
3 MAX(price) AS max_price
4FROM products
5GROUP BY category
Max
products
product_idcategoryprice
P001Electronics299
P002Clothing45
P003Electronics149
P004Clothing89
P005Electronics599
P006Clothing29
result
categorymax_price
Step 1/6

Now the largest price per category is selected instead. Electronics surfaces 599 and Clothing surfaces 89, the opposite extremes from MIN.

Together with GROUP BY, they let you find price ranges within each category:

1SELECT
2 category,
3 MIN(price) AS cheapest,
4 MAX(price) AS most_expensive,
5 MAX(price) - MIN(price) AS price_range
6FROM products
7GROUP BY category
Result
categorycheapestmost_expensiveprice_range
Electronics149.00599.00450.00
Clothing29.0089.0060.00
This tells you the price range within each category. Electronics spans $450 while Clothing only spans $60.

Advanced Applications

MIN and MAX work beyond numbers. They handle dates for time-based analysis and even text for alphabetical ordering.

//

MIN/MAX with Dates

MIN and MAX are extremely useful for date analysis: finding first and last events, date ranges, and activity spans.

1SELECT
2 customer_id,
3 MIN(order_date) AS first_order,
4 MAX(order_date) AS last_order,
5 COUNT(*) AS total_orders
6FROM orders
7GROUP BY customer_id
Result
customer_idfirst_orderlast_ordertotal_orders
C0012024-01-152025-11-2247
C0022024-03-082025-10-3023
C0032024-06-212025-11-1831
Now you can see each customer's entire order history at a glance: when they first ordered, their most recent order, and how many orders they have placed.
//

MIN/MAX with Text

When applied to text columns, MIN returns the alphabetically first value and MAX returns the alphabetically last value.

1SELECT
2 MIN(product_name) AS first_alphabetically,
3 MAX(product_name) AS last_alphabetically
4FROM products
Result
first_alphabeticallylast_alphabetically
Adapter CableWireless Router

Compare how MIN and MAX behave on the same data:

1SELECT
2 category,
3 MIN(price) AS cheapest
4FROM products
5GROUP BY category
//

Combining Aggregates

MIN and MAX are often used alongside other aggregate functions to give a complete picture:

1SELECT
2 category,
3 COUNT(*) AS product_count,
4 MIN(price) AS min_price,
5 AVG(price) AS avg_price,
6 MAX(price) AS max_price
7FROM products
8GROUP BY category
Result
categoryproduct_countmin_priceavg_pricemax_price
Electronics1569.99249.502499.99
Clothing24312.9965.00299.99
Home894.9989.75599.99
This comprehensive summary shows price distribution across categories: how many products, the cheapest, average, and most expensive in each.
//

Database Execution

MIN and MAX have straightforward execution but work differently depending on data types.

How MIN/MAX Execute
  • MIN/MAX scan all values in the group to find extremes
  • NULL values are silently ignored
  • Work with numbers, dates, and text columns
  • Text comparison is alphabetical and case-sensitive

> Complete this query to find the cheapest and most expensive product in each category.

SELECT
  category,
  ___(price) AS lowest,
  ___(price) AS highest
FROM products
GROUP BY category
MAX
FIRST
MIN
LAST

MIN and MAX scan all values in the group to find extremes. NULL values are silently excluded from the comparison.

For text columns, MIN returns the alphabetically first value and MAX returns the last. This comparison is case-sensitive.

Combining MIN, MAX, and AVG in one query shows the full range and central tendency, giving a quick data distribution summary.

HAVING for filtered groups

Daily Life
Interviews

Filter groups after aggregation

HAVING filters groups after aggregation, just as WHERE filters rows before aggregation. You need HAVING when your filter condition involves an aggregate function.

The key distinction: WHERE cannot reference aggregate functions because it runs before the aggregation happens. HAVING runs after aggregation, so it CAN reference aggregated values.

Understanding HAVING

HAVING exists because WHERE cannot filter on aggregate results. Understanding when to use each is essential for correct queries.

//

Why We Need HAVING

Consider this question: "Which regions have more than 100 orders?"

You cannot use aggregate functions in WHERE because it runs before aggregation. HAVING runs after aggregation, so it can reference aggregated values:

This query will fail with an error because you cannot use an aggregate function in WHERE.
1SELECT
2 region,
3 COUNT(*) AS order_count
4FROM orders
5WHERE COUNT(*) > 100
6GROUP BY region

WHERE filters individual rows (before grouping). HAVING filters entire groups (after grouping).

1SELECT
2 region,
3 COUNT(*) AS order_count
4FROM orders
5GROUP BY region
6HAVING COUNT(*) > 3
Having
orders
order_idregionamount
1North150
2South200
3North175
4South225
5North190
6East160
7North180
8South210
result
regionorder_count
Step 1/8
Groups are formed first, then the count threshold is applied. North (4 orders) and South (3 orders) pass the filter, while East (1 order) is eliminated entirely.
//

HAVING Examples

HAVING conditions can use any aggregate function, letting you filter on totals, averages, counts, or any calculated metric.

1SELECT
2 customer_id,
3 COUNT(*) AS order_count,
4 SUM(amount) AS total_spent
5FROM orders
6GROUP BY customer_id
7HAVING SUM(amount) > 1000
Result
customer_idorder_counttotal_spent
C001232847.50
C015181923.00
C042314521.75
C089121156.25

This shows only customers who have spent more than $1,000 total. Customers who spent less are filtered out by HAVING.

1SELECT
2 product_id,
3 AVG(rating) AS avg_rating,
4 COUNT(*) AS review_count
5FROM reviews
6GROUP BY product_id
7HAVING COUNT(*) >= 10
8AND AVG(rating) >= 4
Result
product_idavg_ratingreview_count
P1014.7156
P2054.289
P3124.5234
This finds highly-rated products with enough reviews to be statistically meaningful. Both conditions must be true: at least 10 reviews AND average rating of 4.0 or higher.

Practical Patterns

WHERE and HAVING can work together in the same query. Understanding the execution order helps you write efficient and correct queries.

//

WHERE vs HAVING

You can use both in the same query. WHERE filters rows first, then GROUP BY creates groups, then HAVING filters groups:

1SELECT
2 region,
3 COUNT(*) AS completed_orders
4FROM orders
5WHERE status = 'completed'
6GROUP BY region
7HAVING COUNT(*) > 2
Where Vs Having
orders
order_idregionstatus
1Northcompleted
2Southpending
3Northcompleted
4Southcompleted
5Northcompleted
6Eastcompleted
7Southcompleted
8Northpending
9Eastcompleted
10Southcompleted
11Northcompleted
12Westcompleted
result
regioncompleted_orders
Step 1/12
1SELECT
2 region,
3 COUNT(*) AS completed_orders
4FROM orders
5WHERE status = 'completed'
6GROUP BY region
7HAVING COUNT(*) > 50
Result
regioncompleted_orders
North234
South189
West67

First, WHERE keeps only completed orders (filtering rows). Then, GROUP BY counts completed orders per region. Finally, HAVING keeps only regions with more than 50 completed orders (filtering groups).

These guidelines help you choose between WHERE and HAVING for optimal query performance.

Do
  • Use WHERE for filtering individual rows
  • Use HAVING for filtering based on aggregate values
  • Filter with WHERE first to reduce data before aggregation (better performance)
  • Remember: HAVING comes after GROUP BY
Don't
  • Don't try to use aggregate functions in WHERE
  • Don't use HAVING when WHERE would work because it's slower
  • Don't forget that HAVING filters entire groups, not rows
//

Execution Order

SQL clauses execute in a specific order. Understanding this sequence clarifies why HAVING can reference aggregates but WHERE cannot.

1. FROM
Start with the table
2. WHERE
Filter individual rows
3. GROUP BY
Create groups from remaining rows
4. HAVING
Filter groups based on aggregate values
5. SELECT
Calculate and return results

> Fill in the missing parts to find only departments where the average metric value exceeds 60,000.

SELECT
  department,
  AVG(metric_value) AS avg_metric
FROM employee_metrics
GROUP BY department
___ ___(metric_value) > ___
HAVING
AVG
60000
50000
WHERE

HAVING is the clause that unlocks group-level filtering. Without it, you would need subqueries to accomplish what HAVING does in a single, readable step.

The WHERE and HAVING combination is extremely powerful: WHERE eliminates unwanted rows before any grouping occurs, reducing the data the database must aggregate. HAVING then eliminates groups whose summary metrics do not meet your criteria.

If your HAVING condition does not reference an aggregate function, move it to WHERE instead. Filtering rows before grouping is always faster than filtering groups after aggregation.

PUTTING IT ALL TOGETHER

> You are a product analyst at Stripe building a feature usage dashboard that summarizes transaction activity across subscription tiers. The product team needs counts, totals, and averages per plan so they can identify which tiers drive the most volume.

GROUP BY groups every transaction row by subscription tier, producing one summary row per tier.
COUNT(*) tallies total transactions per tier while COUNT(DISTINCT ...) measures unique active users.
SUM computes total revenue per tier and AVG reveals the typical transaction size.
HAVING filters the grouped results to only tiers exceeding a minimum transaction threshold for the report.
KEY TAKEAWAYS
GROUP BY buckets rows by column values; each group produces one output row
COUNT(*) counts all rows; COUNT(column) counts non-NULL values only
SUM and AVG work only on numeric columns; NULL values are ignored
MIN and MAX work on any comparable type: numbers, strings, dates
Every non-aggregated SELECT column must appear in GROUP BY
HAVING filters groups after aggregation; WHERE filters rows before
Use WHERE to exclude rows from calculation; use HAVING to exclude groups from results
Aggregation without GROUP BY treats the entire table as one group

Aggregating: Beginner

A million rows walk into a SUM...

Category
SQL
Difficulty
beginner
Duration
38 minutes
Challenges
0 hands-on challenges

Topics covered: GROUP BY for categorizing data, COUNT variations (*,col,DISTINCT), SUM and AVG calculations, MIN and MAX for extremes, HAVING for filtered groups

Lesson Sections

  1. GROUP BY for categorizing data (concepts: sqlGroupBy)

    Understanding GROUP BY How GROUP BY Works Consider this sales data: Watch how rows with the same region value are gathered into groups, then collapsed into a single summary row per group. The result has one row per region. Each row summarizes all the sales in that region. The original 6 rows have been collapsed into 3 summary rows. Rules and Patterns GROUP BY Golden Rule This rule exists because SQL needs to know how to handle columns that have multiple values per group. If you group by region,

  2. COUNT variations (*,col,DISTINCT) (concepts: sqlCount, sqlCountDistinct)

    COUNT Variations COUNT(*) - Count All Rows COUNT(column) Consider a table where some customers have not provided an email: COUNT(DISTINCT) 1,847 orders came from 892 unique customers. This tells you that some customers placed multiple orders. Practical Applications Comparing COUNT Variants Database Execution

  3. SUM and AVG calculations (concepts: sqlSumAvg)

    SUM Functions Basic Example SUM with GROUP BY Each group's amounts are added together into a single total. North's three sales (150 + 175 + 190) become 515, while East's lone sale stays at 160. Now you can instantly see which region generates the most revenue and sells the most units. SUM with Expressions You can sum calculated values, not just raw columns. This is common for computing totals from unit prices and quantities. The expression price * quantity is calculated for each row, then those

  4. MIN and MAX for extremes (concepts: sqlMinMax)

    Basic MIN/MAX Usage Finding Extremes Within each category, the smallest price is selected. Electronics keeps 149 (ignoring 299 and 599), while Clothing keeps 29. This tells you the price range within each category. Electronics spans $450 while Clothing only spans $60. Advanced Applications MIN/MAX with Dates Now you can see each customer's entire order history at a glance: when they first ordered, their most recent order, and how many orders they have placed. MIN/MAX with Text Combining Aggregat

  5. HAVING for filtered groups (concepts: sqlHaving)

    Understanding HAVING Why We Need HAVING Consider this question: "Which regions have more than 100 orders?" Groups are formed first, then the count threshold is applied. North (4 orders) and South (3 orders) pass the filter, while East (1 order) is eliminated entirely. HAVING Examples This finds highly-rated products with enough reviews to be statistically meaningful. Both conditions must be true: at least 10 reviews AND average rating of 4.0 or higher. Practical Patterns WHERE vs HAVING Executio