SQL GROUP BY and HAVING Practice

Aggregation is the single most tested SQL category, appearing in roughly one out of every four SQL interview questions. Five problems covering GROUP BY, HAVING, CASE WHEN inside aggregates, and multi-column grouping at interview difficulty.

#1
Most-tested SQL category
25%
Of SQL interview questions
5
Practice problems in this guide

What you need to know

GROUP BY is the foundation of every analytical query. Aggregation outnumbers JOINs, window functions, and every other SQL category in interview data. Four areas cover what interviewers test.

GROUP BY basics

GROUP BY collapses rows that share the same value(s) into a single output row. Every column in your SELECT must either appear in GROUP BY or be wrapped in an aggregate function (COUNT, SUM, AVG, MIN, MAX). This rule is enforced by every SQL engine and is the top source of GROUP BY errors. Example: SELECT department, COUNT(*) AS headcount FROM employees GROUP BY department.

HAVING filters

HAVING filters groups after aggregation. WHERE filters rows before aggregation. If you need departments with more than 10 employees, you cannot put COUNT(*) > 10 in WHERE because the count has not been computed yet. HAVING exists for exactly this purpose. Example: SELECT department, AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) > 80000.

Conditional aggregation with CASE WHEN

CASE WHEN inside an aggregate lets you count or sum only rows meeting a condition. Replaces multiple self-joins or subqueries. One of the most useful patterns and appears in nearly every analytical interview. Classic form: SUM(CASE WHEN condition THEN 1 ELSE 0 END). Example: SELECT department, SUM(CASE WHEN status = 'active' THEN 1 ELSE 0 END) AS active, SUM(CASE WHEN status = 'churned' THEN 1 ELSE 0 END) AS churned FROM users GROUP BY department.

Multi-column grouping

GROUP BY multiple columns creates a group for each unique combination. GROUP BY year, quarter gives one row per quarter per year. The order does not affect the result but affects readability. Put the broader grouping first (year before quarter, country before city). Example: SELECT year, quarter, region, SUM(revenue) FROM sales GROUP BY year, quarter, region.

Five GROUP BY practice problems

Each problem adds a layer of difficulty. By problem 5, you are combining conditional aggregation with year-over-year calculations.

Warm-up

1. Revenue per category

orders (product_category, amount). Compute total revenue and order count per category, sorted by revenue descending. The baseline GROUP BY problem. If you can't write this in under a minute, spend more time on fundamentals. Skills: GROUP BY, SUM, COUNT, ORDER BY.

Intermediate

2. Departments above threshold

Find departments where average salary exceeds $75,000 AND headcount is at least 5. Return department, average salary, headcount. Tests the WHERE vs HAVING distinction. Both conditions go in HAVING because they reference aggregates. Skills: HAVING with multiple conditions, AVG, COUNT.

Intermediate

3. Monthly active vs churned users

user_events (user_id, event_date, event_type). For each month: total unique users, users who had a purchase event, users who had zero purchase events. Conditional aggregation, one query, no subqueries, no self-joins. Skills: COUNT(DISTINCT), CASE WHEN inside aggregates, date truncation.

Advanced

4. Conversion funnel by source

funnel_events (user_id, event_stage in {visit, signup, trial, paid}, source). Build a funnel showing count at each stage per source. Use conditional aggregation to pivot stages into columns. Compute the conversion rate from one stage to the next. Skills: multi-column GROUP BY, pivoting via CASE WHEN, calculated columns.

Advanced

5. Year-over-year growth by segment

revenue (date, segment, amount). For each segment: current year revenue, prior year revenue, YoY growth percentage. Single query using conditional aggregation on year. Handle segments that exist in one year but not the other. Skills: conditional aggregation across years, COALESCE for missing data, percentage calculations.

Aggregation mistakes that cost you the interview

Four errors responsible for most GROUP BY failures in technical screens. Each is easy to fix once you know the pattern.

Selecting non-aggregated columns not in GROUP BY

SELECT name, department, COUNT(*) FROM employees GROUP BY department. Which name does SQL pick when there are 50 employees per department? MySQL silently picks an arbitrary one. PostgreSQL, Trino, and BigQuery reject the query. Always aggregate or group every column in SELECT.

Aggregate conditions in WHERE instead of HAVING

WHERE COUNT(*) > 5 does not work. COUNT has not been computed when WHERE runs. WHERE filters individual rows before grouping. HAVING filters groups after aggregation. Comes up in interviews as a quick filter question.

Double-counting with JOINs before GROUP BY

If you join orders to order_items before grouping by customer, each order gets multiplied by its item count. SUM(amount) is now wrong. Either aggregate first then join, or use COUNT(DISTINCT order_id) inside the aggregate.

Forgetting NULLs form their own group

GROUP BY treats all NULLs as the same group. If department has NULLs, you get a row for the NULL group. Correct SQL behavior, but it surprises people. Use COALESCE(department, 'Unknown') if you want a readable label.

Worked example: departments above a salary and headcount threshold

SELECT
  department,
  COUNT(*)                        AS headcount,
  ROUND(AVG(salary), 2)           AS avg_salary
FROM employees
GROUP BY department
HAVING COUNT(*) >= 5
   AND AVG(salary) > 75000
ORDER BY avg_salary DESC;

Both conditions go in HAVING because they reference aggregate results. COUNT(*) >= 5 in WHERE would be a syntax error since the count hasn't been computed at the WHERE stage. The GROUP BY includes d.id to handle departments that share a name.

GROUP BY FAQ

What's the difference between WHERE and HAVING?+
WHERE filters individual rows before grouping. HAVING filters groups after aggregation. WHERE for conditions on raw column values (status = 'active'). HAVING for conditions on aggregate results (COUNT(*) > 10). Both in the same query: WHERE runs first, then GROUP BY, then HAVING.
Can I use column aliases in GROUP BY or HAVING?+
Depends on the database. MySQL and BigQuery allow aliases in GROUP BY. PostgreSQL, Trino, and SQL Server do not. HAVING follows the same rules. For portability, repeat the expression instead of using the alias. In interviews, repeating the expression is always safe.
How does GROUP BY handle NULL values?+
All NULLs group together into a single group. Follows the SQL standard. GROUP BY a nullable column produces one row where that column is NULL, representing all rows with NULL values. Use COALESCE to replace NULLs with a default before grouping if needed.
What is conditional aggregation?+
CASE WHEN inside an aggregate to count or sum only rows meeting a condition. Example: SUM(CASE WHEN region = 'US' THEN revenue ELSE 0 END) AS us_revenue. Lets you pivot data and compute multiple filtered metrics in a single pass, avoiding subqueries or self-joins.
02 / Why practice

Aggregate under interview pressure

  1. 01

    Active recall beats re-reading by 50%

    Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom

  2. 02

    76% of hiring managers reject on the coding task, not the resume

    From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice

  3. 03

    Five problem shapes cover 80% of data engineer loops

    Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Related guides