SQL GROUP BY and HAVING Practice
Aggregation is the single most tested SQL category, appearing in roughly one out of every four SQL interview questions. Five problems covering GROUP BY, HAVING, CASE WHEN inside aggregates, and multi-column grouping at interview difficulty.
What you need to know
GROUP BY is the foundation of every analytical query. Aggregation outnumbers JOINs, window functions, and every other SQL category in interview data. Four areas cover what interviewers test.
GROUP BY basics
GROUP BY collapses rows that share the same value(s) into a single output row. Every column in your SELECT must either appear in GROUP BY or be wrapped in an aggregate function (COUNT, SUM, AVG, MIN, MAX). This rule is enforced by every SQL engine and is the top source of GROUP BY errors. Example: SELECT department, COUNT(*) AS headcount FROM employees GROUP BY department.
HAVING filters
HAVING filters groups after aggregation. WHERE filters rows before aggregation. If you need departments with more than 10 employees, you cannot put COUNT(*) > 10 in WHERE because the count has not been computed yet. HAVING exists for exactly this purpose. Example: SELECT department, AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) > 80000.
Conditional aggregation with CASE WHEN
CASE WHEN inside an aggregate lets you count or sum only rows meeting a condition. Replaces multiple self-joins or subqueries. One of the most useful patterns and appears in nearly every analytical interview. Classic form: SUM(CASE WHEN condition THEN 1 ELSE 0 END). Example: SELECT department, SUM(CASE WHEN status = 'active' THEN 1 ELSE 0 END) AS active, SUM(CASE WHEN status = 'churned' THEN 1 ELSE 0 END) AS churned FROM users GROUP BY department.
Multi-column grouping
GROUP BY multiple columns creates a group for each unique combination. GROUP BY year, quarter gives one row per quarter per year. The order does not affect the result but affects readability. Put the broader grouping first (year before quarter, country before city). Example: SELECT year, quarter, region, SUM(revenue) FROM sales GROUP BY year, quarter, region.
Five GROUP BY practice problems
Each problem adds a layer of difficulty. By problem 5, you are combining conditional aggregation with year-over-year calculations.
1. Revenue per category
orders (product_category, amount). Compute total revenue and order count per category, sorted by revenue descending. The baseline GROUP BY problem. If you can't write this in under a minute, spend more time on fundamentals. Skills: GROUP BY, SUM, COUNT, ORDER BY.
2. Departments above threshold
Find departments where average salary exceeds $75,000 AND headcount is at least 5. Return department, average salary, headcount. Tests the WHERE vs HAVING distinction. Both conditions go in HAVING because they reference aggregates. Skills: HAVING with multiple conditions, AVG, COUNT.
3. Monthly active vs churned users
user_events (user_id, event_date, event_type). For each month: total unique users, users who had a purchase event, users who had zero purchase events. Conditional aggregation, one query, no subqueries, no self-joins. Skills: COUNT(DISTINCT), CASE WHEN inside aggregates, date truncation.
4. Conversion funnel by source
funnel_events (user_id, event_stage in {visit, signup, trial, paid}, source). Build a funnel showing count at each stage per source. Use conditional aggregation to pivot stages into columns. Compute the conversion rate from one stage to the next. Skills: multi-column GROUP BY, pivoting via CASE WHEN, calculated columns.
5. Year-over-year growth by segment
revenue (date, segment, amount). For each segment: current year revenue, prior year revenue, YoY growth percentage. Single query using conditional aggregation on year. Handle segments that exist in one year but not the other. Skills: conditional aggregation across years, COALESCE for missing data, percentage calculations.
Aggregation mistakes that cost you the interview
Four errors responsible for most GROUP BY failures in technical screens. Each is easy to fix once you know the pattern.
Selecting non-aggregated columns not in GROUP BY
SELECT name, department, COUNT(*) FROM employees GROUP BY department. Which name does SQL pick when there are 50 employees per department? MySQL silently picks an arbitrary one. PostgreSQL, Trino, and BigQuery reject the query. Always aggregate or group every column in SELECT.
Aggregate conditions in WHERE instead of HAVING
WHERE COUNT(*) > 5 does not work. COUNT has not been computed when WHERE runs. WHERE filters individual rows before grouping. HAVING filters groups after aggregation. Comes up in interviews as a quick filter question.
Double-counting with JOINs before GROUP BY
If you join orders to order_items before grouping by customer, each order gets multiplied by its item count. SUM(amount) is now wrong. Either aggregate first then join, or use COUNT(DISTINCT order_id) inside the aggregate.
Forgetting NULLs form their own group
GROUP BY treats all NULLs as the same group. If department has NULLs, you get a row for the NULL group. Correct SQL behavior, but it surprises people. Use COALESCE(department, 'Unknown') if you want a readable label.
Worked example: departments above a salary and headcount threshold
SELECT
department,
COUNT(*) AS headcount,
ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
HAVING COUNT(*) >= 5
AND AVG(salary) > 75000
ORDER BY avg_salary DESC;Both conditions go in HAVING because they reference aggregate results. COUNT(*) >= 5 in WHERE would be a syntax error since the count hasn't been computed at the WHERE stage. The GROUP BY includes d.id to handle departments that share a name.
GROUP BY FAQ
What's the difference between WHERE and HAVING?+
Can I use column aliases in GROUP BY or HAVING?+
How does GROUP BY handle NULL values?+
What is conditional aggregation?+
Aggregate under interview pressure
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition