SQL HAVING Clause: Filtering Groups with Examples
HAVING is a 6-character fix for a logic error that tanks candidates in 7% of verified SQL rounds. WHERE can't see aggregates. HAVING can. That's the whole rule.
What HAVING Does (and How It Differs from WHERE)
SQL runs clauses in a fixed logical sequence. Burn this order into your head and HAVING stops being mysterious. WHERE can't see aggregates because aggregates don't exist at step 2. That's literally the whole rule.
HAVING with Complex Conditions
Real interview questions rarely ask for a single HAVING condition. They layer requirements: find groups meeting multiple aggregate thresholds, groups where a ratio exceeds a bound, or groups where a conditional count passes a test. The syntax stays the same. The thinking gets harder.
Know HAVING the way the interviewer who asks it knows it.
Tables With Most DQ Failures
The tables with the most failures.
Pulled from debriefs where SQL was the gate.
SQL Execution Order
1. FROM / JOIN
Tables are loaded and joined into a working row set.
2. WHERE
Individual rows are filtered. Aggregates do not exist yet, so you cannot reference them.
3. GROUP BY
Remaining rows collapse into groups. Each unique combination of grouped columns becomes one row.
4. HAVING
Groups are filtered using aggregate conditions. This is the first point where COUNT, SUM, AVG are available.
5. SELECT
Output columns and expressions are computed. Aliases are created here.
6. ORDER BY
Results are sorted. Can reference aliases because it runs after SELECT.
WHERE vs HAVING in Action
-- WHERE: filters rows BEFORE grouping
SELECT department, COUNT(*) AS emp_count
FROM employees
WHERE hire_date >= '2023-01-01'
GROUP BY department;
-- HAVING: filters groups AFTER aggregation
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) >= 5;
-- Both together: WHERE reduces rows, HAVING filters groups
SELECT department, COUNT(*) AS emp_count
FROM employees
WHERE hire_date >= '2023-01-01'
GROUP BY department
HAVING COUNT(*) >= 5;Interview note: The combined query above is the pattern interviewers reach for most often. They want to see that you know WHERE and HAVING are not interchangeable. A candidate who puts a row-level filter in HAVING instead of WHERE signals they do not understand query execution order.
4 HAVING Patterns You Need to Know
These four patterns cover the vast majority of HAVING usage in production queries and interview problems. Each one tests a different aspect of how you think about group-level filtering.
HAVING with COUNT
The most common HAVING pattern in production and interviews. You group rows, count them, then keep only groups that meet a threshold. This is how you find active users, high-volume products, or departments with staffing problems. Interview note: COUNT(*) counts all rows including NULLs. COUNT(column) skips NULLs. In a HAVING clause this distinction matters when you're counting optional fields like discount_code or referral_id.
HAVING with SUM and AVG
Filtering on aggregated totals or averages. This pattern drives business questions like: which regions generate enough revenue to justify a local office? Which product categories have an average price above a certain tier? Interview note: You can combine multiple HAVING conditions with AND/OR. Each condition can use a different aggregate function. The engine evaluates all of them after grouping is complete.
HAVING with WHERE (Combined Filtering)
WHERE filters rows before grouping. HAVING filters groups after. Using both in the same query is the standard pattern for two-stage filtering. WHERE reduces the dataset first, which makes GROUP BY faster because it processes fewer rows. Interview note: Moving row-level filters from HAVING to WHERE is a real optimization. WHERE status = 'completed' is better than HAVING with a conditional count because the engine discards rows early. Interviewers test this exact distinction.
HAVING with Expressions
HAVING is not limited to simple comparisons. You can use arithmetic, CASE expressions, and even subqueries inside it. This pattern handles questions like: find groups where the spread between max and min exceeds a threshold, or groups where the top value is more than 3x the average. Interview note: You can reference any aggregate expression in HAVING, not just the ones in your SELECT list. HAVING MAX(price) / NULLIF(MIN(price), 0) > 10 is valid even if that ratio is not in your SELECT.
HAVING with COUNT
-- Find customers who placed more than 10 orders
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total) AS lifetime_value
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 10
ORDER BY order_count DESC;HAVING with SUM and AVG
-- Regions with total revenue above $100K
-- and average order value above $50
SELECT
region,
SUM(amount) AS total_revenue,
ROUND(AVG(amount), 2) AS avg_order_value,
COUNT(*) AS order_count
FROM orders
GROUP BY region
HAVING SUM(amount) > 100000
AND AVG(amount) > 50;HAVING with WHERE (Combined Filtering)
-- Among completed orders from 2024,
-- find users who spent more than $1000
SELECT
user_id,
COUNT(*) AS completed_orders,
SUM(amount) AS total_spent
FROM orders
WHERE status = 'completed'
AND order_date >= '2024-01-01'
GROUP BY user_id
HAVING SUM(amount) > 1000
ORDER BY total_spent DESC;HAVING with Expressions
-- Find product categories where the price spread
-- (max minus min) exceeds $500
SELECT
category,
MIN(price) AS cheapest,
MAX(price) AS most_expensive,
MAX(price) - MIN(price) AS price_spread,
COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING MAX(price) - MIN(price) > 500
ORDER BY price_spread DESC;Multiple Aggregate Conditions
-- Departments with at least 5 employees,
-- average salary above $80K, and total payroll under $2M
SELECT
department,
COUNT(*) AS headcount,
ROUND(AVG(salary), 0) AS avg_salary,
SUM(salary) AS total_payroll
FROM employees
GROUP BY department
HAVING COUNT(*) >= 5
AND AVG(salary) > 80000
AND SUM(salary) < 2000000;HAVING with CASE (Conditional Aggregation)
-- Find users where more than 30% of their orders failed
SELECT
user_id,
COUNT(*) AS total_orders,
SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS failed_orders,
ROUND(
100.0 * SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) / COUNT(*),
1
) AS failure_rate
FROM orders
GROUP BY user_id
HAVING SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END)::float
/ COUNT(*) > 0.3;HAVING with a Subquery
-- Departments where average salary exceeds the company average
SELECT
department,
ROUND(AVG(salary), 0) AS dept_avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > (
SELECT AVG(salary) FROM employees
);Interview note: The subquery in HAVING runs once and returns a scalar. It does not re-execute per group. The optimizer treats it like a constant after the first evaluation. This is a common follow-up question: Is that subquery correlated? No. It references no outer column, so it runs once.
5 HAVING Interview Questions
These span phone screens to onsite rounds. The first two test basic GROUP BY + HAVING syntax. The rest test whether you understand execution order, optimization, and edge cases.
Q1: Find all customers who have placed more than 5 orders.
What they test: Basic HAVING with COUNT. This is the simplest HAVING question and appears in nearly every SQL phone screen. They want to see you use GROUP BY + HAVING instead of a subquery or window function. Approach: GROUP BY customer_id, HAVING COUNT(*) > 5. State aloud that you use HAVING because you need to filter on an aggregate that does not exist until after grouping.
Q2: Find departments where the average salary is above $90,000 but the department has fewer than 20 employees.
What they test: Multiple HAVING conditions with different aggregates. They want to see you combine AVG and COUNT in the same HAVING clause connected by AND. Approach: GROUP BY department, HAVING AVG(salary) > 90000 AND COUNT(*) < 20. Both conditions sit in HAVING because both depend on aggregate values.
Q3: Why does this query fail? SELECT dept, COUNT(*) AS cnt FROM employees GROUP BY dept HAVING cnt > 3;
What they test: Execution order knowledge. HAVING runs before SELECT in the logical plan, so the alias "cnt" does not exist when HAVING evaluates. This trips up candidates who have only used MySQL (which allows it as a non-standard extension). Approach: Replace the alias with the full expression: HAVING COUNT(*) > 3. Explain that SELECT aliases are created after HAVING in standard SQL. Mention that MySQL permits this but PostgreSQL, SQL Server, and BigQuery do not.
Q4: Can you rewrite this HAVING query to be more efficient? SELECT region, COUNT(*) FROM orders GROUP BY region HAVING region = 'US';
What they test: Optimization awareness. The filter on "region" is a row-level condition, not an aggregate condition. Putting it in HAVING forces the engine to group all regions, then discard everything except US. Approach: Move it to WHERE: WHERE region = 'US' GROUP BY region. This filters rows before grouping, so the engine only aggregates US orders. Fewer rows in GROUP BY means less memory and faster execution.
Q5: Find products where more than 20% of orders were returned.
What they test: Conditional aggregation inside HAVING. This combines CASE WHEN with SUM and COUNT in a ratio calculation. Approach: GROUP BY product_id, HAVING SUM(CASE WHEN status = 'returned' THEN 1 ELSE 0 END)::float / COUNT(*) > 0.20. Use NULLIF(COUNT(*), 0) in the denominator if there is any chance of zero orders for a product.
3 HAVING Mistakes That Fail Interviews
These are not edge cases. They show up in interview debriefs as reasons candidates get rejected at companies like Meta, Amazon, and Databricks.
Using HAVING for Row-Level Filters
Putting a non-aggregate condition in HAVING when it belongs in WHERE. The query still runs (HAVING can filter on non-aggregate columns in most engines), but it forces the engine to group all rows first. On a table with millions of rows across 50 regions, filtering one region in HAVING means grouping all 50, then throwing away 49. This mistake signals to the interviewer that you do not understand SQL execution order.
Referencing SELECT Aliases in HAVING
In standard SQL, HAVING evaluates before SELECT in the logical plan. Aliases defined in SELECT do not exist when HAVING runs. MySQL allows this as a non-standard extension, which creates bad habits. PostgreSQL, SQL Server, BigQuery, Snowflake, and Trino all reject it. Repeat the full aggregate expression in HAVING. It looks redundant, but it is correct SQL. The optimizer is smart enough not to compute it twice.
Forgetting HAVING When Filtering Aggregates
Attempting to put an aggregate in WHERE. This is a syntax error in every SQL engine. WHERE cannot reference aggregate functions because it runs before GROUP BY. The fix is always to move the condition to HAVING. This error usually happens under time pressure when candidates write quickly and forget to check clause order. Write the clauses in execution order: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY.
SQL HAVING FAQ
What is the difference between WHERE and HAVING in SQL?+
Can you use HAVING without GROUP BY?+
Can you use column aliases in HAVING?+
Is HAVING slower than WHERE?+
Six Characters. One Prod Fire. Practice the Swap.
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition
Related Guides
Execution order, aggregation patterns, ROLLUP/CUBE, and the interview questions that test real grouping knowledge
Hands-on GROUP BY and HAVING problems with real SQL execution at interview difficulty
Complete guide to every SQL topic tested in data engineering interviews