SQL HAVING Clause: Filtering Groups with Examples

HAVING is a 6-character fix for a logic error that tanks candidates in 7% of verified SQL rounds. WHERE can't see aggregates. HAVING can. That's the whole rule.

Rounds using HAVING

32%

GROUP BY companion rate

Characters to fix the bug

1,042

Verified rounds analyzed

What HAVING Does (and How It Differs from WHERE)

SQL runs clauses in a fixed logical sequence. Burn this order into your head and HAVING stops being mysterious. WHERE can't see aggregates because aggregates don't exist at step 2. That's literally the whole rule.

SQL Execution Order

1. FROM / JOIN

Tables are loaded and joined into a working row set.

2. WHERE

Individual rows are filtered. Aggregates do not exist yet, so you cannot reference them.

3. GROUP BY

Remaining rows collapse into groups. Each unique combination of grouped columns becomes one row.

4. HAVING

Groups are filtered using aggregate conditions. This is the first point where COUNT, SUM, AVG are available.

5. SELECT

Output columns and expressions are computed. Aliases are created here.

6. ORDER BY

Results are sorted. Can reference aliases because it runs after SELECT.

WHERE vs HAVING in Action

-- WHERE: filters rows BEFORE grouping
SELECT department, COUNT(*) AS emp_count
FROM employees
WHERE hire_date >= '2023-01-01'
GROUP BY department;

-- HAVING: filters groups AFTER aggregation
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) >= 5;

-- Both together: WHERE reduces rows, HAVING filters groups
SELECT department, COUNT(*) AS emp_count
FROM employees
WHERE hire_date >= '2023-01-01'
GROUP BY department
HAVING COUNT(*) >= 5;

Interview note: The combined query above is the pattern interviewers reach for most often. They want to see that you know WHERE and HAVING are not interchangeable. A candidate who puts a row-level filter in HAVING instead of WHERE signals they do not understand query execution order.

4 HAVING Patterns You Need to Know

These four patterns cover the vast majority of HAVING usage in production queries and interview problems. Each one tests a different aspect of how you think about group-level filtering.

HAVING with COUNT

The most common HAVING pattern in production and interviews. You group rows, count them, then keep only groups that meet a threshold. This is how you find active users, high-volume products, or departments with staffing problems. Interview note: COUNT(*) counts all rows including NULLs. COUNT(column) skips NULLs. In a HAVING clause this distinction matters when you're counting optional fields like discount_code or referral_id.

HAVING with SUM and AVG

Filtering on aggregated totals or averages. This pattern drives business questions like: which regions generate enough revenue to justify a local office? Which product categories have an average price above a certain tier? Interview note: You can combine multiple HAVING conditions with AND/OR. Each condition can use a different aggregate function. The engine evaluates all of them after grouping is complete.

HAVING with WHERE (Combined Filtering)

WHERE filters rows before grouping. HAVING filters groups after. Using both in the same query is the standard pattern for two-stage filtering. WHERE reduces the dataset first, which makes GROUP BY faster because it processes fewer rows. Interview note: Moving row-level filters from HAVING to WHERE is a real optimization. WHERE status = 'completed' is better than HAVING with a conditional count because the engine discards rows early. Interviewers test this exact distinction.

HAVING with Expressions

HAVING is not limited to simple comparisons. You can use arithmetic, CASE expressions, and even subqueries inside it. This pattern handles questions like: find groups where the spread between max and min exceeds a threshold, or groups where the top value is more than 3x the average. Interview note: You can reference any aggregate expression in HAVING, not just the ones in your SELECT list. HAVING MAX(price) / NULLIF(MIN(price), 0) > 10 is valid even if that ratio is not in your SELECT.

HAVING with COUNT

-- Find customers who placed more than 10 orders
SELECT
  customer_id,
  COUNT(*) AS order_count,
  SUM(total) AS lifetime_value
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 10
ORDER BY order_count DESC;

HAVING with SUM and AVG

-- Regions with total revenue above $100K
-- and average order value above $50
SELECT
  region,
  SUM(amount) AS total_revenue,
  ROUND(AVG(amount), 2) AS avg_order_value,
  COUNT(*) AS order_count
FROM orders
GROUP BY region
HAVING SUM(amount) > 100000
   AND AVG(amount) > 50;

HAVING with WHERE (Combined Filtering)

-- Among completed orders from 2024,
-- find users who spent more than $1000
SELECT
  user_id,
  COUNT(*) AS completed_orders,
  SUM(amount) AS total_spent
FROM orders
WHERE status = 'completed'
  AND order_date >= '2024-01-01'
GROUP BY user_id
HAVING SUM(amount) > 1000
ORDER BY total_spent DESC;

HAVING with Expressions

-- Find product categories where the price spread
-- (max minus min) exceeds $500
SELECT
  category,
  MIN(price) AS cheapest,
  MAX(price) AS most_expensive,
  MAX(price) - MIN(price) AS price_spread,
  COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING MAX(price) - MIN(price) > 500
ORDER BY price_spread DESC;

HAVING with Complex Conditions

Real interview questions rarely ask for a single HAVING condition. They layer requirements: find groups meeting multiple aggregate thresholds, groups where a ratio exceeds a bound, or groups where a conditional count passes a test. The syntax stays the same. The thinking gets harder.

Multiple Aggregate Conditions

-- Departments with at least 5 employees,
-- average salary above $80K, and total payroll under $2M
SELECT
  department,
  COUNT(*) AS headcount,
  ROUND(AVG(salary), 0) AS avg_salary,
  SUM(salary) AS total_payroll
FROM employees
GROUP BY department
HAVING COUNT(*) >= 5
   AND AVG(salary) > 80000
   AND SUM(salary) < 2000000;

HAVING with CASE (Conditional Aggregation)

-- Find users where more than 30% of their orders
-- were cancelled or returned
SELECT
  oi.user_id,
  COUNT(*) AS total_orders,
  SUM(CASE WHEN o.status IN ('Cancelled', 'Returned') THEN 1 ELSE 0 END) AS failed_orders,
  ROUND(
    100.0 * SUM(CASE WHEN o.status IN ('Cancelled', 'Returned') THEN 1 ELSE 0 END) / COUNT(*),
    1
  ) AS failure_rate
FROM order_items oi
JOIN orders o ON o.order_id = oi.order_id
GROUP BY oi.user_id
HAVING 1.0 * SUM(CASE WHEN o.status IN ('Cancelled', 'Returned') THEN 1 ELSE 0 END)
       / COUNT(*) > 0.3;

HAVING with a Subquery

-- Departments where average salary exceeds the company average
SELECT
  department,
  ROUND(AVG(salary), 0) AS dept_avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > (
  SELECT AVG(salary) FROM employees
);

Interview note: The subquery in HAVING runs once and returns a scalar. It does not re-execute per group. The optimizer treats it like a constant after the first evaluation. This is a common follow-up question: Is that subquery correlated? No. It references no outer column, so it runs once.

5 HAVING Interview Questions

These span phone screens to onsite rounds. The first two test basic GROUP BY + HAVING syntax. The rest test whether you understand execution order, optimization, and edge cases.

Q1: Find all customers who have placed more than 5 orders.

What they test: Basic HAVING with COUNT. This is the simplest HAVING question and appears in nearly every SQL phone screen. They want to see you use GROUP BY + HAVING instead of a subquery or window function. Approach: GROUP BY customer_id, HAVING COUNT(*) > 5. State aloud that you use HAVING because you need to filter on an aggregate that does not exist until after grouping.

Q2: Find departments where the average salary is above $90,000 but the department has fewer than 20 employees.

What they test: Multiple HAVING conditions with different aggregates. They want to see you combine AVG and COUNT in the same HAVING clause connected by AND. Approach: GROUP BY department, HAVING AVG(salary) > 90000 AND COUNT(*) < 20. Both conditions sit in HAVING because both depend on aggregate values.

Q3: Why does this query fail? SELECT dept, COUNT(*) AS cnt FROM employees GROUP BY dept HAVING cnt > 3;

What they test: Execution order knowledge. HAVING runs before SELECT in the logical plan, so the alias "cnt" does not exist when HAVING evaluates. This trips up candidates who have only used MySQL (which allows it as a non-standard extension). Approach: Replace the alias with the full expression: HAVING COUNT(*) > 3. Explain that SELECT aliases are created after HAVING in standard SQL. Mention that MySQL permits this but PostgreSQL, SQL Server, and BigQuery do not.

Q4: Can you rewrite this HAVING query to be more efficient? SELECT region, COUNT(*) FROM orders GROUP BY region HAVING region = 'US';

What they test: Optimization awareness. The filter on "region" is a row-level condition, not an aggregate condition. Putting it in HAVING forces the engine to group all regions, then discard everything except US. Approach: Move it to WHERE: WHERE region = 'US' GROUP BY region. This filters rows before grouping, so the engine only aggregates US orders. Fewer rows in GROUP BY means less memory and faster execution.

Q5: Find products where more than 20% of orders were returned.

What they test: Conditional aggregation inside HAVING. This combines CASE WHEN with SUM and COUNT in a ratio calculation. Approach: GROUP BY product_id, HAVING SUM(CASE WHEN status = 'returned' THEN 1 ELSE 0 END)::float / COUNT(*) > 0.20. Use NULLIF(COUNT(*), 0) in the denominator if there is any chance of zero orders for a product.

3 HAVING Mistakes That Fail Interviews

These are not edge cases. They show up in interview debriefs as reasons candidates get rejected at companies like Meta, Amazon, and Databricks.

Using HAVING for Row-Level Filters

Putting a non-aggregate condition in HAVING when it belongs in WHERE. The query still runs (HAVING can filter on non-aggregate columns in most engines), but it forces the engine to group all rows first. On a table with millions of rows across 50 regions, filtering one region in HAVING means grouping all 50, then throwing away 49. This mistake signals to the interviewer that you do not understand SQL execution order.

Referencing SELECT Aliases in HAVING

In standard SQL, HAVING evaluates before SELECT in the logical plan. Aliases defined in SELECT do not exist when HAVING runs. MySQL allows this as a non-standard extension, which creates bad habits. PostgreSQL, SQL Server, BigQuery, Snowflake, and Trino all reject it. Repeat the full aggregate expression in HAVING. It looks redundant, but it is correct SQL. The optimizer is smart enough not to compute it twice.

Forgetting HAVING When Filtering Aggregates

Attempting to put an aggregate in WHERE. This is a syntax error in every SQL engine. WHERE cannot reference aggregate functions because it runs before GROUP BY. The fix is always to move the condition to HAVING. This error usually happens under time pressure when candidates write quickly and forget to check clause order. Write the clauses in execution order: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY.

SQL HAVING FAQ

What is the difference between WHERE and HAVING in SQL?+

WHERE filters individual rows before GROUP BY runs. HAVING filters groups after GROUP BY and aggregation are complete. WHERE cannot reference aggregate functions like COUNT() or SUM() because those values do not exist yet when WHERE executes. HAVING can reference aggregates because it runs after grouping. Use WHERE for row-level conditions (status = 'active', date > '2024-01-01') and HAVING for group-level conditions (COUNT(*) > 5, SUM(amount) > 1000).

Can you use HAVING without GROUP BY?+

Yes, but it is rare and engine-dependent. Without GROUP BY, the entire result set is treated as one group. HAVING then filters that single group. For example: SELECT COUNT(*) FROM orders HAVING COUNT(*) > 100 returns the count only if there are more than 100 orders, otherwise returns no rows. PostgreSQL, MySQL, and SQL Server all support this. In practice, you will almost always see HAVING paired with GROUP BY.

Can you use column aliases in HAVING?+

In standard SQL, no. HAVING runs before SELECT in the logical execution order, so aliases defined in SELECT are not yet available. PostgreSQL and SQL Server enforce this strictly. MySQL is an exception: it allows column aliases in HAVING as a non-standard extension. In interviews, assume strict SQL and repeat the full expression in HAVING rather than referencing an alias.

Is HAVING slower than WHERE?+

HAVING itself is not inherently slow, but putting a filter in HAVING when it belongs in WHERE forces the engine to group more rows than necessary. If you filter on a non-aggregated column in HAVING (e.g., HAVING region = 'US'), the engine must first group all regions, then discard all but 'US'. Moving that to WHERE (WHERE region = 'US') eliminates other regions before grouping starts, which is faster. Only use HAVING for conditions that require aggregate values.

02 / Why practice

Six Characters. One Prod Fire. Practice the Swap.

01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Start Practicing

Related Guides

SQL GROUP BY→

Execution order, aggregation patterns, ROLLUP/CUBE, and the interview questions that test real grouping knowledge

GROUP BY Practice Problems→

Hands-on GROUP BY and HAVING problems with real SQL execution at interview difficulty

SQL Interview Questions→

Complete guide to every SQL topic tested in data engineering interviews