A fraud team at a marketplace spent a week debugging why their "users with more than 10 failed logins" alert was missing obviously bad actors. The culprit: WHERE count(*) > 10 in a query that silently returned zero rows on Postgres and inconsistent rows on MySQL. The fix was swapping WHERE for HAVING. Six characters, one week of noise.
HAVING appears in 7% of our verified interview corpus and GROUP BY in 32%. It's the quiet companion clause, and interviewers know most candidates can't articulate the difference under pressure. This page covers every HAVING pattern you'll see in a phone screen, with the failure modes that actually hit prod.
Rounds using HAVING
GROUP BY companion rate
Characters to fix the bug
Verified rounds analyzed
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
SQL runs clauses in a fixed logical sequence. Burn this order into your head and HAVING stops being mysterious. WHERE can't see aggregates because aggregates don't exist at step 2. That's literally the whole rule.
WHERE sits at step 2. HAVING sits at step 4. That gap is why WHERE cannot see aggregates and HAVING can. The engine has not computed groups yet when WHERE runs. By the time HAVING runs, groups exist and aggregate functions have values.
-- WHERE: filters rows BEFORE grouping
SELECT department, COUNT(*) AS emp_count
FROM employees
WHERE hire_date >= '2023-01-01'
GROUP BY department;
-- HAVING: filters groups AFTER aggregation
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) >= 5;
-- Both together: WHERE reduces rows, HAVING filters groups
SELECT department, COUNT(*) AS emp_count
FROM employees
WHERE hire_date >= '2023-01-01'
GROUP BY department
HAVING COUNT(*) >= 5;Interview note: The combined query above is the pattern interviewers reach for most often. They want to see that you know WHERE and HAVING are not interchangeable. A candidate who puts a row-level filter in HAVING instead of WHERE signals they do not understand query execution order.
These four patterns cover the vast majority of HAVING usage in production queries and interview problems. Each one tests a different aspect of how you think about group-level filtering.
The most common HAVING pattern in production and interviews. You group rows, count them, then keep only groups that meet a threshold. This is how you find active users, high-volume products, or departments with staffing problems.
-- Find customers who placed more than 10 orders
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total) AS lifetime_value
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 10
ORDER BY order_count DESC;Interview note: COUNT(*) counts all rows including NULLs. COUNT(column) skips NULLs. In a HAVING clause this distinction matters when you're counting optional fields like discount_code or referral_id.
Filtering on aggregated totals or averages. This pattern drives business questions like: which regions generate enough revenue to justify a local office? Which product categories have an average price above a certain tier?
-- Regions with total revenue above $100K
-- and average order value above $50
SELECT
region,
SUM(amount) AS total_revenue,
ROUND(AVG(amount), 2) AS avg_order_value,
COUNT(*) AS order_count
FROM orders
GROUP BY region
HAVING SUM(amount) > 100000
AND AVG(amount) > 50;Interview note: You can combine multiple HAVING conditions with AND/OR. Each condition can use a different aggregate function. The engine evaluates all of them after grouping is complete.
WHERE filters rows before grouping. HAVING filters groups after. Using both in the same query is the standard pattern for two-stage filtering. WHERE reduces the dataset first, which makes GROUP BY faster because it processes fewer rows.
-- Among completed orders from 2024,
-- find users who spent more than $1000
SELECT
user_id,
COUNT(*) AS completed_orders,
SUM(amount) AS total_spent
FROM orders
WHERE status = 'completed'
AND order_date >= '2024-01-01'
GROUP BY user_id
HAVING SUM(amount) > 1000
ORDER BY total_spent DESC;Interview note: Moving row-level filters from HAVING to WHERE is a real optimization. WHERE status = 'completed' is better than HAVING with a conditional count because the engine discards rows early. Interviewers test this exact distinction.
HAVING is not limited to simple comparisons. You can use arithmetic, CASE expressions, and even subqueries inside it. This pattern handles questions like: find groups where the spread between max and min exceeds a threshold, or groups where the top value is more than 3x the average.
-- Find product categories where the price spread
-- (max minus min) exceeds $500
SELECT
category,
MIN(price) AS cheapest,
MAX(price) AS most_expensive,
MAX(price) - MIN(price) AS price_spread,
COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING MAX(price) - MIN(price) > 500
ORDER BY price_spread DESC;Interview note: You can reference any aggregate expression in HAVING, not just the ones in your SELECT list. HAVING MAX(price) / NULLIF(MIN(price), 0) > 10 is valid even if that ratio is not in your SELECT.
Real interview questions rarely ask for a single HAVING condition. They layer requirements: find groups meeting multiple aggregate thresholds, groups where a ratio exceeds a bound, or groups where a conditional count passes a test. The syntax stays the same. The thinking gets harder.
Combine conditions with AND and OR. Each condition can reference a different aggregate function. The engine evaluates all of them against each group.
-- Departments with at least 5 employees,
-- average salary above $80K, and total payroll under $2M
SELECT
department,
COUNT(*) AS headcount,
ROUND(AVG(salary), 0) AS avg_salary,
SUM(salary) AS total_payroll
FROM employees
GROUP BY department
HAVING COUNT(*) >= 5
AND AVG(salary) > 80000
AND SUM(salary) < 2000000;You can put CASE inside an aggregate inside HAVING. This lets you filter groups based on counts or sums of a specific subset within each group, without touching WHERE.
-- Find users where more than 30% of their orders failed
SELECT
user_id,
COUNT(*) AS total_orders,
SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS failed_orders,
ROUND(
100.0 * SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) / COUNT(*),
1
) AS failure_rate
FROM orders
GROUP BY user_id
HAVING SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END)::float
/ COUNT(*) > 0.3;Sometimes the threshold is not a fixed number but a computed value. You can put a subquery inside HAVING to compare each group against a dynamic benchmark.
-- Departments where average salary exceeds the company average
SELECT
department,
ROUND(AVG(salary), 0) AS dept_avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > (
SELECT AVG(salary) FROM employees
);Interview note: The subquery in HAVING runs once and returns a scalar. It does not re-execute per group. The optimizer treats it like a constant after the first evaluation. This is a common follow-up question: “Is that subquery correlated?” No. It references no outer column, so it runs once.
These span phone screens to onsite rounds. The first two test basic GROUP BY + HAVING syntax. The rest test whether you understand execution order, optimization, and edge cases.
What they test:
Basic HAVING with COUNT. This is the simplest HAVING question and appears in nearly every SQL phone screen. They want to see you use GROUP BY + HAVING instead of a subquery or window function.
Approach:
GROUP BY customer_id, HAVING COUNT(*) > 5. State aloud that you use HAVING because you need to filter on an aggregate that does not exist until after grouping.
What they test:
Multiple HAVING conditions with different aggregates. They want to see you combine AVG and COUNT in the same HAVING clause connected by AND.
Approach:
GROUP BY department, HAVING AVG(salary) > 90000 AND COUNT(*) < 20. Both conditions sit in HAVING because both depend on aggregate values.
What they test:
Execution order knowledge. HAVING runs before SELECT in the logical plan, so the alias “cnt” does not exist when HAVING evaluates. This trips up candidates who have only used MySQL (which allows it as a non-standard extension).
Approach:
Replace the alias with the full expression: HAVING COUNT(*) > 3. Explain that SELECT aliases are created after HAVING in standard SQL. Mention that MySQL permits this but PostgreSQL, SQL Server, and BigQuery do not.
What they test:
Optimization awareness. The filter on “region” is a row-level condition, not an aggregate condition. Putting it in HAVING forces the engine to group all regions, then discard everything except US.
Approach:
Move it to WHERE: WHERE region = 'US' GROUP BY region. This filters rows before grouping, so the engine only aggregates US orders. Fewer rows in GROUP BY means less memory and faster execution.
What they test:
Conditional aggregation inside HAVING. This combines CASE WHEN with SUM and COUNT in a ratio calculation. It is a step up from basic HAVING and tests whether you can think about proportions at the group level.
Approach:
GROUP BY product_id, HAVING SUM(CASE WHEN status = 'returned' THEN 1 ELSE 0 END)::float / COUNT(*) > 0.20. Use NULLIF(COUNT(*), 0) in the denominator if there is any chance of zero orders for a product.
These are not edge cases. They show up in interview debriefs as reasons candidates get rejected at companies like Meta, Amazon, and Databricks.
Putting a non-aggregate condition in HAVING when it belongs in WHERE. The query still runs (HAVING can filter on non-aggregate columns in most engines), but it forces the engine to group all rows first. On a table with millions of rows across 50 regions, filtering one region in HAVING means grouping all 50, then throwing away 49.
-- BAD: groups all regions, then discards non-US
SELECT region, SUM(amount)
FROM orders
GROUP BY region
HAVING region = 'US';
-- GOOD: filters to US rows first, then groups
SELECT region, SUM(amount)
FROM orders
WHERE region = 'US'
GROUP BY region;This mistake signals to the interviewer that you do not understand SQL execution order. It is one of the fastest ways to get a “no hire” signal on a SQL round.
In standard SQL, HAVING evaluates before SELECT in the logical plan. Aliases defined in SELECT do not exist when HAVING runs. MySQL allows this as a non-standard extension, which creates bad habits. PostgreSQL, SQL Server, BigQuery, Snowflake, and Trino all reject it.
-- FAILS in PostgreSQL, BigQuery, Snowflake
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING emp_count > 10;
-- WORKS everywhere
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;Repeat the full aggregate expression in HAVING. It looks redundant, but it is correct SQL. The optimizer is smart enough not to compute it twice.
Attempting to put an aggregate in WHERE. This is a syntax error in every SQL engine. WHERE cannot reference aggregate functions because it runs before GROUP BY. The fix is always to move the condition to HAVING.
-- ERROR: aggregate functions not allowed in WHERE
SELECT department, AVG(salary)
FROM employees
GROUP BY department
WHERE AVG(salary) > 80000;
-- FIX: use HAVING for aggregate conditions
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 80000;This error usually happens under time pressure when candidates write quickly and forget to check clause order. Slow down in interviews. Write the clauses in execution order: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY.
Pick the wrong filter clause and your alert misses a fraud ring. Pick the right one and nobody notices. Run these HAVING problems live until the distinction is automatic.
Execution order, aggregation patterns, ROLLUP/CUBE, and the interview questions that test real grouping knowledge
Hands-on GROUP BY and HAVING problems with real SQL execution at interview difficulty
Complete guide to every SQL topic tested in data engineering interviews