SQL Practice
Aggregation is the single most tested SQL category, appearing in roughly 1 out of every 4 SQL interview questions. GROUP BY is the most-tested individual concept across all data engineering interviews. Five problems covering GROUP BY, HAVING, CASE WHEN inside aggregates, and multi-column grouping at interview difficulty.
GROUP BY is the foundation of every analytical query. Based on DataDriven's analysis of verified interview data, aggregation questions outnumber JOINs, window functions, and every other SQL category. These four areas cover what interviewers test.
GROUP BY collapses rows that share the same value(s) into a single output row. Every column in your SELECT must either appear in GROUP BY or be wrapped in an aggregate function (COUNT, SUM, AVG, MIN, MAX). This rule is enforced by every SQL engine and is the #1 source of GROUP BY errors.
SELECT department, COUNT(*) AS headcount FROM employees GROUP BY departmentHAVING filters groups after aggregation. WHERE filters rows before aggregation. This distinction is fundamental. If you need to find departments with more than 10 employees, you cannot put COUNT(*) > 10 in WHERE because the count has not been computed yet. HAVING exists for exactly this purpose.
SELECT department, AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) > 80000CASE WHEN inside an aggregate function lets you count or sum only rows that meet a condition. This replaces multiple self-joins or subqueries. It is one of the most useful SQL patterns and appears in nearly every analytical interview. The classic form: SUM(CASE WHEN condition THEN 1 ELSE 0 END).
SELECT department, SUM(CASE WHEN status = 'active' THEN 1 ELSE 0 END) AS active_count, SUM(CASE WHEN status = 'churned' THEN 1 ELSE 0 END) AS churned_count FROM users GROUP BY departmentGROUP BY multiple columns creates a group for each unique combination. GROUP BY year, quarter gives you one row per quarter per year. The order of columns in GROUP BY does not affect the result, but it affects readability. Put the broader grouping first (year before quarter, country before city).
SELECT year, quarter, region, SUM(revenue) FROM sales GROUP BY year, quarter, regionEach problem adds a layer of difficulty. By problem 5, you are combining conditional aggregation with year-over-year calculations.
Given an orders table with product_category and amount, calculate total revenue and order count per category. Sort by revenue descending. This is the baseline GROUP BY problem. If you cannot write this in under a minute, spend more time on fundamentals.
Skills tested: Basic GROUP BY, SUM, COUNT, ORDER BY
Find departments where the average salary exceeds $75,000 AND the headcount is at least 5. Return department name, average salary, and headcount. This tests whether you know the difference between WHERE and HAVING. Both conditions must go in HAVING because they reference aggregates.
Skills tested: HAVING with multiple conditions, AVG, COUNT
Given a user_events table with user_id, event_date, and event_type, calculate for each month: total unique users, users who had a purchase event, and users who had zero purchase events. Use conditional aggregation. One query, no subqueries, no self-joins.
Skills tested: COUNT(DISTINCT), conditional aggregation with CASE WHEN, date truncation
Given a funnel_events table with user_id, event_stage (visit, signup, trial, paid), and source (organic, paid, referral), build a funnel showing the count at each stage per source. Use conditional aggregation to pivot stages into columns. Calculate the conversion rate from one stage to the next.
Skills tested: Multi-column GROUP BY, conditional aggregation, calculated columns, pivoting
Given a revenue table with date, segment, and amount, calculate for each segment: current year revenue, prior year revenue, and YoY growth percentage. Do this in a single query using conditional aggregation on the year. Handle segments that exist in one year but not the other.
Skills tested: Conditional aggregation across years, COALESCE for missing data, percentage calculations
These four errors are responsible for most GROUP BY failures in technical screens. Each one is easy to fix once you know the pattern.
SELECT name, department, COUNT(*) FROM employees GROUP BY department. Which name does SQL pick when there are 50 employees per department? MySQL silently picks an arbitrary one. PostgreSQL, Trino, and BigQuery reject the query. Always aggregate or group every column in SELECT.
WHERE COUNT(*) > 5 does not work. COUNT has not been computed when WHERE runs. WHERE filters individual rows before grouping. HAVING filters groups after aggregation. This comes up in interviews as a quick filter question.
If you join orders to order_items before grouping by customer, each order gets multiplied by its item count. SUM(amount) is now wrong. Either aggregate first then join, or use DISTINCT inside the aggregate: COUNT(DISTINCT order_id).
GROUP BY treats all NULLs as the same group. If your department column has NULLs, you get a row for the NULL group. This is correct SQL behavior, but it surprises people. Use COALESCE(department, 'Unknown') if you want a readable label.
Find departments where the average salary exceeds $75,000 and the headcount is at least 5. Return the department name, average salary, and employee count.
SELECT
d.name AS department,
COUNT(*) AS headcount,
ROUND(AVG(e.salary), 2) AS avg_salary
FROM employees e
JOIN departments d ON d.id = e.department_id
GROUP BY d.id, d.name
HAVING COUNT(*) >= 5
AND AVG(e.salary) > 75000
ORDER BY avg_salary DESC;Both conditions go in HAVING because they reference aggregate results. Putting COUNT(*) >= 5 in WHERE would cause a syntax error since the count has not been computed at the WHERE stage. The GROUP BY includes d.id to handle departments that share a name.
Expected output
department | headcount | avg_salary
-------------+-----------+-----------
Engineering | 12 | 105833.33
Product | 7 | 92142.86
Data Science | 5 | 88400.00GROUP BY looks simple until you add HAVING, conditional aggregation, and multi-table joins. 32.7% of SQL interview rounds happen at the phone screen stage, where aggregation is the most common topic.