SQL GROUP BY Multiple Columns Guide
GROUP BY appears in 32% of verified DE interview SQL questions. It's the second most common construct in the corpus, behind SELECT/FROM. Roughly 14% of those GROUP BY questions use a composite key of two or more columns, which puts multi-column grouping at about 1 in every 22 SQL rounds.
How Composite Groups Form
Know GROUP BY the way the interviewer who asks it knows it.
-- Source data (5 rows)
-- department | quarter | revenue
-- Sales | Q1 | 100
-- Sales | Q1 | 200
-- Sales | Q2 | 150
-- Marketing | Q1 | 80
-- Marketing | Q2 | 120
SELECT department, quarter, SUM(revenue) AS total
FROM sales
GROUP BY department, quarter;
-- Result (4 groups):
-- Sales | Q1 | 300 (2 rows combined)
-- Sales | Q2 | 150 (1 row)
-- Marketing | Q1 | 80 (1 row)
-- Marketing | Q2 | 120 (1 row)The Notification That Paid Off
The message went out to thousands. A smaller number actually bit.
Pulled from debriefs where SQL was the gate.
Basic Multi-Column GROUP BY
When you GROUP BY two or more columns, SQL creates one group for each unique combination of values across those columns. A table with 3 departments and 4 job titles could produce up to 12 groups (3 x 4), though only combinations that exist in the data create actual groups. Each group gets one row in the result.
-- Revenue by department and quarter
SELECT
department,
quarter,
SUM(revenue) AS total_revenue,
COUNT(*) AS deal_count
FROM sales
GROUP BY department, quarter
ORDER BY department, quarter;
-- If department has values (Sales, Marketing)
-- and quarter has values (Q1, Q2, Q3, Q4)
-- up to 8 groups are possible:
-- Sales/Q1, Sales/Q2, Sales/Q3, Sales/Q4
-- Marketing/Q1, Marketing/Q2, Marketing/Q3, Marketing/Q4Hierarchical Grouping (Parent-Child)
Hierarchical grouping produces summaries at the intersection of organizational levels. Grouping by country and city shows totals for each city within each country. Grouping by year, month, and day creates daily summaries nested within months and years. This pattern powers most business reporting dashboards.
-- Headcount by country and city
SELECT
country,
city,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY country, city
ORDER BY country, city;
-- Result:
-- USA | New York | 45 | 95000
-- USA | San Francisco | 32 | 110000
-- UK | London | 28 | 85000
-- UK | Manchester | 12 | 72000
-- Each row is a unique (country, city) pair
-- "USA" alone is not a group; "USA, New York" isCross-Tabulation (Pivot-Style Grouping)
Combining multi-column GROUP BY with conditional aggregation produces a cross-tabulation: one dimension on the rows, another on the columns. This is the SQL equivalent of a pivot table. The GROUP BY defines the row dimension. CASE WHEN inside aggregate functions creates the column dimension.
-- Sales by product (rows) and quarter (columns)
SELECT
product_name,
SUM(CASE WHEN quarter = 'Q1' THEN revenue ELSE 0 END) AS q1,
SUM(CASE WHEN quarter = 'Q2' THEN revenue ELSE 0 END) AS q2,
SUM(CASE WHEN quarter = 'Q3' THEN revenue ELSE 0 END) AS q3,
SUM(CASE WHEN quarter = 'Q4' THEN revenue ELSE 0 END) AS q4,
SUM(revenue) AS full_year
FROM sales
GROUP BY product_name
ORDER BY full_year DESC;
-- Two-dimension grouping with conditional aggregation
SELECT
department,
job_level,
COUNT(*) AS headcount,
COUNT(*) FILTER (WHERE gender = 'F') AS female_count,
COUNT(*) FILTER (WHERE gender = 'M') AS male_count
FROM employees
GROUP BY department, job_level;GROUP BY with HAVING on Composite Groups
HAVING filters groups after aggregation. With multi-column GROUP BY, HAVING filters based on the aggregate values of composite groups. This lets you ask questions like 'which department-quarter combinations had more than $1M in revenue' or 'which city-product pairs sold fewer than 10 units.'
-- Department-quarter pairs with high revenue
SELECT
department,
quarter,
SUM(revenue) AS total_revenue
FROM sales
GROUP BY department, quarter
HAVING SUM(revenue) > 1000000
ORDER BY total_revenue DESC;
-- City-category pairs with low sales volume
SELECT
city,
product_category,
COUNT(*) AS order_count,
SUM(amount) AS total_sales
FROM orders
JOIN products ON orders.product_id = products.product_id
GROUP BY city, product_category
HAVING COUNT(*) < 10;4 Multi-Column GROUP BY Patterns
Common Mistakes
- Selecting non-grouped, non-aggregated columns: every column in SELECT must either appear in GROUP BY or be inside an aggregate function. Adding a column to SELECT without adding it to GROUP BY causes an error in PostgreSQL and Snowflake. MySQL may silently pick an arbitrary value. Fix: add the column to GROUP BY if you want finer groups, or wrap it in MAX() or MIN() if you need a representative value.
- Over-grouping: adding unnecessary columns to GROUP BY creates more groups with fewer rows each. If you GROUP BY department, quarter, employee_name, you get per-employee groups, not per-department groups. The aggregates become meaningless because each group might contain a single row. Fix: only include columns that define the level of granularity you need. Ask yourself: what is one row in my result?
- Confusing WHERE and HAVING with multi-column groups: WHERE filters rows before grouping. HAVING filters groups after aggregation. Putting an aggregate condition in WHERE causes a syntax error. Row-level filters go in WHERE. Aggregate-level filters go in HAVING. For example: WHERE status = 'active' (row filter) and HAVING COUNT(*) > 10 (group filter).
3 GROUP BY Interview Questions
Q1: What happens when you GROUP BY multiple columns? How does it differ from grouping by a single column?
What they test: Whether you understand composite group formation. Single-column GROUP BY creates one group per unique value in that column. Multi-column GROUP BY creates one group per unique combination of values across all listed columns. The number of possible groups is the Cartesian product of distinct values (though only existing combinations appear). Approach: Explain with a concrete example: employees table with department (3 values) and job_level (4 values). GROUP BY department alone gives 3 groups. GROUP BY department, job_level gives up to 12 groups, one for each (department, job_level) pair that has at least one row. Mention that the aggregate functions (SUM, COUNT, AVG) compute within each composite group independently.
Q2: Write a query to find the top-selling product in each category for each region.
What they test: Ability to combine multi-column GROUP BY with ranking. This requires grouping by region, category, and product to get totals, then ranking within (region, category) to pick the top product. Window functions or a subquery with GROUP BY are both valid approaches. Approach: Step 1: GROUP BY region, category, product_name to get SUM(sales) per product per category per region. Step 2: Use ROW_NUMBER() OVER (PARTITION BY region, category ORDER BY total_sales DESC) to rank products within each (region, category). Step 3: Filter to rank = 1. Walk through the query step by step so the interviewer sees your reasoning.
Q3: A colleague wrote a query with GROUP BY department but selected employee_name in the output. The query runs in MySQL but fails in PostgreSQL. Why?
What they test: Understanding of the GROUP BY contract: every non-aggregated column in SELECT must appear in the GROUP BY clause (or be functionally dependent on it). MySQL with ONLY_FULL_GROUP_BY disabled allows this and picks an arbitrary value. PostgreSQL enforces the SQL standard strictly. Approach: Explain the rule: SELECT can only contain columns that are in the GROUP BY list or inside aggregate functions. employee_name is not in GROUP BY and not aggregated, so PostgreSQL rejects it. MySQL (without ONLY_FULL_GROUP_BY) picks an arbitrary employee_name from the group, which is unpredictable and usually wrong. The fix: add employee_name to GROUP BY if you want per-employee groups, or use an aggregate like MAX(employee_name) if you need a representative value.
Frequently asked questions
How does GROUP BY multiple columns work in SQL?+
Does the order of columns in GROUP BY matter?+
Can I GROUP BY a column that is not in the SELECT list?+
1 in 22 interview rounds. Worth the reps.
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition