SQL Reference

SQL GROUP BY Multiple Columns

GROUP BY appears in 32% of verified DE interview SQL questions. It's the second most common construct in the corpus, behind SELECT/FROM. Roughly 14% of those GROUP BY questions use a composite key of two or more columns, which puts multi-column grouping at about 1 in every 22 SQL rounds.

The higher you go in seniority, the more often it shows up. At L5 (61% of the corpus at 632 rounds), composite GROUP BY is usually paired with a HAVING filter. At L6, it's almost always the setup for a window function or a ranking follow-up. Four patterns cover nearly every shape you'll see: hierarchical rollups, cross-tabs, HAVING on composites, and GROUP BY chained into a second aggregation.

32%

Questions use GROUP BY

14%

Use composite grouping

2nd

Most common clause

429

SQL rounds analyzed

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

How Composite Groups Form

The 14% of SQL questions that use composite GROUP BY share one mechanic. SQL walks every row, builds a group key from the column tuple, and buckets matching keys together. A row with ('Sales', 'Q1') and another with ('Sales', 'Q2') land in separate groups even though the first column matches. Two columns with 5 distinct values each can produce up to 25 groups, but in practice the corpus shows most real problems return 6 to 12 groups after filtering.

-- Source data (5 rows)
-- department | quarter | revenue
-- Sales      | Q1      | 100
-- Sales      | Q1      | 200
-- Sales      | Q2      | 150
-- Marketing  | Q1      | 80
-- Marketing  | Q2      | 120

SELECT department, quarter, SUM(revenue) AS total
FROM sales
GROUP BY department, quarter;

-- Result (4 groups):
-- Sales      | Q1 | 300   (2 rows combined)
-- Sales      | Q2 | 150   (1 row)
-- Marketing  | Q1 | 80    (1 row)
-- Marketing  | Q2 | 120   (1 row)

Key insight: The number of groups is bounded by the number of distinct value combinations that exist in the data, not the theoretical maximum. If Marketing never has Q3 sales, there is no Marketing/Q3 group.

4 Multi-Column GROUP BY Patterns

From basic composite grouping to hierarchical analysis, cross-tabulation, and HAVING filters on composite groups.

Basic Multi-Column GROUP BY

When you GROUP BY two or more columns, SQL creates one group for each unique combination of values across those columns. A table with 3 departments and 4 job titles could produce up to 12 groups (3 x 4), though only combinations that exist in the data create actual groups. Each group gets one row in the result.

-- Revenue by department and quarter
SELECT
  department,
  quarter,
  SUM(revenue) AS total_revenue,
  COUNT(*) AS deal_count
FROM sales
GROUP BY department, quarter
ORDER BY department, quarter;

-- If department has values (Sales, Marketing)
-- and quarter has values (Q1, Q2, Q3, Q4)
-- up to 8 groups are possible:
-- Sales/Q1, Sales/Q2, Sales/Q3, Sales/Q4
-- Marketing/Q1, Marketing/Q2, Marketing/Q3, Marketing/Q4

Hierarchical Grouping (Parent-Child)

Hierarchical grouping produces summaries at the intersection of organizational levels. Grouping by country and city shows totals for each city within each country. Grouping by year, month, and day creates daily summaries nested within months and years. This pattern powers most business reporting dashboards.

-- Headcount by country and city
SELECT
  country,
  city,
  COUNT(*) AS employee_count,
  AVG(salary) AS avg_salary
FROM employees
GROUP BY country, city
ORDER BY country, city;

-- Result:
-- USA  | New York      | 45 | 95000
-- USA  | San Francisco | 32 | 110000
-- UK   | London        | 28 | 85000
-- UK   | Manchester    | 12 | 72000

-- Each row is a unique (country, city) pair
-- "USA" alone is not a group; "USA, New York" is

Cross-Tabulation (Pivot-Style Grouping)

Combining multi-column GROUP BY with conditional aggregation produces a cross-tabulation: one dimension on the rows, another on the columns. This is the SQL equivalent of a pivot table. The GROUP BY defines the row dimension. CASE WHEN inside aggregate functions creates the column dimension.

-- Sales by product (rows) and quarter (columns)
SELECT
  product_name,
  SUM(CASE WHEN quarter = 'Q1' THEN revenue ELSE 0 END) AS q1,
  SUM(CASE WHEN quarter = 'Q2' THEN revenue ELSE 0 END) AS q2,
  SUM(CASE WHEN quarter = 'Q3' THEN revenue ELSE 0 END) AS q3,
  SUM(CASE WHEN quarter = 'Q4' THEN revenue ELSE 0 END) AS q4,
  SUM(revenue) AS full_year
FROM sales
GROUP BY product_name
ORDER BY full_year DESC;

-- Two-dimension grouping with conditional aggregation
SELECT
  department,
  job_level,
  COUNT(*) AS headcount,
  COUNT(*) FILTER (WHERE gender = 'F') AS female_count,
  COUNT(*) FILTER (WHERE gender = 'M') AS male_count
FROM employees
GROUP BY department, job_level;

GROUP BY with HAVING on Composite Groups

HAVING filters groups after aggregation. With multi-column GROUP BY, HAVING filters based on the aggregate values of composite groups. This lets you ask questions like 'which department-quarter combinations had more than $1M in revenue' or 'which city-product pairs sold fewer than 10 units.'

-- Department-quarter pairs with high revenue
SELECT
  department,
  quarter,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY department, quarter
HAVING SUM(revenue) > 1000000
ORDER BY total_revenue DESC;

-- City-category pairs with low sales volume
SELECT
  city,
  product_category,
  COUNT(*) AS order_count,
  SUM(amount) AS total_sales
FROM orders
JOIN products ON orders.product_id = products.product_id
GROUP BY city, product_category
HAVING COUNT(*) < 10;

Common Mistakes

Multi-column GROUP BY introduces specific pitfalls that single-column grouping does not.

Selecting non-grouped, non-aggregated columns

Every column in SELECT must either appear in GROUP BY or be inside an aggregate function. Adding a column to SELECT without adding it to GROUP BY causes an error in PostgreSQL and Snowflake. MySQL may silently pick an arbitrary value.

Fix: Add the column to GROUP BY if you want finer groups, or wrap it in an aggregate like MAX() or MIN() if you need a representative value.

Over-grouping: too many columns fragment the data

Adding unnecessary columns to GROUP BY creates more groups with fewer rows each. If you GROUP BY department, quarter, employee_name, you get per-employee groups, not per-department groups. The aggregates become meaningless because each group might contain a single row.

Fix: Only include columns in GROUP BY that define the level of granularity you need. Ask yourself: what is one row in my result? That answer defines your GROUP BY columns.

Confusing WHERE and HAVING with multi-column groups

WHERE filters rows before grouping. HAVING filters groups after aggregation. Putting an aggregate condition in WHERE causes a syntax error. Putting a row-level condition in HAVING is valid but less efficient because it filters after the expensive grouping operation.

Fix: Row-level filters go in WHERE. Aggregate-level filters go in HAVING. For example: WHERE status = 'active' (row filter) and HAVING COUNT(*) > 10 (group filter).

3 GROUP BY Interview Questions

These questions test composite group understanding, practical multi-dimension analysis, and SQL standard compliance.

Q1: What happens when you GROUP BY multiple columns? How does it differ from grouping by a single column?

What they test:

Whether you understand composite group formation. Single-column GROUP BY creates one group per unique value in that column. Multi-column GROUP BY creates one group per unique combination of values across all listed columns. The number of possible groups is the Cartesian product of distinct values (though only existing combinations appear).

Approach:

Explain with a concrete example: employees table with department (3 values) and job_level (4 values). GROUP BY department alone gives 3 groups. GROUP BY department, job_level gives up to 12 groups, one for each (department, job_level) pair that has at least one row. Mention that the aggregate functions (SUM, COUNT, AVG) compute within each composite group independently.

Q2: Write a query to find the top-selling product in each category for each region.

What they test:

Ability to combine multi-column GROUP BY with ranking. This requires grouping by region, category, and product to get totals, then ranking within (region, category) to pick the top product. Window functions or a subquery with GROUP BY are both valid approaches.

Approach:

Step 1: GROUP BY region, category, product_name to get SUM(sales) per product per category per region. Step 2: Use ROW_NUMBER() OVER (PARTITION BY region, category ORDER BY total_sales DESC) to rank products within each (region, category). Step 3: Filter to rank = 1. Walk through the query step by step so the interviewer sees your reasoning.

Q3: A colleague wrote a query with GROUP BY department but selected employee_name in the output. The query runs in MySQL but fails in PostgreSQL. Why?

What they test:

Understanding of the GROUP BY contract: every non-aggregated column in SELECT must appear in the GROUP BY clause (or be functionally dependent on it). MySQL with ONLY_FULL_GROUP_BY disabled allows this and picks an arbitrary value. PostgreSQL enforces the SQL standard strictly.

Approach:

Explain the rule: SELECT can only contain columns that are in the GROUP BY list or inside aggregate functions. employee_name is not in GROUP BY and not aggregated, so PostgreSQL rejects it. MySQL (without ONLY_FULL_GROUP_BY) picks an arbitrary employee_name from the group, which is unpredictable and usually wrong. The fix: add employee_name to GROUP BY if you want per-employee groups, or use an aggregate like MAX(employee_name) if you need a representative value.

GROUP BY Multiple Columns FAQ

How does GROUP BY multiple columns work in SQL?+
GROUP BY with multiple columns creates one group for each unique combination of values across all the listed columns. For example, GROUP BY department, year creates a separate group for each (department, year) pair. Aggregate functions like SUM, COUNT, and AVG are computed independently within each composite group. Only combinations that actually exist in the data produce groups in the result.
Does the order of columns in GROUP BY matter?+
The order of columns in the GROUP BY clause does not affect which groups are created or the aggregate values. GROUP BY a, b produces the same groups as GROUP BY b, a. However, the order can affect the result set ordering in some databases (though you should always use an explicit ORDER BY). The column order also affects readability: list the primary dimension first.
Can I GROUP BY a column that is not in the SELECT list?+
Yes. You can GROUP BY a column without selecting it. The grouping still happens, but the grouping column is not displayed in the output. This is valid SQL and sometimes useful when you only need aggregate values without showing the grouping dimensions. However, it can make results confusing to read since rows may look identical but represent different groups.

1 in 22 interview rounds. Worth the reps.

Roughly 60 out of the 429 verified SQL rounds we analyzed needed a composite GROUP BY to answer. Practice the four patterns against a real engine and you'll recognize them cold.