SQL GROUP BY Multiple Columns Guide

GROUP BY appears in 32% of verified DE interview SQL questions. It's the second most common construct in the corpus, behind SELECT/FROM. Roughly 14% of those GROUP BY questions use a composite key of two or more columns, which puts multi-column grouping at about 1 in every 22 SQL rounds.

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
32%
Questions use GROUP BY
14%
Use composite grouping
2nd
Most common clause
429
SQL rounds analyzed

How Composite Groups Form

The 14% of SQL questions that use composite GROUP BY share one mechanic. SQL walks every row, builds a group key from the column tuple, and buckets matching keys together. A row with ('Sales', 'Q1') and another with ('Sales', 'Q2') land in separate groups even though the first column matches. Two columns with 5 distinct values each can produce up to 25 groups, but in practice most real problems return 6 to 12 groups after filtering.
Prepare for the interview
01 / Open invite
02min.

Know GROUP BY the way the interviewer who asks it knows it.

a GROUP BY query, the same shape a screen would give you.
The diff against expected. Where ties broke. What you missed.
sandbox
1SELECT user_id,
2 COUNT(*) AS sessions
3FROM events
4WHERE ts >= NOW() - INTERVAL '7 day'
5
Execute your solution0.4s avg.
VisaInterview question
Solve a GROUP BY problem
-- Source data (5 rows)
-- department | quarter | revenue
-- Sales      | Q1      | 100
-- Sales      | Q1      | 200
-- Sales      | Q2      | 150
-- Marketing  | Q1      | 80
-- Marketing  | Q2      | 120

SELECT department, quarter, SUM(revenue) AS total
FROM sales
GROUP BY department, quarter;

-- Result (4 groups):
-- Sales      | Q1 | 300   (2 rows combined)
-- Sales      | Q2 | 150   (1 row)
-- Marketing  | Q1 | 80    (1 row)
-- Marketing  | Q2 | 120   (1 row)
Key insight: The number of groups is bounded by the number of distinct value combinations that exist in the data, not the theoretical maximum. If Marketing never has Q3 sales, there is no Marketing/Q3 group.
Prepare for the interview
03 / From the bank03 of many
03hand-picked.

The Notification That Paid Off

Hard34 min

The message went out to thousands. A smaller number actually bit.

Basic Multi-Column GROUP BY

When you GROUP BY two or more columns, SQL creates one group for each unique combination of values across those columns. A table with 3 departments and 4 job titles could produce up to 12 groups (3 x 4), though only combinations that exist in the data create actual groups. Each group gets one row in the result.

-- Revenue by department and quarter
SELECT
  department,
  quarter,
  SUM(revenue) AS total_revenue,
  COUNT(*) AS deal_count
FROM sales
GROUP BY department, quarter
ORDER BY department, quarter;

-- If department has values (Sales, Marketing)
-- and quarter has values (Q1, Q2, Q3, Q4)
-- up to 8 groups are possible:
-- Sales/Q1, Sales/Q2, Sales/Q3, Sales/Q4
-- Marketing/Q1, Marketing/Q2, Marketing/Q3, Marketing/Q4

Hierarchical Grouping (Parent-Child)

Hierarchical grouping produces summaries at the intersection of organizational levels. Grouping by country and city shows totals for each city within each country. Grouping by year, month, and day creates daily summaries nested within months and years. This pattern powers most business reporting dashboards.

-- Headcount by country and city
SELECT
  country,
  city,
  COUNT(*) AS employee_count,
  AVG(salary) AS avg_salary
FROM employees
GROUP BY country, city
ORDER BY country, city;

-- Result:
-- USA  | New York      | 45 | 95000
-- USA  | San Francisco | 32 | 110000
-- UK   | London        | 28 | 85000
-- UK   | Manchester    | 12 | 72000

-- Each row is a unique (country, city) pair
-- "USA" alone is not a group; "USA, New York" is

Cross-Tabulation (Pivot-Style Grouping)

Combining multi-column GROUP BY with conditional aggregation produces a cross-tabulation: one dimension on the rows, another on the columns. This is the SQL equivalent of a pivot table. The GROUP BY defines the row dimension. CASE WHEN inside aggregate functions creates the column dimension.

-- Sales by product (rows) and quarter (columns)
SELECT
  product_name,
  SUM(CASE WHEN quarter = 'Q1' THEN revenue ELSE 0 END) AS q1,
  SUM(CASE WHEN quarter = 'Q2' THEN revenue ELSE 0 END) AS q2,
  SUM(CASE WHEN quarter = 'Q3' THEN revenue ELSE 0 END) AS q3,
  SUM(CASE WHEN quarter = 'Q4' THEN revenue ELSE 0 END) AS q4,
  SUM(revenue) AS full_year
FROM sales
GROUP BY product_name
ORDER BY full_year DESC;

-- Two-dimension grouping with conditional aggregation
SELECT
  department,
  job_level,
  COUNT(*) AS headcount,
  COUNT(*) FILTER (WHERE gender = 'F') AS female_count,
  COUNT(*) FILTER (WHERE gender = 'M') AS male_count
FROM employees
GROUP BY department, job_level;

GROUP BY with HAVING on Composite Groups

HAVING filters groups after aggregation. With multi-column GROUP BY, HAVING filters based on the aggregate values of composite groups. This lets you ask questions like 'which department-quarter combinations had more than $1M in revenue' or 'which city-product pairs sold fewer than 10 units.'

-- Department-quarter pairs with high revenue
SELECT
  department,
  quarter,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY department, quarter
HAVING SUM(revenue) > 1000000
ORDER BY total_revenue DESC;

-- City-category pairs with low sales volume
SELECT
  city,
  product_category,
  COUNT(*) AS order_count,
  SUM(amount) AS total_sales
FROM orders
JOIN products ON orders.product_id = products.product_id
GROUP BY city, product_category
HAVING COUNT(*) < 10;

4 Multi-Column GROUP BY Patterns

From basic composite grouping to hierarchical analysis, cross-tabulation, and HAVING filters on composite groups.

Common Mistakes

  • Selecting non-grouped, non-aggregated columns: every column in SELECT must either appear in GROUP BY or be inside an aggregate function. Adding a column to SELECT without adding it to GROUP BY causes an error in PostgreSQL and Snowflake. MySQL may silently pick an arbitrary value. Fix: add the column to GROUP BY if you want finer groups, or wrap it in MAX() or MIN() if you need a representative value.
  • Over-grouping: adding unnecessary columns to GROUP BY creates more groups with fewer rows each. If you GROUP BY department, quarter, employee_name, you get per-employee groups, not per-department groups. The aggregates become meaningless because each group might contain a single row. Fix: only include columns that define the level of granularity you need. Ask yourself: what is one row in my result?
  • Confusing WHERE and HAVING with multi-column groups: WHERE filters rows before grouping. HAVING filters groups after aggregation. Putting an aggregate condition in WHERE causes a syntax error. Row-level filters go in WHERE. Aggregate-level filters go in HAVING. For example: WHERE status = 'active' (row filter) and HAVING COUNT(*) > 10 (group filter).

3 GROUP BY Interview Questions

Q1: What happens when you GROUP BY multiple columns? How does it differ from grouping by a single column?

What they test: Whether you understand composite group formation. Single-column GROUP BY creates one group per unique value in that column. Multi-column GROUP BY creates one group per unique combination of values across all listed columns. The number of possible groups is the Cartesian product of distinct values (though only existing combinations appear). Approach: Explain with a concrete example: employees table with department (3 values) and job_level (4 values). GROUP BY department alone gives 3 groups. GROUP BY department, job_level gives up to 12 groups, one for each (department, job_level) pair that has at least one row. Mention that the aggregate functions (SUM, COUNT, AVG) compute within each composite group independently.

Q2: Write a query to find the top-selling product in each category for each region.

What they test: Ability to combine multi-column GROUP BY with ranking. This requires grouping by region, category, and product to get totals, then ranking within (region, category) to pick the top product. Window functions or a subquery with GROUP BY are both valid approaches. Approach: Step 1: GROUP BY region, category, product_name to get SUM(sales) per product per category per region. Step 2: Use ROW_NUMBER() OVER (PARTITION BY region, category ORDER BY total_sales DESC) to rank products within each (region, category). Step 3: Filter to rank = 1. Walk through the query step by step so the interviewer sees your reasoning.

Q3: A colleague wrote a query with GROUP BY department but selected employee_name in the output. The query runs in MySQL but fails in PostgreSQL. Why?

What they test: Understanding of the GROUP BY contract: every non-aggregated column in SELECT must appear in the GROUP BY clause (or be functionally dependent on it). MySQL with ONLY_FULL_GROUP_BY disabled allows this and picks an arbitrary value. PostgreSQL enforces the SQL standard strictly. Approach: Explain the rule: SELECT can only contain columns that are in the GROUP BY list or inside aggregate functions. employee_name is not in GROUP BY and not aggregated, so PostgreSQL rejects it. MySQL (without ONLY_FULL_GROUP_BY) picks an arbitrary employee_name from the group, which is unpredictable and usually wrong. The fix: add employee_name to GROUP BY if you want per-employee groups, or use an aggregate like MAX(employee_name) if you need a representative value.

Frequently asked questions

How does GROUP BY multiple columns work in SQL?+
GROUP BY with multiple columns creates one group for each unique combination of values across all the listed columns. For example, GROUP BY department, year creates a separate group for each (department, year) pair. Aggregate functions like SUM, COUNT, and AVG are computed independently within each composite group. Only combinations that actually exist in the data produce groups in the result.
Does the order of columns in GROUP BY matter?+
The order of columns in the GROUP BY clause does not affect which groups are created or the aggregate values. GROUP BY a, b produces the same groups as GROUP BY b, a. However, the order can affect the result set ordering in some databases (though you should always use an explicit ORDER BY). The column order also affects readability: list the primary dimension first.
Can I GROUP BY a column that is not in the SELECT list?+
Yes. You can GROUP BY a column without selecting it. The grouping still happens, but the grouping column is not displayed in the output. This is valid SQL and sometimes useful when you only need aggregate values without showing the grouping dimensions. However, it can make results confusing to read since rows may look identical but represent different groups.
02 / Why practice

1 in 22 interview rounds. Worth the reps.

  1. 01

    Active recall beats re-reading by 50%

    Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom

  2. 02

    76% of hiring managers reject on the coding task, not the resume

    From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice

  3. 03

    Five problem shapes cover 80% of data engineer loops

    Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

More reading