SQL GROUP BY Multiple Columns Guide

GROUP BY appears in 32% of verified DE interview SQL questions. It's the second most common construct in the corpus, behind SELECT/FROM. Roughly 14% of those GROUP BY questions use a composite key of two or more columns, which puts multi-column grouping at about 1 in every 22 SQL rounds.

32%

Questions use GROUP BY

14%

Use composite grouping

2nd

Most common clause

429

SQL rounds analyzed

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

Prepare for the interview

01 / Open invite

02min.

Know GROUP BY the way the interviewer who asks it knows it.

a GROUP BY query, the same shape a screen would give you.

The diff against expected. Where ties broke. What you missed.

sandbox

1SELECT user_id,

2 COUNT(*) AS sessions

3FROM events

4WHERE ts >= NOW() - INTERVAL '7 day'

Execute your solution0.4s avg.

VisaInterview question

Solve a GROUP BY problem

How Composite Groups Form

The 14% of SQL questions that use composite GROUP BY share one mechanic. SQL walks every row, builds a group key from the column tuple, and buckets matching keys together. A row with ('Sales', 'Q1') and another with ('Sales', 'Q2') land in separate groups even though the first column matches. Two columns with 5 distinct values each can produce up to 25 groups, but in practice most real problems return 6 to 12 groups after filtering.

-- Source data (5 rows)
-- department | fiscal_quarter | metric_value
-- Sales      | Q1             | 100
-- Sales      | Q1             | 200
-- Sales      | Q2             | 150
-- Marketing  | Q1             | 80
-- Marketing  | Q2             | 120

SELECT department, fiscal_quarter, SUM(metric_value) AS total
FROM employee_metrics
GROUP BY department, fiscal_quarter;

-- Result (4 groups):
-- Sales      | Q1 | 300   (2 rows combined)
-- Sales      | Q2 | 150   (1 row)
-- Marketing  | Q1 | 80    (1 row)
-- Marketing  | Q2 | 120   (1 row)

Key insight: The number of groups is bounded by the number of distinct value combinations that exist in the data, not the theoretical maximum. If Marketing never has Q3 sales, there is no Marketing/Q3 group.

Auth Endpoint Callers

> A security review requires tracing which users have hit authentication endpoints. Find every API call whose endpoint path contains 'auth' and show the user's ID, username, email, call ID, and endpoint, ordered by user ID.

4 Multi-Column GROUP BY Patterns

From basic composite grouping to hierarchical analysis, cross-tabulation, and HAVING filters on composite groups.

Basic Multi-Column GROUP BY

When you GROUP BY two or more columns, SQL creates one group for each unique combination of values across those columns. A table with 3 departments and 4 job titles could produce up to 12 groups (3 x 4), though only combinations that exist in the data create actual groups. Each group gets one row in the result.

-- Metric value by department and quarter
SELECT
  department,
  fiscal_quarter,
  SUM(metric_value) AS total_value,
  COUNT(*) AS metric_count
FROM employee_metrics
GROUP BY department, fiscal_quarter
ORDER BY department, fiscal_quarter;

-- If department has values (Sales, Marketing)
-- and fiscal_quarter has values (Q1, Q2, Q3, Q4)
-- up to 8 groups are possible:
-- Sales/Q1, Sales/Q2, Sales/Q3, Sales/Q4
-- Marketing/Q1, Marketing/Q2, Marketing/Q3, Marketing/Q4

Hierarchical Grouping (Parent-Child)

Hierarchical grouping produces summaries at the intersection of organizational levels. Grouping by country and city shows totals for each city within each country. Grouping by year, month, and day creates daily summaries nested within months and years. This pattern powers most business reporting dashboards.

-- Headcount by department and hire year
SELECT
  department,
  substr(hire_date, 1, 4) AS hire_year,
  COUNT(*) AS employee_count,
  AVG(salary) AS avg_salary
FROM employees
GROUP BY department, substr(hire_date, 1, 4)
ORDER BY department, hire_year;

-- Result:
-- Engineering | 2021 | 45 | 95000
-- Engineering | 2022 | 32 | 110000
-- Sales       | 2021 | 28 | 85000
-- Sales       | 2022 | 12 | 72000

-- Each row is a unique (department, hire_year) pair
-- "Engineering" alone is not a group; "Engineering, 2021" is

Cross-Tabulation (Pivot-Style Grouping)

Combining multi-column GROUP BY with conditional aggregation produces a cross-tabulation: one dimension on the rows, another on the columns. This is the SQL equivalent of a pivot table. The GROUP BY defines the row dimension. CASE WHEN inside aggregate functions creates the column dimension.

-- Sales by product (rows) and quarter (columns)
SELECT
  p.product_name,
  SUM(CASE WHEN substr(t.transaction_date, 6, 2) IN ('01','02','03') THEN t.total_amount ELSE 0 END) AS q1,
  SUM(CASE WHEN substr(t.transaction_date, 6, 2) IN ('04','05','06') THEN t.total_amount ELSE 0 END) AS q2,
  SUM(CASE WHEN substr(t.transaction_date, 6, 2) IN ('07','08','09') THEN t.total_amount ELSE 0 END) AS q3,
  SUM(CASE WHEN substr(t.transaction_date, 6, 2) IN ('10','11','12') THEN t.total_amount ELSE 0 END) AS q4,
  SUM(t.total_amount) AS full_year
FROM transactions t
JOIN products p ON p.product_id = t.product_id
GROUP BY p.product_name
ORDER BY full_year DESC;

-- Two-dimension grouping with conditional aggregation
SELECT
  department,
  fiscal_year,
  COUNT(*) AS metric_count,
  COUNT(*) FILTER (WHERE fiscal_quarter = 'Q1') AS q1_count,
  COUNT(*) FILTER (WHERE fiscal_quarter = 'Q2') AS q2_count
FROM employee_metrics
GROUP BY department, fiscal_year;

GROUP BY with HAVING on Composite Groups

HAVING filters groups after aggregation. With multi-column GROUP BY, HAVING filters based on the aggregate values of composite groups. This lets you ask questions like 'which department-quarter combinations had more than $1M in revenue' or 'which city-product pairs sold fewer than 10 units.'

-- Department-quarter pairs with high metric totals
SELECT
  department,
  fiscal_quarter,
  SUM(metric_value) AS total_value
FROM employee_metrics
GROUP BY department, fiscal_quarter
HAVING SUM(metric_value) > 1000000
ORDER BY total_value DESC;

-- Region-category pairs with low sales volume
SELECT
  o.region,
  p.category,
  COUNT(*) AS order_count,
  SUM(oi.quantity * oi.unit_price) AS total_sales
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products p ON p.product_id = oi.product_id
GROUP BY o.region, p.category
HAVING COUNT(*) < 10;

Common Mistakes

Selecting non-grouped, non-aggregated columns: every column in SELECT must either appear in GROUP BY or be inside an aggregate function. Adding a column to SELECT without adding it to GROUP BY causes an error in PostgreSQL and Snowflake. MySQL may silently pick an arbitrary value. Fix: add the column to GROUP BY if you want finer groups, or wrap it in MAX() or MIN() if you need a representative value.
Over-grouping: adding unnecessary columns to GROUP BY creates more groups with fewer rows each. If you GROUP BY department, quarter, employee_name, you get per-employee groups, not per-department groups. The aggregates become meaningless because each group might contain a single row. Fix: only include columns that define the level of granularity you need. Ask yourself: what is one row in my result?
Confusing WHERE and HAVING with multi-column groups: WHERE filters rows before grouping. HAVING filters groups after aggregation. Putting an aggregate condition in WHERE causes a syntax error. Row-level filters go in WHERE. Aggregate-level filters go in HAVING. For example: WHERE status = 'active' (row filter) and HAVING COUNT(*) > 10 (group filter).

3 GROUP BY Interview Questions

Q1: What happens when you GROUP BY multiple columns? How does it differ from grouping by a single column?

What they test: Whether you understand composite group formation. Single-column GROUP BY creates one group per unique value in that column. Multi-column GROUP BY creates one group per unique combination of values across all listed columns. The number of possible groups is the Cartesian product of distinct values (though only existing combinations appear). Approach: Explain with a concrete example: employees table with department (3 values) and job_level (4 values). GROUP BY department alone gives 3 groups. GROUP BY department, job_level gives up to 12 groups, one for each (department, job_level) pair that has at least one row. Mention that the aggregate functions (SUM, COUNT, AVG) compute within each composite group independently.

Q2: Write a query to find the top-selling product in each category for each region.

What they test: Ability to combine multi-column GROUP BY with ranking. This requires grouping by region, category, and product to get totals, then ranking within (region, category) to pick the top product. Window functions or a subquery with GROUP BY are both valid approaches. Approach: Step 1: GROUP BY region, category, product_name to get SUM(sales) per product per category per region. Step 2: Use ROW_NUMBER() OVER (PARTITION BY region, category ORDER BY total_sales DESC) to rank products within each (region, category). Step 3: Filter to rank = 1. Walk through the query step by step so the interviewer sees your reasoning.

Q3: A colleague wrote a query with GROUP BY department but selected employee_name in the output. The query runs in MySQL but fails in PostgreSQL. Why?

What they test: Understanding of the GROUP BY contract: every non-aggregated column in SELECT must appear in the GROUP BY clause (or be functionally dependent on it). MySQL with ONLY_FULL_GROUP_BY disabled allows this and picks an arbitrary value. PostgreSQL enforces the SQL standard strictly. Approach: Explain the rule: SELECT can only contain columns that are in the GROUP BY list or inside aggregate functions. employee_name is not in GROUP BY and not aggregated, so PostgreSQL rejects it. MySQL (without ONLY_FULL_GROUP_BY) picks an arbitrary employee_name from the group, which is unpredictable and usually wrong. The fix: add employee_name to GROUP BY if you want per-employee groups, or use an aggregate like MAX(employee_name) if you need a representative value.

Frequently asked questions

How does GROUP BY multiple columns work in SQL?+

GROUP BY with multiple columns creates one group for each unique combination of values across all the listed columns. For example, GROUP BY department, year creates a separate group for each (department, year) pair. Aggregate functions like SUM, COUNT, and AVG are computed independently within each composite group. Only combinations that actually exist in the data produce groups in the result.

Does the order of columns in GROUP BY matter?+

The order of columns in the GROUP BY clause does not affect which groups are created or the aggregate values. GROUP BY a, b produces the same groups as GROUP BY b, a. However, the order can affect the result set ordering in some databases (though you should always use an explicit ORDER BY). The column order also affects readability: list the primary dimension first.

Can I GROUP BY a column that is not in the SELECT list?+

Yes. You can GROUP BY a column without selecting it. The grouping still happens, but the grouping column is not displayed in the output. This is valid SQL and sometimes useful when you only need aggregate values without showing the grouping dimensions. However, it can make results confusing to read since rows may look identical but represent different groups.

02 / Why practice

1 in 22 interview rounds. Worth the reps.

01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Try a GROUP BY Problem