Here's a secret that took me years to appreciate. The first time a staff engineer showed me GROUPING SETS, I'd already been writing SQL for five years and I had no idea it existed. You'll feel that same "oh" moment today. If you've ever UNION ALL'd three copies of the same query to build a report with subtotals, you're going to love this.
Don't worry if it feels abstract at first. Most people meet GROUPING SETS, ROLLUP, and CUBE out of order, understand none of them, and then one Tuesday afternoon it clicks. We'll walk through each one the way I wish someone had walked through them with me: the pattern, the mental shortcut, and the one interview question that turns the concept real.
Tools to learn together
Table scan per query
SQL rounds use GROUP BY
Senior rounds analyzed
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
GROUPING SETS lets you specify exactly which column combinations to aggregate. Each grouping set is a parenthesized list of columns. The empty set () produces the grand total. The engine computes all groupings in a single scan of the data.
-- Revenue by region, by product, and grand total
SELECT
region,
product,
SUM(revenue) AS total_revenue
FROM sales
GROUP BY GROUPING SETS (
(region), -- subtotal per region
(product), -- subtotal per product
() -- grand total
);Read that query slowly. You're asking the engine for three different groupings in one pass, and you get rows back for each one, each row labeled by NULL where a dimension was rolled up. The first time you see the output, you'll want to sort it so the NULL patterns make sense. That instinct is fine. Most teams add an ORDER BY with a GROUPING() helper once they're building real dashboards, which we'll cover in a minute.
To understand what GROUPING SETS produces, here is the long-form version. Notice how much more code it requires and how it scans the table three times instead of once.
-- The UNION ALL equivalent (3 scans, more code)
SELECT region, NULL AS product, SUM(revenue)
FROM sales GROUP BY region
UNION ALL
SELECT NULL, product, SUM(revenue)
FROM sales GROUP BY product
UNION ALL
SELECT NULL, NULL, SUM(revenue)
FROM sales;Interview note: If an interviewer asks for subtotals at multiple levels, start by writing GROUPING SETS. It shows you know the right tool. If they have not seen it before, explain that it is standard SQL (SQL:1999) and works on PostgreSQL, SQL Server, Oracle, Snowflake, BigQuery, Redshift, and Databricks.
ROLLUP is a shorthand that generates a hierarchical set of groupings. It removes columns from right to left. ROLLUP(a, b, c) produces: (a, b, c), (a, b), (a), and (). This matches hierarchical dimensions perfectly. Year, quarter, month. Country, state, city. Department, team, employee.
-- Revenue with hierarchical subtotals
SELECT
year,
quarter,
month,
SUM(revenue) AS total_revenue
FROM sales
GROUP BY ROLLUP (year, quarter, month);
-- This generates these groupings:
-- (year, quarter, month) -> monthly detail
-- (year, quarter) -> quarterly subtotal
-- (year) -> yearly subtotal
-- () -> grand totalROLLUP generates N+1 groupings for N columns. Three columns gives you four levels: detail, two levels of subtotals, and the grand total. The column order matters. ROLLUP(year, quarter) gives you year subtotals. ROLLUP(quarter, year) gives you quarter subtotals. Put the broadest dimension first.
You can combine regular GROUP BY columns with ROLLUP to get subtotals for some dimensions but not others.
-- Subtotals per region, but always grouped by region
SELECT
region,
product_category,
product,
SUM(revenue) AS total_revenue
FROM sales
GROUP BY region, ROLLUP (product_category, product);
-- Produces per region:
-- (region, product_category, product) -> detail
-- (region, product_category) -> category subtotal
-- (region) -> region subtotalCUBE generates all possible grouping combinations. For N columns, it produces 2^N groupings. CUBE(a, b) gives: (a, b), (a), (b), and (). CUBE(a, b, c) gives 8 groupings. This is useful for cross-tab reports where you need aggregates along every dimension.
-- Full cross-tab: every combination of region and product
SELECT
region,
product,
SUM(revenue) AS total_revenue,
COUNT(*) AS num_transactions
FROM sales
GROUP BY CUBE (region, product);
-- This generates these groupings:
-- (region, product) -> detail per region + product
-- (region) -> subtotal per region
-- (product) -> subtotal per product
-- () -> grand totalBe careful with CUBE on many columns. Four columns produce 16 groupings. Six columns produce 64. The result set grows fast, and so does the computation. Use CUBE for reporting summaries with 2 to 4 dimensions. For more dimensions, use GROUPING SETS to specify only the combinations you actually need.
| Feature | ROLLUP(a, b) | CUBE(a, b) |
|---|---|---|
| Groupings | (a,b), (a), () | (a,b), (a), (b), () |
| Count | N+1 | 2^N |
| Best for | Hierarchical dimensions (time, geography) | Cross-tab, pivot-style analysis |
| Missing grouping | (b) alone is not generated | Every combination appears |
When a column appears as NULL in GROUPING SETS output, you need to know: is it NULL because the data is NULL, or because this row is a subtotal where that column was rolled up? The GROUPING() function answers this. GROUPING(column_name) returns 1 if the column is aggregated (subtotal row) and 0 if it is part of the current grouping (detail row).
SELECT
CASE
WHEN GROUPING(region) = 1 THEN 'All Regions'
ELSE region
END AS region_label,
CASE
WHEN GROUPING(product) = 1 THEN 'All Products'
ELSE product
END AS product_label,
SUM(revenue) AS total_revenue
FROM sales
GROUP BY CUBE (region, product)
ORDER BY
GROUPING(region),
GROUPING(product),
region,
product;The CASE expressions replace the ambiguous NULLs with readable labels. The ORDER BY uses GROUPING() values to sort subtotals after their detail rows, which is the standard pattern for formatted reports.
Some engines support GROUPING_ID() (SQL Server) or GROUPING(col1, col2, ...) (PostgreSQL) which returns a bitmask identifying the current grouping level. Each bit corresponds to one column. This is useful for filtering to specific grouping levels in downstream processing.
-- PostgreSQL: multi-column GROUPING returns a bitmask
SELECT
region,
product,
SUM(revenue) AS total_revenue,
GROUPING(region, product) AS grp_level
FROM sales
GROUP BY CUBE (region, product);
-- grp_level 0 = detail, 1 = product rolled up,
-- 2 = region rolled up, 3 = grand totalInterview note: Mentioning GROUPING() shows you have built actual reports with these features. Most candidates who know ROLLUP syntax do not know about GROUPING(). Bring it up to differentiate yourself.
This pattern materializes a summary table that a BI tool can query at any aggregation level without recomputing.
CREATE TABLE sales_summary AS
SELECT
COALESCE(region, 'ALL') AS region,
COALESCE(category, 'ALL') AS category,
EXTRACT(YEAR FROM sale_date) AS year,
SUM(amount) AS total_amount,
COUNT(*) AS num_sales,
AVG(amount) AS avg_amount,
GROUPING(region, category) AS grp_level
FROM sales
GROUP BY
GROUPING SETS (
(region, category, EXTRACT(YEAR FROM sale_date)),
(region, EXTRACT(YEAR FROM sale_date)),
(category, EXTRACT(YEAR FROM sale_date)),
(EXTRACT(YEAR FROM sale_date))
);ROLLUP on cohort month and activity month gives you per-cohort retention detail plus cohort-level subtotals and a grand total.
SELECT
cohort_month,
activity_month,
COUNT(DISTINCT user_id) AS active_users
FROM user_activity
GROUP BY ROLLUP (cohort_month, activity_month)
ORDER BY cohort_month NULLS LAST, activity_month NULLS LAST;These questions test whether you know multi-level aggregation beyond basic GROUP BY. They appear in interviews at companies that build reporting pipelines and summary tables.
What they test:
Whether you know GROUPING SETS as an alternative to UNION ALL. Most candidates write three queries. The clean answer uses GROUPING SETS.
Approach:
GROUP BY GROUPING SETS ((region), (product), ()). Use GROUPING() in CASE expressions to label subtotal rows. Mention it scans the table once.
What they test:
Conceptual clarity. They want to hear “ROLLUP is hierarchical, CUBE is all combinations” and a concrete use case for each.
Approach:
ROLLUP(year, quarter, month) for time-based subtotals. CUBE(region, product) for cross-dimensional analysis. State the grouping counts: N+1 vs 2^N. Warn about CUBE on many columns.
What they test:
Whether you know the GROUPING() function. This separates candidates who have used ROLLUP in production from those who have only read about it.
Approach:
GROUPING(column) returns 1 for subtotal NULLs and 0 for real grouping values. Use it in CASE expressions or in WHERE to filter to specific aggregation levels. Mention GROUPING_ID() for multi-column bitmask identification.
| Engine | GROUPING SETS | ROLLUP | CUBE |
|---|---|---|---|
| PostgreSQL 9.5+ | Yes | Yes | Yes |
| SQL Server 2008+ | Yes | Yes | Yes |
| MySQL 8.0+ | No | Yes (WITH ROLLUP) | No |
| BigQuery | Yes | Yes | Yes |
| Snowflake | Yes | Yes | Yes |
| Oracle | Yes | Yes | Yes |
MySQL is the notable gap: it supports ROLLUP with the WITH ROLLUP syntax but does not support GROUPING SETS or CUBE. If you need full GROUPING SETS on MySQL, use UNION ALL as a fallback.
Spend a weekend on GROUPING SETS, ROLLUP, and CUBE and you'll recognize them in the wild forever. A dozen problems is usually enough for the pattern to stick.