SQL GROUPING SETS, ROLLUP, and CUBE
Here's a secret that took me years to appreciate. The first time a staff engineer showed me GROUPING SETS, I'd already been writing SQL for five years and I had no idea it existed. You'll feel that same 'oh' moment today. If you've ever UNION ALL'd three copies of the same query to build a report with subtotals, you're going to love this.
GROUPING SETS: Pick Your Groupings
Know GROUPING SETS the way the interviewer who asks it knows it.
-- The UNION ALL equivalent (3 scans, more code)
SELECT region, NULL AS product, SUM(revenue)
FROM sales GROUP BY region
UNION ALL
SELECT NULL, product, SUM(revenue)
FROM sales GROUP BY product
UNION ALL
SELECT NULL, NULL, SUM(revenue)
FROM sales;Equivalent UNION ALL
-- Revenue with hierarchical subtotals
SELECT
year,
quarter,
month,
SUM(revenue) AS total_revenue
FROM sales
GROUP BY ROLLUP (year, quarter, month);
-- This generates these groupings:
-- (year, quarter, month) -> monthly detail
-- (year, quarter) -> quarterly subtotal
-- (year) -> yearly subtotal
-- () -> grand totalROLLUP: Hierarchical Subtotals
-- Subtotals per region, but always grouped by region
SELECT
region,
product_category,
product,
SUM(revenue) AS total_revenue
FROM sales
GROUP BY region, ROLLUP (product_category, product);
-- Produces per region:
-- (region, product_category, product) -> detail
-- (region, product_category) -> category subtotal
-- (region) -> region subtotal-- Full cross-tab: every combination of region and product
SELECT
region,
product,
SUM(revenue) AS total_revenue,
COUNT(*) AS num_transactions
FROM sales
GROUP BY CUBE (region, product);
-- This generates these groupings:
-- (region, product) -> detail per region + product
-- (region) -> subtotal per region
-- (product) -> subtotal per product
-- () -> grand totalPartial ROLLUP
Campaign Revenue by Click Channel
Which ad format drives the most revenue?
Pulled from debriefs where SQL was the gate.
CUBE: Every Possible Combination
SELECT
CASE
WHEN GROUPING(region) = 1 THEN 'All Regions'
ELSE region
END AS region_label,
CASE
WHEN GROUPING(product) = 1 THEN 'All Products'
ELSE product
END AS product_label,
SUM(revenue) AS total_revenue
FROM sales
GROUP BY CUBE (region, product)
ORDER BY
GROUPING(region),
GROUPING(product),
region,
product;ROLLUP vs CUBE Comparison
-- Revenue by region, by product, and grand total
SELECT
region,
product,
SUM(revenue) AS total_revenue
FROM sales
GROUP BY GROUPING SETS (
(region), -- subtotal per region
(product), -- subtotal per product
() -- grand total
);The GROUPING() Function
-- PostgreSQL: multi-column GROUPING returns a bitmask
SELECT
region,
product,
SUM(revenue) AS total_revenue,
GROUPING(region, product) AS grp_level
FROM sales
GROUP BY CUBE (region, product);
-- grp_level 0 = detail, 1 = product rolled up,
-- 2 = region rolled up, 3 = grand totalSales Dashboard Summary Table
This pattern materializes a summary table that a BI tool can query at any aggregation level without recomputing.
CREATE TABLE sales_summary AS
SELECT
COALESCE(region, 'ALL') AS region,
COALESCE(category, 'ALL') AS category,
EXTRACT(YEAR FROM sale_date) AS year,
SUM(amount) AS total_amount,
COUNT(*) AS num_sales,
AVG(amount) AS avg_amount,
GROUPING(region, category) AS grp_level
FROM sales
GROUP BY
GROUPING SETS (
(region, category, EXTRACT(YEAR FROM sale_date)),
(region, EXTRACT(YEAR FROM sale_date)),
(category, EXTRACT(YEAR FROM sale_date)),
(EXTRACT(YEAR FROM sale_date))
);Monthly Retention with Subtotals
ROLLUP on cohort month and activity month gives you per-cohort retention detail plus cohort-level subtotals and a grand total.
SELECT
cohort_month,
activity_month,
COUNT(DISTINCT user_id) AS active_users
FROM user_activity
GROUP BY ROLLUP (cohort_month, activity_month)
ORDER BY cohort_month NULLS LAST, activity_month NULLS LAST;GROUPING_ID() for Bitmask Identification
3 GROUPING SETS Interview Questions
Q1: Write a query that shows revenue by region, revenue by product, and total revenue, all in one result set.
What they test: Whether you know GROUPING SETS as an alternative to UNION ALL. Most candidates write three queries. The clean answer uses GROUPING SETS. Approach: GROUP BY GROUPING SETS ((region), (product), ()). Use GROUPING() in CASE expressions to label subtotal rows. Mention it scans the table once.
Q2: Explain the difference between ROLLUP and CUBE. Give an example of when you would use each.
What they test: Conceptual clarity. They want to hear 'ROLLUP is hierarchical, CUBE is all combinations' and a concrete use case for each. Approach: ROLLUP(year, quarter, month) for time-based subtotals. CUBE(region, product) for cross-dimensional analysis. State the grouping counts: N+1 vs 2^N. Warn about CUBE on many columns.
Q3: In a ROLLUP query, how do you distinguish between a NULL that means 'subtotal' and a NULL in the actual data?
What they test: Whether you know the GROUPING() function. This separates candidates who have used ROLLUP in production from those who have only read about it. Approach: GROUPING(column) returns 1 for subtotal NULLs and 0 for real grouping values. Use it in CASE expressions or in WHERE to filter to specific aggregation levels. Mention GROUPING_ID() for multi-column bitmask identification.
Practical Reporting Examples
| Feature | ROLLUP(a, b) | CUBE(a, b) |
|---|---|---|
| Groupings | (a,b), (a), () | (a,b), (a), (b), () |
| Count | N+1 | 2^N |
| Best for | Hierarchical dimensions (time, geography) | Cross-tab, pivot-style analysis |
| Missing grouping | (b) alone is not generated | Every combination appears |
Engine Support
| Engine | GROUPING SETS | ROLLUP | CUBE |
|---|---|---|---|
| PostgreSQL 9.5+ | Yes | Yes | Yes |
| SQL Server 2008+ | Yes | Yes | Yes |
| MySQL 8.0+ | No | Yes (WITH ROLLUP) | No |
| BigQuery | Yes | Yes | Yes |
| Snowflake | Yes | Yes | Yes |
| Oracle | Yes | Yes | Yes |
Frequently asked questions
What are GROUPING SETS in SQL?+
What is the difference between ROLLUP and CUBE?+
What does the GROUPING() function do?+
You're closer to fluent than you think
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition