SQL GROUPING SETS, ROLLUP, and CUBE

Here's a secret that took me years to appreciate. The first time a staff engineer showed me GROUPING SETS, I'd already been writing SQL for five years and I had no idea it existed. You'll feel that same 'oh' moment today. If you've ever UNION ALL'd three copies of the same query to build a report with subtotals, you're going to love this.

Tools to learn together

Table scan per query

32%

SQL rounds use GROUP BY

632

Senior rounds analyzed

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

Prepare for the interview

01 / Open invite

02min.

Know GROUPING SETS the way the interviewer who asks it knows it.

a GROUPING SETS query, the same shape a screen would give you.

The diff against expected. Where ties broke. What you missed.

sandbox

1SELECT user_id,

2 COUNT(*) AS sessions

3FROM events

4WHERE ts >= NOW() - INTERVAL '7 day'

Execute your solution0.4s avg.

MetaInterview question

Solve a GROUPING SETS problem

GROUPING SETS: Pick Your Groupings

GROUPING SETS lets you specify exactly which column combinations to aggregate. Each grouping set is a parenthesized list of columns. The empty set () produces the grand total. The engine computes all groupings in a single scan of the data.

-- Profit by region, by status, and grand total
SELECT
  region,
  status,
  SUM(profit) AS total_profit
FROM orders
GROUP BY GROUPING SETS (
  (region),           -- subtotal per region
  (status),          -- subtotal per status
  ()                  -- grand total
);

Interview note: If an interviewer asks for subtotals at multiple levels, start by writing GROUPING SETS. It shows you know the right tool. If they have not seen it before, explain that it is standard SQL (SQL:1999) and works on PostgreSQL, SQL Server, Oracle, Snowflake, BigQuery, Redshift, and Databricks.

Equivalent UNION ALL

To understand what GROUPING SETS produces, here is the long-form version. Notice how much more code it requires and how it scans the table three times instead of once.

-- The UNION ALL equivalent (3 scans, more code)
SELECT region, NULL AS status, SUM(profit)
FROM orders GROUP BY region

UNION ALL

SELECT NULL, status, SUM(profit)
FROM orders GROUP BY status

UNION ALL

SELECT NULL, NULL, SUM(profit)
FROM orders;

ROLLUP: Hierarchical Subtotals

ROLLUP is a shorthand that generates a hierarchical set of groupings. It removes columns from right to left. ROLLUP(a, b, c) produces: (a, b, c), (a, b), (a), and (). This matches hierarchical dimensions perfectly. Year, quarter, month. Country, state, city. Department, team, employee.

-- Profit with hierarchical subtotals
SELECT
  year,
  quarter,
  month,
  SUM(profit) AS total_profit
FROM orders
GROUP BY ROLLUP (year, quarter, month);

-- This generates these groupings:
-- (year, quarter, month)  -> monthly detail
-- (year, quarter)         -> quarterly subtotal
-- (year)                  -> yearly subtotal
-- ()                      -> grand total

ROLLUP generates N+1 groupings for N columns. Three columns gives you four levels: detail, two levels of subtotals, and the grand total. The column order matters. ROLLUP(year, quarter) gives you year subtotals. ROLLUP(quarter, year) gives you quarter subtotals. Put the broadest dimension first.

Cache Efficiency

> The infrastructure team wants each edge location compared to the global baseline. For each edge, show its request count, its local cache-hit percentage, and the overall cache-hit percentage across all edges repeated on every row for easy comparison.

Partial ROLLUP

You can combine regular GROUP BY columns with ROLLUP to get subtotals for some dimensions but not others.

-- Subtotals per region, but always grouped by region
SELECT
  region,
  product_category,
  status,
  SUM(profit) AS total_profit
FROM orders
GROUP BY region, ROLLUP (product_category, status);
-- Produces per region:
--   (region, product_category, status) -> detail
--   (region, product_category)          -> status subtotal
--   (region)                            -> region subtotal

CUBE: Every Possible Combination

CUBE generates all possible grouping combinations. For N columns, it produces 2^N groupings. CUBE(a, b) gives: (a, b), (a), (b), and (). CUBE(a, b, c) gives 8 groupings. This is useful for cross-tab reports where you need aggregates along every dimension. Be careful with CUBE on many columns. Four columns produce 16 groupings. Six columns produce 64. Use CUBE for reporting summaries with 2 to 4 dimensions. For more dimensions, use GROUPING SETS to specify only the combinations you actually need.

-- Full cross-tab: every combination of region and status
SELECT
  region,
  status,
  SUM(profit) AS total_profit,
  COUNT(*) AS num_transactions
FROM orders
GROUP BY CUBE (region, status);

-- This generates these groupings:
-- (region, status)  -> detail per region + status
-- (region)           -> subtotal per region
-- (status)          -> subtotal per status
-- ()                 -> grand total

ROLLUP vs CUBE Comparison

Feature	ROLLUP(a, b)	CUBE(a, b)
Groupings	(a,b), (a), ()	(a,b), (a), (b), ()
Count	N+1	2^N
Best for	Hierarchical dimensions (time, geography)	Cross-tab, pivot-style analysis
Missing grouping	(b) alone is not generated	Every combination appears

The GROUPING() Function

When a column appears as NULL in GROUPING SETS output, you need to know: is it NULL because the data is NULL, or because this row is a subtotal where that column was rolled up? The GROUPING() function answers this. GROUPING(column_name) returns 1 if the column is aggregated (subtotal row) and 0 if it is part of the current grouping (detail row).

SELECT
  CASE
    WHEN GROUPING(region) = 1 THEN 'All Regions'
    ELSE region
  END AS region_label,
  CASE
    WHEN GROUPING(status) = 1 THEN 'All Products'
    ELSE status
  END AS product_label,
  SUM(profit) AS total_profit
FROM orders
GROUP BY CUBE (region, status)
ORDER BY
  GROUPING(region),
  GROUPING(status),
  region,
  status;

The CASE expressions replace the ambiguous NULLs with readable labels. The ORDER BY uses GROUPING() values to sort subtotals after their detail rows, which is the standard pattern for formatted reports. Interview note: Mentioning GROUPING() shows you have built actual reports with these features. Most candidates who know ROLLUP syntax do not know about GROUPING(). Bring it up to differentiate yourself.

GROUPING_ID() for Bitmask Identification

Some engines support GROUPING_ID() (SQL Server) or GROUPING(col1, col2, ...) (PostgreSQL) which returns a bitmask identifying the current grouping level. Each bit corresponds to one column. This is useful for filtering to specific grouping levels in downstream processing.

-- PostgreSQL: multi-column GROUPING returns a bitmask
SELECT
  region,
  status,
  SUM(profit) AS total_profit,
  GROUPING(region, status) AS grp_level
FROM orders
GROUP BY CUBE (region, status);
-- grp_level 0 = detail, 1 = status rolled up,
-- 2 = region rolled up, 3 = grand total

Practical Reporting Examples

Sales Dashboard Summary Table

This pattern materializes a summary table that a BI tool can query at any aggregation level without recomputing.

SELECT
  COALESCE(region, 'ALL') AS region,
  COALESCE(status, 'ALL') AS status,
  SUM(profit) AS total_profit,
  COUNT(*) AS num_orders,
  AVG(profit) AS avg_profit,
  GROUPING(region, status) AS grp_level
FROM orders
GROUP BY
  GROUPING SETS (
    (region, status),
    (region),
    (status),
    ()
  );

Monthly Retention with Subtotals

ROLLUP on cohort month and activity month gives you per-cohort retention detail plus cohort-level subtotals and a grand total.

SELECT
  substr(transaction_date, 1, 7) AS txn_month,
  product_id,
  COUNT(DISTINCT user_id) AS active_users
FROM transactions
GROUP BY ROLLUP (substr(transaction_date, 1, 7), product_id)
ORDER BY txn_month, product_id;

3 GROUPING SETS Interview Questions

Q1: Write a query that shows profit by region, profit by status, and total profit, all in one result set.

What they test: Whether you know GROUPING SETS as an alternative to UNION ALL. Most candidates write three queries. The clean answer uses GROUPING SETS. Approach: GROUP BY GROUPING SETS ((region), (status), ()). Use GROUPING() in CASE expressions to label subtotal rows. Mention it scans the table once.

Q2: Explain the difference between ROLLUP and CUBE. Give an example of when you would use each.

What they test: Conceptual clarity. They want to hear 'ROLLUP is hierarchical, CUBE is all combinations' and a concrete use case for each. Approach: ROLLUP(year, quarter, month) for time-based subtotals. CUBE(region, status) for cross-dimensional analysis. State the grouping counts: N+1 vs 2^N. Warn about CUBE on many columns.

Q3: In a ROLLUP query, how do you distinguish between a NULL that means 'subtotal' and a NULL in the actual data?

What they test: Whether you know the GROUPING() function. This separates candidates who have used ROLLUP in production from those who have only read about it. Approach: GROUPING(column) returns 1 for subtotal NULLs and 0 for real grouping values. Use it in CASE expressions or in WHERE to filter to specific aggregation levels. Mention GROUPING_ID() for multi-column bitmask identification.

Engine Support

MySQL is the notable gap: it supports ROLLUP with the WITH ROLLUP syntax but does not support GROUPING SETS or CUBE. If you need full GROUPING SETS on MySQL, use UNION ALL as a fallback.

Engine	GROUPING SETS	ROLLUP	CUBE
PostgreSQL 9.5+	Yes	Yes	Yes
SQL Server 2008+	Yes	Yes	Yes
MySQL 8.0+	No	Yes (WITH ROLLUP)	No
BigQuery	Yes	Yes	Yes
Snowflake	Yes	Yes	Yes
Oracle	Yes	Yes	Yes

Frequently asked questions

What are GROUPING SETS in SQL?+

GROUPING SETS is a GROUP BY extension that lets you compute multiple groupings in a single query. Instead of running three separate GROUP BY queries and combining them with UNION ALL, you list the groupings you want inside GROUPING SETS ((...), (...), ()). Each entry is a set of columns to group by. The empty set () produces the grand total. The engine calculates all specified groupings in one pass over the data, which is faster and easier to maintain than the UNION ALL equivalent.

What is the difference between ROLLUP and CUBE?+

ROLLUP produces a hierarchical set of groupings from left to right. ROLLUP(a, b, c) generates groupings (a, b, c), (a, b), (a), and (). It is designed for hierarchical dimensions like year > quarter > month. CUBE produces every possible combination of the listed columns. CUBE(a, b, c) generates 8 groupings: all combinations of including or excluding each column. CUBE is useful when you want aggregates across every dimension combination, like a full cross-tab report.

What does the GROUPING() function do?+

GROUPING() takes a column name and returns 1 if that column is aggregated (rolled up) in the current row, or 0 if it is part of the current grouping. This lets you distinguish between a NULL that means 'this column was rolled up into a subtotal' and a NULL that means 'the actual data value is NULL'. GROUPING() is essential for labeling subtotal and grand total rows in reports. You can also use GROUPING_ID() to get a bitmask of all GROUPING() values for multiple columns at once.

02 / Why practice

You're closer to fluent than you think

01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Start Practicing