SQL GROUPING SETS, ROLLUP, and CUBE

Here's a secret that took me years to appreciate. The first time a staff engineer showed me GROUPING SETS, I'd already been writing SQL for five years and I had no idea it existed. You'll feel that same 'oh' moment today. If you've ever UNION ALL'd three copies of the same query to build a report with subtotals, you're going to love this.

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
3
Tools to learn together
1
Table scan per query
32%
SQL rounds use GROUP BY
632
Senior rounds analyzed

GROUPING SETS: Pick Your Groupings

GROUPING SETS lets you specify exactly which column combinations to aggregate. Each grouping set is a parenthesized list of columns. The empty set () produces the grand total. The engine computes all groupings in a single scan of the data.
Prepare for the interview
01 / Open invite
02min.

Know GROUPING SETS the way the interviewer who asks it knows it.

a GROUPING SETS query, the same shape a screen would give you.
The diff against expected. Where ties broke. What you missed.
sandbox
1SELECT user_id,
2 COUNT(*) AS sessions
3FROM events
4WHERE ts >= NOW() - INTERVAL '7 day'
5
Execute your solution0.4s avg.
MetaInterview question
Solve a GROUPING SETS problem
Interview note: If an interviewer asks for subtotals at multiple levels, start by writing GROUPING SETS. It shows you know the right tool. If they have not seen it before, explain that it is standard SQL (SQL:1999) and works on PostgreSQL, SQL Server, Oracle, Snowflake, BigQuery, Redshift, and Databricks.
-- The UNION ALL equivalent (3 scans, more code)
SELECT region, NULL AS product, SUM(revenue)
FROM sales GROUP BY region

UNION ALL

SELECT NULL, product, SUM(revenue)
FROM sales GROUP BY product

UNION ALL

SELECT NULL, NULL, SUM(revenue)
FROM sales;

Equivalent UNION ALL

To understand what GROUPING SETS produces, here is the long-form version. Notice how much more code it requires and how it scans the table three times instead of once.
-- Revenue with hierarchical subtotals
SELECT
  year,
  quarter,
  month,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY ROLLUP (year, quarter, month);

-- This generates these groupings:
-- (year, quarter, month)  -> monthly detail
-- (year, quarter)         -> quarterly subtotal
-- (year)                  -> yearly subtotal
-- ()                      -> grand total

ROLLUP: Hierarchical Subtotals

ROLLUP is a shorthand that generates a hierarchical set of groupings. It removes columns from right to left. ROLLUP(a, b, c) produces: (a, b, c), (a, b), (a), and (). This matches hierarchical dimensions perfectly. Year, quarter, month. Country, state, city. Department, team, employee.
-- Subtotals per region, but always grouped by region
SELECT
  region,
  product_category,
  product,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY region, ROLLUP (product_category, product);
-- Produces per region:
--   (region, product_category, product) -> detail
--   (region, product_category)          -> category subtotal
--   (region)                            -> region subtotal
ROLLUP generates N+1 groupings for N columns. Three columns gives you four levels: detail, two levels of subtotals, and the grand total. The column order matters. ROLLUP(year, quarter) gives you year subtotals. ROLLUP(quarter, year) gives you quarter subtotals. Put the broadest dimension first.
-- Full cross-tab: every combination of region and product
SELECT
  region,
  product,
  SUM(revenue) AS total_revenue,
  COUNT(*) AS num_transactions
FROM sales
GROUP BY CUBE (region, product);

-- This generates these groupings:
-- (region, product)  -> detail per region + product
-- (region)           -> subtotal per region
-- (product)          -> subtotal per product
-- ()                 -> grand total

Partial ROLLUP

You can combine regular GROUP BY columns with ROLLUP to get subtotals for some dimensions but not others.
Prepare for the interview
03 / From the bank03 of many
03hand-picked.

Campaign Revenue by Click Channel

Medium20 min

Which ad format drives the most revenue?

CUBE: Every Possible Combination

CUBE generates all possible grouping combinations. For N columns, it produces 2^N groupings. CUBE(a, b) gives: (a, b), (a), (b), and (). CUBE(a, b, c) gives 8 groupings. This is useful for cross-tab reports where you need aggregates along every dimension. Be careful with CUBE on many columns. Four columns produce 16 groupings. Six columns produce 64. Use CUBE for reporting summaries with 2 to 4 dimensions. For more dimensions, use GROUPING SETS to specify only the combinations you actually need.
SELECT
  CASE
    WHEN GROUPING(region) = 1 THEN 'All Regions'
    ELSE region
  END AS region_label,
  CASE
    WHEN GROUPING(product) = 1 THEN 'All Products'
    ELSE product
  END AS product_label,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY CUBE (region, product)
ORDER BY
  GROUPING(region),
  GROUPING(product),
  region,
  product;

ROLLUP vs CUBE Comparison

-- Revenue by region, by product, and grand total
SELECT
  region,
  product,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY GROUPING SETS (
  (region),           -- subtotal per region
  (product),          -- subtotal per product
  ()                  -- grand total
);

The GROUPING() Function

When a column appears as NULL in GROUPING SETS output, you need to know: is it NULL because the data is NULL, or because this row is a subtotal where that column was rolled up? The GROUPING() function answers this. GROUPING(column_name) returns 1 if the column is aggregated (subtotal row) and 0 if it is part of the current grouping (detail row).
-- PostgreSQL: multi-column GROUPING returns a bitmask
SELECT
  region,
  product,
  SUM(revenue) AS total_revenue,
  GROUPING(region, product) AS grp_level
FROM sales
GROUP BY CUBE (region, product);
-- grp_level 0 = detail, 1 = product rolled up,
-- 2 = region rolled up, 3 = grand total
The CASE expressions replace the ambiguous NULLs with readable labels. The ORDER BY uses GROUPING() values to sort subtotals after their detail rows, which is the standard pattern for formatted reports. Interview note: Mentioning GROUPING() shows you have built actual reports with these features. Most candidates who know ROLLUP syntax do not know about GROUPING(). Bring it up to differentiate yourself.

Sales Dashboard Summary Table

This pattern materializes a summary table that a BI tool can query at any aggregation level without recomputing.

CREATE TABLE sales_summary AS
SELECT
  COALESCE(region, 'ALL') AS region,
  COALESCE(category, 'ALL') AS category,
  EXTRACT(YEAR FROM sale_date) AS year,
  SUM(amount) AS total_amount,
  COUNT(*) AS num_sales,
  AVG(amount) AS avg_amount,
  GROUPING(region, category) AS grp_level
FROM sales
GROUP BY
  GROUPING SETS (
    (region, category, EXTRACT(YEAR FROM sale_date)),
    (region, EXTRACT(YEAR FROM sale_date)),
    (category, EXTRACT(YEAR FROM sale_date)),
    (EXTRACT(YEAR FROM sale_date))
  );

Monthly Retention with Subtotals

ROLLUP on cohort month and activity month gives you per-cohort retention detail plus cohort-level subtotals and a grand total.

SELECT
  cohort_month,
  activity_month,
  COUNT(DISTINCT user_id) AS active_users
FROM user_activity
GROUP BY ROLLUP (cohort_month, activity_month)
ORDER BY cohort_month NULLS LAST, activity_month NULLS LAST;

GROUPING_ID() for Bitmask Identification

Some engines support GROUPING_ID() (SQL Server) or GROUPING(col1, col2, ...) (PostgreSQL) which returns a bitmask identifying the current grouping level. Each bit corresponds to one column. This is useful for filtering to specific grouping levels in downstream processing.

3 GROUPING SETS Interview Questions

Q1: Write a query that shows revenue by region, revenue by product, and total revenue, all in one result set.

What they test: Whether you know GROUPING SETS as an alternative to UNION ALL. Most candidates write three queries. The clean answer uses GROUPING SETS. Approach: GROUP BY GROUPING SETS ((region), (product), ()). Use GROUPING() in CASE expressions to label subtotal rows. Mention it scans the table once.

Q2: Explain the difference between ROLLUP and CUBE. Give an example of when you would use each.

What they test: Conceptual clarity. They want to hear 'ROLLUP is hierarchical, CUBE is all combinations' and a concrete use case for each. Approach: ROLLUP(year, quarter, month) for time-based subtotals. CUBE(region, product) for cross-dimensional analysis. State the grouping counts: N+1 vs 2^N. Warn about CUBE on many columns.

Q3: In a ROLLUP query, how do you distinguish between a NULL that means 'subtotal' and a NULL in the actual data?

What they test: Whether you know the GROUPING() function. This separates candidates who have used ROLLUP in production from those who have only read about it. Approach: GROUPING(column) returns 1 for subtotal NULLs and 0 for real grouping values. Use it in CASE expressions or in WHERE to filter to specific aggregation levels. Mention GROUPING_ID() for multi-column bitmask identification.

Practical Reporting Examples

FeatureROLLUP(a, b)CUBE(a, b)
Groupings(a,b), (a), ()(a,b), (a), (b), ()
CountN+12^N
Best forHierarchical dimensions (time, geography)Cross-tab, pivot-style analysis
Missing grouping(b) alone is not generatedEvery combination appears

Engine Support

MySQL is the notable gap: it supports ROLLUP with the WITH ROLLUP syntax but does not support GROUPING SETS or CUBE. If you need full GROUPING SETS on MySQL, use UNION ALL as a fallback.
EngineGROUPING SETSROLLUPCUBE
PostgreSQL 9.5+YesYesYes
SQL Server 2008+YesYesYes
MySQL 8.0+NoYes (WITH ROLLUP)No
BigQueryYesYesYes
SnowflakeYesYesYes
OracleYesYesYes

Frequently asked questions

What are GROUPING SETS in SQL?+
GROUPING SETS is a GROUP BY extension that lets you compute multiple groupings in a single query. Instead of running three separate GROUP BY queries and combining them with UNION ALL, you list the groupings you want inside GROUPING SETS ((...), (...), ()). Each entry is a set of columns to group by. The empty set () produces the grand total. The engine calculates all specified groupings in one pass over the data, which is faster and easier to maintain than the UNION ALL equivalent.
What is the difference between ROLLUP and CUBE?+
ROLLUP produces a hierarchical set of groupings from left to right. ROLLUP(a, b, c) generates groupings (a, b, c), (a, b), (a), and (). It is designed for hierarchical dimensions like year > quarter > month. CUBE produces every possible combination of the listed columns. CUBE(a, b, c) generates 8 groupings: all combinations of including or excluding each column. CUBE is useful when you want aggregates across every dimension combination, like a full cross-tab report.
What does the GROUPING() function do?+
GROUPING() takes a column name and returns 1 if that column is aggregated (rolled up) in the current row, or 0 if it is part of the current grouping. This lets you distinguish between a NULL that means 'this column was rolled up into a subtotal' and a NULL that means 'the actual data value is NULL'. GROUPING() is essential for labeling subtotal and grand total rows in reports. You can also use GROUPING_ID() to get a bitmask of all GROUPING() values for multiple columns at once.
02 / Why practice

You're closer to fluent than you think

  1. 01

    Active recall beats re-reading by 50%

    Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom

  2. 02

    76% of hiring managers reject on the coding task, not the resume

    From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice

  3. 03

    Five problem shapes cover 80% of data engineer loops

    Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

More reading