SQL Reference

SQL GROUPING SETS, ROLLUP, and CUBE

Here's a secret that took me years to appreciate. The first time a staff engineer showed me GROUPING SETS, I'd already been writing SQL for five years and I had no idea it existed. You'll feel that same "oh" moment today. If you've ever UNION ALL'd three copies of the same query to build a report with subtotals, you're going to love this.

Don't worry if it feels abstract at first. Most people meet GROUPING SETS, ROLLUP, and CUBE out of order, understand none of them, and then one Tuesday afternoon it clicks. We'll walk through each one the way I wish someone had walked through them with me: the pattern, the mental shortcut, and the one interview question that turns the concept real.

3

Tools to learn together

1

Table scan per query

32%

SQL rounds use GROUP BY

632

Senior rounds analyzed

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

GROUPING SETS: Pick Your Groupings

GROUPING SETS lets you specify exactly which column combinations to aggregate. Each grouping set is a parenthesized list of columns. The empty set () produces the grand total. The engine computes all groupings in a single scan of the data.

-- Revenue by region, by product, and grand total
SELECT
  region,
  product,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY GROUPING SETS (
  (region),           -- subtotal per region
  (product),          -- subtotal per product
  ()                  -- grand total
);

Read that query slowly. You're asking the engine for three different groupings in one pass, and you get rows back for each one, each row labeled by NULL where a dimension was rolled up. The first time you see the output, you'll want to sort it so the NULL patterns make sense. That instinct is fine. Most teams add an ORDER BY with a GROUPING() helper once they're building real dashboards, which we'll cover in a minute.

Equivalent UNION ALL

To understand what GROUPING SETS produces, here is the long-form version. Notice how much more code it requires and how it scans the table three times instead of once.

-- The UNION ALL equivalent (3 scans, more code)
SELECT region, NULL AS product, SUM(revenue)
FROM sales GROUP BY region

UNION ALL

SELECT NULL, product, SUM(revenue)
FROM sales GROUP BY product

UNION ALL

SELECT NULL, NULL, SUM(revenue)
FROM sales;

Interview note: If an interviewer asks for subtotals at multiple levels, start by writing GROUPING SETS. It shows you know the right tool. If they have not seen it before, explain that it is standard SQL (SQL:1999) and works on PostgreSQL, SQL Server, Oracle, Snowflake, BigQuery, Redshift, and Databricks.

ROLLUP: Hierarchical Subtotals

ROLLUP is a shorthand that generates a hierarchical set of groupings. It removes columns from right to left. ROLLUP(a, b, c) produces: (a, b, c), (a, b), (a), and (). This matches hierarchical dimensions perfectly. Year, quarter, month. Country, state, city. Department, team, employee.

-- Revenue with hierarchical subtotals
SELECT
  year,
  quarter,
  month,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY ROLLUP (year, quarter, month);

-- This generates these groupings:
-- (year, quarter, month)  -> monthly detail
-- (year, quarter)         -> quarterly subtotal
-- (year)                  -> yearly subtotal
-- ()                      -> grand total

ROLLUP generates N+1 groupings for N columns. Three columns gives you four levels: detail, two levels of subtotals, and the grand total. The column order matters. ROLLUP(year, quarter) gives you year subtotals. ROLLUP(quarter, year) gives you quarter subtotals. Put the broadest dimension first.

Partial ROLLUP

You can combine regular GROUP BY columns with ROLLUP to get subtotals for some dimensions but not others.

-- Subtotals per region, but always grouped by region
SELECT
  region,
  product_category,
  product,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY region, ROLLUP (product_category, product);
-- Produces per region:
--   (region, product_category, product) -> detail
--   (region, product_category)          -> category subtotal
--   (region)                            -> region subtotal

CUBE: Every Possible Combination

CUBE generates all possible grouping combinations. For N columns, it produces 2^N groupings. CUBE(a, b) gives: (a, b), (a), (b), and (). CUBE(a, b, c) gives 8 groupings. This is useful for cross-tab reports where you need aggregates along every dimension.

-- Full cross-tab: every combination of region and product
SELECT
  region,
  product,
  SUM(revenue) AS total_revenue,
  COUNT(*) AS num_transactions
FROM sales
GROUP BY CUBE (region, product);

-- This generates these groupings:
-- (region, product)  -> detail per region + product
-- (region)           -> subtotal per region
-- (product)          -> subtotal per product
-- ()                 -> grand total

Be careful with CUBE on many columns. Four columns produce 16 groupings. Six columns produce 64. The result set grows fast, and so does the computation. Use CUBE for reporting summaries with 2 to 4 dimensions. For more dimensions, use GROUPING SETS to specify only the combinations you actually need.

ROLLUP vs. CUBE Comparison

FeatureROLLUP(a, b)CUBE(a, b)
Groupings(a,b), (a), ()(a,b), (a), (b), ()
CountN+12^N
Best forHierarchical dimensions (time, geography)Cross-tab, pivot-style analysis
Missing grouping(b) alone is not generatedEvery combination appears

The GROUPING() Function

When a column appears as NULL in GROUPING SETS output, you need to know: is it NULL because the data is NULL, or because this row is a subtotal where that column was rolled up? The GROUPING() function answers this. GROUPING(column_name) returns 1 if the column is aggregated (subtotal row) and 0 if it is part of the current grouping (detail row).

SELECT
  CASE
    WHEN GROUPING(region) = 1 THEN 'All Regions'
    ELSE region
  END AS region_label,
  CASE
    WHEN GROUPING(product) = 1 THEN 'All Products'
    ELSE product
  END AS product_label,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY CUBE (region, product)
ORDER BY
  GROUPING(region),
  GROUPING(product),
  region,
  product;

The CASE expressions replace the ambiguous NULLs with readable labels. The ORDER BY uses GROUPING() values to sort subtotals after their detail rows, which is the standard pattern for formatted reports.

GROUPING_ID() for Bitmask Identification

Some engines support GROUPING_ID() (SQL Server) or GROUPING(col1, col2, ...) (PostgreSQL) which returns a bitmask identifying the current grouping level. Each bit corresponds to one column. This is useful for filtering to specific grouping levels in downstream processing.

-- PostgreSQL: multi-column GROUPING returns a bitmask
SELECT
  region,
  product,
  SUM(revenue) AS total_revenue,
  GROUPING(region, product) AS grp_level
FROM sales
GROUP BY CUBE (region, product);
-- grp_level 0 = detail, 1 = product rolled up,
-- 2 = region rolled up, 3 = grand total

Interview note: Mentioning GROUPING() shows you have built actual reports with these features. Most candidates who know ROLLUP syntax do not know about GROUPING(). Bring it up to differentiate yourself.

Practical Reporting Examples

Sales Dashboard Summary Table

This pattern materializes a summary table that a BI tool can query at any aggregation level without recomputing.

CREATE TABLE sales_summary AS
SELECT
  COALESCE(region, 'ALL') AS region,
  COALESCE(category, 'ALL') AS category,
  EXTRACT(YEAR FROM sale_date) AS year,
  SUM(amount) AS total_amount,
  COUNT(*) AS num_sales,
  AVG(amount) AS avg_amount,
  GROUPING(region, category) AS grp_level
FROM sales
GROUP BY
  GROUPING SETS (
    (region, category, EXTRACT(YEAR FROM sale_date)),
    (region, EXTRACT(YEAR FROM sale_date)),
    (category, EXTRACT(YEAR FROM sale_date)),
    (EXTRACT(YEAR FROM sale_date))
  );

Monthly Retention with Subtotals

ROLLUP on cohort month and activity month gives you per-cohort retention detail plus cohort-level subtotals and a grand total.

SELECT
  cohort_month,
  activity_month,
  COUNT(DISTINCT user_id) AS active_users
FROM user_activity
GROUP BY ROLLUP (cohort_month, activity_month)
ORDER BY cohort_month NULLS LAST, activity_month NULLS LAST;

3 GROUPING SETS Interview Questions

These questions test whether you know multi-level aggregation beyond basic GROUP BY. They appear in interviews at companies that build reporting pipelines and summary tables.

Q1: Write a query that shows revenue by region, revenue by product, and total revenue, all in one result set.

What they test:

Whether you know GROUPING SETS as an alternative to UNION ALL. Most candidates write three queries. The clean answer uses GROUPING SETS.

Approach:

GROUP BY GROUPING SETS ((region), (product), ()). Use GROUPING() in CASE expressions to label subtotal rows. Mention it scans the table once.

Q2: Explain the difference between ROLLUP and CUBE. Give an example of when you would use each.

What they test:

Conceptual clarity. They want to hear “ROLLUP is hierarchical, CUBE is all combinations” and a concrete use case for each.

Approach:

ROLLUP(year, quarter, month) for time-based subtotals. CUBE(region, product) for cross-dimensional analysis. State the grouping counts: N+1 vs 2^N. Warn about CUBE on many columns.

Q3: In a ROLLUP query, how do you distinguish between a NULL that means “subtotal” and a NULL in the actual data?

What they test:

Whether you know the GROUPING() function. This separates candidates who have used ROLLUP in production from those who have only read about it.

Approach:

GROUPING(column) returns 1 for subtotal NULLs and 0 for real grouping values. Use it in CASE expressions or in WHERE to filter to specific aggregation levels. Mention GROUPING_ID() for multi-column bitmask identification.

Engine Support

EngineGROUPING SETSROLLUPCUBE
PostgreSQL 9.5+YesYesYes
SQL Server 2008+YesYesYes
MySQL 8.0+NoYes (WITH ROLLUP)No
BigQueryYesYesYes
SnowflakeYesYesYes
OracleYesYesYes

MySQL is the notable gap: it supports ROLLUP with the WITH ROLLUP syntax but does not support GROUPING SETS or CUBE. If you need full GROUPING SETS on MySQL, use UNION ALL as a fallback.

GROUPING SETS FAQ

What are GROUPING SETS in SQL?+
GROUPING SETS is a GROUP BY extension that lets you compute multiple groupings in a single query. Instead of running three separate GROUP BY queries and combining them with UNION ALL, you list the groupings you want inside GROUPING SETS ((...), (...), ()). Each entry is a set of columns to group by. The empty set () produces the grand total. The engine calculates all specified groupings in one pass over the data, which is faster and easier to maintain than the UNION ALL equivalent.
What is the difference between ROLLUP and CUBE?+
ROLLUP produces a hierarchical set of groupings from left to right. ROLLUP(a, b, c) generates groupings (a, b, c), (a, b), (a), and (). It is designed for hierarchical dimensions like year > quarter > month. CUBE produces every possible combination of the listed columns. CUBE(a, b, c) generates 8 groupings: all combinations of including or excluding each column. CUBE is useful when you want aggregates across every dimension combination, like a full cross-tab report.
What does the GROUPING() function do?+
GROUPING() takes a column name and returns 1 if that column is aggregated (rolled up) in the current row, or 0 if it is part of the current grouping. This lets you distinguish between a NULL that means 'this column was rolled up into a subtotal' and a NULL that means 'the actual data value is NULL'. GROUPING() is essential for labeling subtotal and grand total rows in reports. You can also use GROUPING_ID() to get a bitmask of all GROUPING() values for multiple columns at once.

You're closer to fluent than you think

Spend a weekend on GROUPING SETS, ROLLUP, and CUBE and you'll recognize them in the wild forever. A dozen problems is usually enough for the pattern to stick.