# Extreme Category Totals

> The highest and the lowest. Both are interesting.

Canonical URL: <https://datadriven.io/problems/extreme_category_totals>

Domain: SQL · Difficulty: medium · Seniority: L5

## Problem

From this calendar year's transactions, find the product categories with the highest total sales and the lowest total sales. Combine transactions with products to get the category. Return both category names and their totals.

## Worked solution and explanation

### Why this problem exists in real interviews

Finance asks for top and bottom in the same query, and the interviewer is probing whether you build the per-category aggregate once in a CTE or naively run it twice. The other signal: do you flag that `INNER JOIN` hides categories with zero sales, which is exactly the population the MIN side might want to surface?

---

### Break down the requirements

#### Step 1: Join transactions to products

`transactions` has `product_id` but not `category`. Inner join on `product_id` to attach category. Products is 30K rows, the build side of the hash.

#### Step 2: Filter to this calendar year

Push the date predicate before the GROUP BY. `strftime('%Y', transaction_date) = strftime('%Y', 'now')` reads correctly but kills partition pruning. A half-open range on `transaction_date` is the production version.

#### Step 3: Aggregate once into a CTE

`WITH yearly AS (SELECT category, SUM(total_amount) AS total ... GROUP BY category)`. Materialize the per-category total so you only scan the 120M row fact once.

#### Step 4: Filter to the two extremes

Outer SELECT: `WHERE total = (SELECT MAX(total) FROM yearly) OR total = (SELECT MIN(total) FROM yearly)`. Two scalar subqueries on a tiny CTE, basically free.

---

### The solution

**TOP AND BOTTOM CATEGORY**

```sql
WITH yearly AS (
  SELECT p.category, SUM(t.total_amount) AS total
  FROM transactions t
  JOIN products p ON t.product_id = p.product_id
  WHERE strftime('%Y', t.transaction_date) = CAST(CAST(strftime('%Y', 'now') AS INTEGER) - 0 AS TEXT)
  GROUP BY p.category
)
SELECT category, total
FROM yearly
WHERE total = (SELECT MAX(total) FROM yearly)
   OR total = (SELECT MIN(total) FROM yearly)
```

> **Cost Analysis**
>
> Transactions is 120M rows partitioned by `transaction_date`. The `strftime` filter is not sargable and forces a full scan of every partition. Rewrite as `transaction_date >= DATE('now', 'start of year') AND transaction_date < DATE('now', 'start of year', '+1 year')` to prune to one year of partitions. Hash join on 30K products fits in memory. The two scalar subqueries on the CTE are negligible.

> **Interviewers Watch For**
>
> Ask aloud: how should ties be handled? The MAX/MIN filter returns every tied category, which is usually right. If they want exactly two rows, that is `ROW_NUMBER` over `ORDER BY total DESC` and `ASC`, picked arbitrarily on ties. Decide before writing.

> **Common Pitfall**
>
> Inner join hides categories with zero sales this year. If the interviewer wants those surfaced as the MIN at total=0, switch to `products LEFT JOIN transactions` with the date filter moved into the ON clause and wrap the sum in `COALESCE(SUM(total_amount), 0)`.

**ORDER BY + LIMIT (wrong)**

`(SELECT ... ORDER BY total DESC LIMIT 1) UNION ALL (SELECT ... ORDER BY total ASC LIMIT 1)`. Runs the aggregate twice and drops ties.

**CTE + scalar subqueries (right)**

Build `yearly` once. Two scalar probes against MAX and MIN keep all tied rows and read the fact table only once.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- Rewrite the year filter to be sargable on the partition key. _(Tests whether you know `strftime` blocks partition pruning and that a half-open range is the production fix.)_
- Return categories with zero sales this year alongside the existing MAX and MIN. _(Forces a LEFT JOIN from products to transactions with `COALESCE` around the SUM.)_
- How does the plan change if you replace the two scalar subqueries with a single `MIN/MAX` window? _(Probes window functions: `MIN(total) OVER ()` and `MAX(total) OVER ()` versus correlated scalar subqueries on a small CTE.)_
- What if the question asked for top three and bottom three? _(Pushes you toward `RANK` or `ROW_NUMBER` partitioned by a synthetic direction column, and a discussion of how to handle ties at the boundary.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/extreme_category_totals)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.