# Top Product Categories by Sales

> The highest-grossing categories.

Canonical URL: <https://datadriven.io/problems/top_product_categories_by_sales>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

For each product category, total up the transaction amounts and rank sorted from highest revenue to lowest.

## Worked solution and explanation

### What this is really asking

`category` lives on `products` (20k rows), revenue lives on `transactions` (80M rows). Join on `product_id`, SUM `total_amount` per category, ORDER BY revenue DESC. The join direction matters at this row count.

---

### Break down the requirements

#### Step 1: Get category onto each transaction

INNER JOIN `transactions` to `products` on `product_id`. A transaction with no matching product is corruption, not a category.

#### Step 2: Aggregate by category

`SUM(t.total_amount)` grouped by `p.category`. Use `total_amount` directly; promotions and refunds already settled into it.

#### Step 3: Sort descending

`ORDER BY total_revenue DESC` ranks all categories. No LIMIT.

---

**REVENUE BY CATEGORY**

```sql
SELECT
  p.category,
  SUM(t.total_amount) AS total_revenue
FROM products p
INNER JOIN transactions t
  ON p.product_id = t.product_id
GROUP BY p.category
ORDER BY total_revenue DESC
```

> **Cost Analysis**
>
> 80M-row fact against a 20k-row dim is a textbook hash-build on `products`. Aggregation collapses to one row per category before sort, so ORDER BY is free.

> **Interviewers Watch For**
>
> INNER over LEFT (categories with zero sales should not appear), summing `total_amount` not `quantity * price`, grouping by the dim attribute not the dim key.

> **Common Pitfall**
>
> `SUM(t.quantity * p.price)` ignores discounts and refunds baked into the fact. The two numbers diverge on any catalog with promotions.

> **The False Start**
>
> First instinct is `GROUP BY p.product_id, p.category` so the key tags along. That returns one row per product, and ORDER BY ranks products. Pivot to `GROUP BY p.category` only.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you include categories with zero sales? _(LEFT JOIN with `products` on the left, wrap SUM in `COALESCE(..., 0)`.)_
- What if a product changes category over time? _(Add SCD2 on `products` and join on a time-valid surrogate; current schema only carries latest.)_
- How would you add a percent-of-total revenue column? _(Wrap in a CTE, divide each row by `SUM(total_revenue) OVER ()`.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_product_categories_by_sales)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.