# Top Products per Category

> Five winners per category.

Canonical URL: <https://datadriven.io/problems/top_products_per_category>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

For each product category, pull the top 5 products by total sales revenue. The data lives across the transactions and products tables. Return the category, product name, and total sales.

## Worked solution and explanation

### Why this problem exists in real interviews

Finding the top products per category requires per-group ranking: `ROW_NUMBER()` or `DENSE_RANK()` partitioned by category. Interviewers watch whether the candidate aggregates revenue per product first or tries to rank raw transaction rows, which is the most common mistake.

---

### Break down the requirements

#### Step 1: Join the tables

Join `transactions` to `products` on the shared product key (assumed here to be `product_id`) so that each sale carries its product name and category.
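
As a minimal sketch of the join, the query below runs against inline sample rows. The prompt does not give the schema, so the column names `product_id`, `product_name`, and `category` are assumptions, and the sample values are invented for illustration.

```sql
-- Hypothetical sample data; column names are assumptions, not the real schema.
WITH products (product_id, product_name, category) AS (
    VALUES
        (1, 'Laptop',  'Electronics'),
        (2, 'Phone',   'Electronics'),
        (3, 'Blender', 'Kitchen')
),
transactions (transaction_id, product_id, total_amount) AS (
    VALUES
        (101, 1, 900.00),
        (102, 1, 850.00),
        (103, 2, 600.00),
        (104, 3,  75.00)
)
-- Attach each sale to its product name and category.
SELECT
    t.transaction_id,
    p.category,
    p.product_name,
    t.total_amount
FROM transactions AS t
JOIN products AS p
    ON p.product_id = t.product_id
ORDER BY t.transaction_id;
```

An inner join drops products with no sales, which is usually what a "top products by revenue" question wants.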

#### Step 2: Aggregate per category and product

Grouping by category and product with `SUM(total_amount)` produces one row per product, holding that product's total sales revenue.
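
Applied to this problem, the aggregation might look like the sketch below. It reuses hypothetical inline rows, and the columns `product_id`, `product_name`, and `category` are assumed rather than taken from the real schema.

```sql
-- Hypothetical sample data; column names are assumptions, not the real schema.
WITH products (product_id, product_name, category) AS (
    VALUES
        (1, 'Laptop',  'Electronics'),
        (2, 'Phone',   'Electronics'),
        (3, 'Blender', 'Kitchen')
),
transactions (transaction_id, product_id, total_amount) AS (
    VALUES
        (101, 1, 900.00),
        (102, 1, 850.00),
        (103, 2, 600.00),
        (104, 3,  75.00)
)
-- Collapse individual sales into one revenue total per product.
SELECT
    p.category,
    p.product_name,
    SUM(t.total_amount) AS total_sales
FROM transactions AS t
JOIN products AS p
    ON p.product_id = t.product_id
GROUP BY p.category, p.product_id, p.product_name
ORDER BY p.category, total_sales DESC;
```

Including `product_id` in the `GROUP BY` keeps two products that happen to share a name from being merged into one total.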

#### Step 3: Rank within each category

Use `ROW_NUMBER() OVER (PARTITION BY category ORDER BY SUM(total_amount) DESC)` to rank products within each category by their aggregated revenue.
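
For this problem the partition key is the category and the ordering is the summed revenue. The sketch below ranks hypothetical rows that are already aggregated; the `product_sales` CTE and its values are invented for illustration.

```sql
-- Hypothetical pre-aggregated rows standing in for the output of the GROUP BY step.
WITH product_sales (category, product_name, total_sales) AS (
    VALUES
        ('Electronics', 'Laptop',  1750.00),
        ('Electronics', 'Phone',    600.00),
        ('Kitchen',     'Blender',   75.00)
)
SELECT
    category,
    product_name,
    total_sales,
    -- Numbering restarts at 1 for every category.
    ROW_NUMBER() OVER (
        PARTITION BY category
        ORDER BY total_sales DESC
    ) AS rn
FROM product_sales
ORDER BY category, rn;
```

Here Laptop and Blender both receive `rn = 1` because each leads its own category, while Phone receives `rn = 2`.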

#### Step 4: Filter to top entries

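Wrap the ranked query in a subquery and filter `WHERE rn <= 5` to keep the top five products per category. The subquery is needed because a window function's result cannot be referenced in the `WHERE` clause of the same query level.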

---

### The solution

**Sum revenue per product-category pair then rank top 5 within each category**

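```sql
-- Assumed columns (not given in the prompt): transactions.product_id,
-- products.product_id, products.category, products.product_name.
SELECT category, product_name, total_sales
FROM (
    SELECT
        p.category,
        p.product_name,
        SUM(t.total_amount) AS total_sales,
        -- Rank products inside each category by summed revenue.
        ROW_NUMBER() OVER (
            PARTITION BY p.category
            ORDER BY SUM(t.total_amount) DESC
        ) AS rn
    FROM transactions AS t
    JOIN products AS p
        ON p.product_id = t.product_id
    GROUP BY p.category, p.product_id, p.product_name
) ranked
WHERE rn <= 5
ORDER BY category, total_sales DESC;
```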

> **Cost Analysis**
>
> The GROUP BY reduces the 120M-row `transactions` table to one row per (category, product) pair. The window function then sorts only those aggregated rows within each category partition. A covering index on `(product_id, total_amount)` can enable an index-only scan of `transactions` for the aggregate.

> **Interviewers Watch For**
>
> Interviewers specifically test whether you use `PARTITION BY` in the window function. Omitting it produces a single global ranking instead of one ranking per category, which answers a different question.

> **Common Pitfall**
>
> Using `ORDER BY ... LIMIT` instead of a window function for per-group ranking. LIMIT gives N rows globally, not per group. Per-group top-N needs a window function or an equivalent per-group construct such as a `LATERAL` join or correlated subquery.
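
To make the difference between a global and a per-group ranking concrete, the sketch below computes both on the same hypothetical pre-aggregated rows. The `product_sales` CTE and its values are invented for illustration.

```sql
-- Hypothetical pre-aggregated rows; values are invented for illustration.
WITH product_sales (category, product_name, total_sales) AS (
    VALUES
        ('Electronics', 'Laptop',  1750.00),
        ('Electronics', 'Phone',    600.00),
        ('Kitchen',     'Blender',   75.00)
)
SELECT
    category,
    product_name,
    total_sales,
    -- No PARTITION BY: one ranking across all categories,
    -- which is what ORDER BY ... LIMIT effectively uses.
    ROW_NUMBER() OVER (ORDER BY total_sales DESC) AS global_rn,
    -- PARTITION BY category: a separate ranking per category.
    ROW_NUMBER() OVER (
        PARTITION BY category
        ORDER BY total_sales DESC
    ) AS category_rn
FROM product_sales
ORDER BY global_rn;
```

Keeping `global_rn <= 2` returns Laptop and Phone, both from Electronics, and drops Kitchen entirely. Keeping `category_rn <= 1` returns Laptop and Blender, one leader per category.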

---

## Common follow-up questions

- If a product belongs to multiple categories in the products table, does it appear in multiple category rankings? _(Tests schema assumption; typically product_id maps to one category, but the candidate should verify.)_
- Should you use ROW_NUMBER or DENSE_RANK if the prompt does not mention tie handling? _(Tests default ranking choice; ROW_NUMBER gives exactly 5 per category, DENSE_RANK may give more.)_
- How does the query change if 'total sales revenue' means SUM(quantity * unit_price) instead of SUM(total_amount)? _(Tests metric flexibility; swapping the aggregate expression without changing the query structure.)_
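
The tie-handling question can be explored with a small sketch that places the three ranking functions side by side. The rows are hypothetical, and the `product_name` tiebreaker on `ROW_NUMBER()` is an added assumption to make its output deterministic.

```sql
-- Hypothetical pre-aggregated rows with a tie for first place.
WITH product_sales (category, product_name, total_sales) AS (
    VALUES
        ('Kitchen', 'Blender', 500.00),
        ('Kitchen', 'Kettle',  500.00),
        ('Kitchen', 'Toaster', 300.00)
)
SELECT
    product_name,
    total_sales,
    -- Always unique: ties are broken by the extra sort key.
    ROW_NUMBER() OVER (
        PARTITION BY category
        ORDER BY total_sales DESC, product_name
    ) AS row_num,
    -- Ties share a rank, and the next rank is skipped.
    RANK() OVER (
        PARTITION BY category
        ORDER BY total_sales DESC
    ) AS rnk,
    -- Ties share a rank, with no gap afterwards.
    DENSE_RANK() OVER (
        PARTITION BY category
        ORDER BY total_sales DESC
    ) AS dense_rnk
FROM product_sales
ORDER BY row_num;
```

With these rows `row_num` is 1, 2, 3, `rnk` is 1, 1, 3, and `dense_rnk` is 1, 1, 2. A filter of `<= 2` therefore keeps two rows under `ROW_NUMBER()` and `RANK()` but all three under `DENSE_RANK()`.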

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.