# Top Product Categories

> Top three categories by page views.

Canonical URL: <https://datadriven.io/problems/top_product_categories>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Find the top 3 product categories by total transactions. If categories are tied at a cutoff rank, include all of them. Category info comes from the products table. Return the category, transaction count, and rank.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes whether you pick the right ranking function for tie semantics. 'Include all tied categories at the cutoff' rules out `LIMIT` and `ROW_NUMBER`. The interviewer wants to see you say `DENSE_RANK` out loud and explain why before you write, not pick it by reflex and discover the tie behavior at query-time.

---

### Break down the requirements

#### Step 1: Rank tiers, not rows

The prompt says include ties at the cutoff. That rules out `LIMIT 3` and `ROW_NUMBER`. Say `DENSE_RANK` out loud and explain why before you type.

#### Step 2: Aggregate before you rank

Join `transactions t` to `products p` on `product_id`, then `GROUP BY p.category` with `COUNT(*)`. Window function runs over the grouped result, not the raw fact.

#### Step 3: Filter on the rank

You can't put `DENSE_RANK()` in a `WHERE`. Wrap the aggregation in a subquery or CTE and filter `WHERE rnk <= 3` in the outer.

#### Step 4: Order the output

Final `ORDER BY rnk` so tied categories sit next to each other. Don't sort by `view_count DESC` and call it done; ties can shuffle.

---

### The solution

**TOP CATEGORIES WITH TIES**

```sql
SELECT category, txn_count, rnk
FROM (
  SELECT
    p.category,
    COUNT(*) AS txn_count,
    DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
  FROM transactions t
  JOIN products p ON t.product_id = p.product_id
  GROUP BY p.category
) ranked
WHERE rnk <= 3
ORDER BY rnk, category;
```

> **Cost Analysis**
>
> `transactions` is 100M rows, `products` is 30k. Planner should hash-build on `products` and probe from the fact. If the question scopes by date, push a predicate on `transaction_date` to hit partition pruning before the join. `GROUP BY category` collapses to a handful of rows, so the window scan is free.

> **Interviewers Watch For**
>
> Ask: 'If two categories tie at rank 3, do you want both, or exactly three rows?' Their answer locks in `DENSE_RANK` vs `ROW_NUMBER`. If you don't ask, you've picked one by accident.

> **Common Pitfall**
>
> Writing `WHERE DENSE_RANK() OVER (...) <= 3` in the same select. Window functions run after `WHERE`, so the parser rejects it. Wrap it. Second trap: `RANK()` returns 1,1,3 and silently skips rank 2, which is not what 'top 3 tiers' means.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you rank by total revenue instead of transaction count? _(Swaps `COUNT(*)` for `SUM(total_amount)` and checks you understand the window's `ORDER BY` follows the aggregate.)_
- What changes if a product belongs to multiple categories? _(Probes join cardinality and whether you'd dedupe transactions per category.)_
- Top 3 categories per month, with ties? _(Forces `PARTITION BY DATE_TRUNC('month', transaction_date)` inside the `DENSE_RANK` window.)_
- Why not solve this with a correlated subquery counting categories with a higher count? _(Tests whether you can articulate the O(n^2) cost versus the single-pass window.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_product_categories)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.