# The Podium Finish

> Top two products per category.

Canonical URL: <https://datadriven.io/problems/the_podium_finish>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

For each category, return the top 2 products by total quantity sold. If products tie in quantity within a category, break the tie alphabetically by product name and assign consecutive ranks. Only include rows ranked 1 or 2.

## Worked solution and explanation

### Why this problem exists in real interviews

Top-N per group is one of the most common SQL patterns in product analytics: top customers per cohort, top SKUs per category, top contributors per repo. Interviewers reach for it because it forces you to combine a join, a GROUP BY aggregate, a windowed ranking with PARTITION BY, and a final filter, all in a single coherent query. Doing it cleanly tells them you understand which step belongs in which clause.

---

### Break down the requirements

#### Step 1: Sum quantity per (category, product_name)

Inner-join `transactions t` to `products p` on `product_id`, then `GROUP BY p.category, p.product_name` and compute `SUM(t.quantity) AS total_quantity`. This collapses 100M transactions down to one row per product within its category.

#### Step 2: Rank products inside each category

Wrap the aggregate in a CTE and apply `ROW_NUMBER() OVER (PARTITION BY category ORDER BY total_quantity DESC, product_name ASC)`. `PARTITION BY category` resets the rank for every category, the DESC clause puts the heaviest sellers first, and the secondary `product_name ASC` is the prompt's deterministic tiebreak so two products with identical quantities get rank 1 and 2 in alphabetical order.

#### Step 3: Keep ranks 1 and 2, then sort the output

Filter `WHERE rank <= 2` to take the top two per category, then `ORDER BY category, rank` to produce the four required output columns: `category`, `product_name`, `total_quantity`, `rank`.

---

### The solution

**Aggregate, rank within category, take top two**

```sql
WITH product_qty AS (
  SELECT p.category, p.product_name, SUM(t.quantity) AS total_quantity
  FROM transactions t INNER JOIN products p ON t.product_id = p.product_id
  GROUP BY p.category, p.product_name
),
ranked AS (
  SELECT category, product_name, total_quantity,
         ROW_NUMBER() OVER (PARTITION BY category ORDER BY total_quantity DESC, product_name ASC) AS rank
  FROM product_qty
)
SELECT category, product_name, total_quantity, rank FROM ranked WHERE rank <= 2
ORDER BY category, rank
```

> **Cost Analysis**
>
> `transactions` is 100M rows partitioned daily across 365 days; `products` is only 30K rows and lives entirely in memory. The hash join probes the small `products` build side and emits one row per transaction. The aggregate then collapses to at most 30K rows (one per product). The window function and final filter operate on that 30K-row set, with at most 2 rows per of the 22 categories surviving, so the final result is bounded by 44 rows.

> **Interviewers Watch For**
>
> Interviewers note whether you choose `ROW_NUMBER` (the prompt asks for exactly the top two with a deterministic tiebreaker) versus `RANK` or `DENSE_RANK` which would let ties balloon past two rows per category. They also watch the placement of the secondary `product_name ASC` inside the window's `ORDER BY` rather than as an outer sort, because that is what makes the rank assignment itself deterministic.

> **Common Pitfall**
>
> Using `RANK()` instead of `ROW_NUMBER()` and filtering `rank <= 2` returns three or more rows in any category where two products share the same quantity, breaking the 'top two' contract. Likewise, omitting `product_name ASC` from the window's `ORDER BY` leaves the rank assignment non-deterministic across runs even though the outer `ORDER BY category, rank` looks orderly.

---

## Common follow-up questions

- How would you change the query to keep all products tied at rank 2 instead of cutting at exactly two rows per category? _(Tests understanding of the ROW_NUMBER vs RANK vs DENSE_RANK distinction. The candidate should swap to `RANK()` (or `DENSE_RANK()`) and filter `rank <= 2`, and explain how that changes the row count.)_
- How would you compute the top two products per `(category, transaction_date)` pair to track daily winners? _(Tests whether the candidate can extend the PARTITION BY clause and bring `t.transaction_date` into the GROUP BY without breaking the join. Expands the result space from 22 categories to 22 * 365 partitions.)_
- What changes if `products.in_stock = 0` should disqualify a product from the leaderboard? _(Tests whether the candidate filters before the aggregation (in the join's WHERE) so disqualified products never accumulate quantity, versus filtering after ranking, which can leave ranks with gaps.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_podium_finish)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.