# The Ones That Move

> Every aisle has its champions. Surface the three that carry each one.

Canonical URL: <https://datadriven.io/problems/the-ones-that-move>

Domain: SQL · Difficulty: medium · Seniority: mid

## Problem

The merchandising team wants to know which products carry each category by sales, where a product's revenue is its quantity times unit price summed across every order line. For each category return its three strongest products, showing the category, the product name, that revenue, and the product's standing within the category, ordered by category, then standing, then product name.

## Worked solution and explanation

### Why this problem exists in real interviews

This is a top-N-per-group problem wearing a merchandising costume. The skill being probed: can you rank products inside each category and keep only the leaders, without flattening the categories together or trying to filter a ranking where it cannot be filtered? Anyone can sum revenue and sort it. What separates candidates is knowing that the per-category cutoff has to happen against a computed rank in an outer query, and choosing the ranking function whose tie behavior matches what 'top three' actually means.

---

### Break down the requirements

#### Step 1: Get revenue to the right grain

Revenue is not a stored column. It is SUM(quantity * unit_price) per product, computed by joining order_items to products and grouping by category and product_name. Multiply first, then sum: SUM(quantity) * SUM(unit_price) is a different, wrong number.

#### Step 2: Rank within each category, not globally

PARTITION BY category restarts the ranking inside every category, so each aisle gets its own 1, 2, 3. Drop the PARTITION BY and you get a single global leaderboard, which silently answers a different question.

#### Step 3: Filter the rank in an outer query

You cannot put revenue_rank <= 3 in the same SELECT that defines it; window functions are evaluated after WHERE. Wrap the ranking in a CTE, then filter revenue_rank in the outer query. Keep that rank column in the final SELECT so the output shows each product's standing, and order by category, then standing, then product name so ties resolve the same way every run.

---

### The solution

**Top three products per category**

```sql
WITH product_revenue AS (
  SELECT p.category,
         p.product_name,
         SUM(oi.quantity * oi.unit_price) AS revenue
  FROM order_items oi
  JOIN products p ON p.product_id = oi.product_id
  GROUP BY p.category, p.product_name
),
ranked AS (
  SELECT category,
         product_name,
         revenue,
         DENSE_RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS revenue_rank
  FROM product_revenue
)
SELECT category, product_name, revenue, revenue_rank
FROM ranked
WHERE revenue_rank <= 3
ORDER BY category, revenue_rank, product_name;
```

> **DENSE_RANK vs ROW_NUMBER is a real decision**
>
> ROW_NUMBER breaks ties arbitrarily, so two products tied for second silently drop one of them and you get exactly three rows that are not reproducible run to run. DENSE_RANK keeps both tied products at rank 2 and still admits a rank 3, which matches how a merchant reads 'the top three by revenue.' Say which one you picked and why; the interviewer is listening for that sentence.

> **Common Pitfall**
>
> Trying WHERE DENSE_RANK() OVER (...) <= 3 in a single query. It fails to parse because window functions run after WHERE. The fix is structural: compute the rank in one layer, filter it in the next. The second classic miss is SUM(quantity) * SUM(unit_price) instead of SUM(quantity * unit_price), which inflates revenue by mixing line items. A third quiet miss is dropping the standing column or sorting by raw revenue instead of by the rank: the output then no longer shows each product's position and ties land in a different order than expected.

> **Cost Analysis**
>
> On a 200M-row order_items table the join and GROUP BY collapse to one row per product (tens of thousands), so the window sort runs over a tiny intermediate. The expensive step is the hash aggregate over order_items; an index on product_id helps the join, but the scan is unavoidable since every line item contributes to a sum. Partition order_items by date and add a window filter if the question is ever scoped to a period.

> **Interviewers Watch For**
>
> Whether you partition the ranking by category before typing, whether you reach for a CTE rather than fighting the window-in-WHERE error, and whether you raise ties unprompted. A candidate who asks 'should tied products both count toward the three?' has already shown they understand the difference between the three ranking functions.

---

## Common follow-up questions

- What changes if the team wants exactly three rows per category even when there are ties? _(Pushes toward ROW_NUMBER plus a deterministic tiebreaker, and a discussion of why that answer is less defensible.)_
- How would you restrict this to revenue from the last 90 days? _(Tests adding a date predicate and where it lives; order_items has no date, so they must join through orders or a transaction date and reason about partition pruning.)_
- Now return each category's share of total revenue alongside the top three. _(Stretches into a second window (SUM OVER PARTITION BY category) or a separate aggregate joined back, and ratio computation.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the-ones-that-move)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.