# Top Products by Quantity Sold

> The bestsellers. By volume.

Canonical URL: <https://datadriven.io/problems/top_products_by_quantity_sold>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

For all transactions in 2026, show each product alongside its total quantity sold, from highest volume to lowest.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewer is testing whether you write a date filter that prunes partitions on an 80M row `transactions` table, and whether you aggregate `quantity` by `product_id` without losing rows to nulls or refunds. The ranking is trivial. The filter shape is the actual signal.

---

### Break down the requirements

#### Step 1: Pin the year window

Filter `transactions` to one calendar year of `transaction_date`. The macro `2026` fills in the prior year at runtime, so the predicate is one literal year, not a moving window.

#### Step 2: Aggregate per product

`GROUP BY product_id`, then `SUM(quantity)` as `total_quantity`. One row per product in the result, regardless of how many transactions hit that product.

#### Step 3: Sort by volume desc

`ORDER BY total_quantity DESC` puts highest-volume products on top. No `LIMIT` in the spec, so return the full ranked set.

---

### The solution

**TOP PRODUCTS BY QUANTITY**

```sql
SELECT
  product_id,
  SUM(quantity) AS total_quantity
FROM transactions
WHERE strftime('%Y', transaction_date) = '2026'
GROUP BY product_id
ORDER BY total_quantity DESC
```

> **Cost Analysis**
>
> `transactions` is 80M rows, partitioned by `transaction_date`. The `strftime` predicate wraps the partition key in a function call, so the planner cannot prune and you scan the full table. A half-open range (`transaction_date >= '2023-01-01' AND transaction_date < '2024-01-01'`) prunes to one year of partitions. Mention that tradeoff out loud.

> **Interviewers Watch For**
>
> Before writing, ask: can `quantity` go negative for refunds, and if so should they net out against sales? `SUM(quantity)` quietly nets refunds. If the interviewer wants gross volume, switch to `SUM(CASE WHEN quantity > 0 THEN quantity ELSE 0 END)`.

> **Common Pitfall**
>
> Writing `WHERE YEAR(transaction_date) = 2023` or the `strftime` form and shipping it. On a partitioned fact, that is a full scan. The bug is invisible on a 10k row dev sample and explodes at 80M. Always rewrite year filters as a half-open range on the partition column.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- Rewrite the year filter so it prunes partitions on `transaction_date`. _(Probes whether you understand that wrapping a partition key in a function defeats pruning.)_
- Now return only the top 10 products and break ties by `product_id`. _(Pushes you to add `LIMIT 10` plus a deterministic secondary sort key.)_
- How would the query change if `quantity` could be negative for refunds? _(Tests whether you reach for `CASE WHEN` or a filtered `SUM` instead of trusting raw totals.)_
- What does the plan look like if `product_id` cardinality is in the millions? _(Forces a discussion of hash aggregation, memory budget, and spill to disk on the GROUP BY.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_products_by_quantity_sold)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.