# Best-Selling Reps Each Month

> In every category, a few sellers rise to the top.

Canonical URL: <https://datadriven.io/problems/best_selling_reps_each_month>

Domain: SQL · Difficulty: easy · Seniority: L4

## Problem

The sales team is identifying top buyers within each product category for January. Within each category, find the 3 highest-spending users by total transaction amount. If users tie, they share the same position and all should be included. Show the user ID, total amount, category, and transaction date.

## Worked solution and explanation

### Why this problem exists in real interviews

Working with `transactions`, `products`, this problem isolates dense ranking to handle ties combined with nested subqueries. The interviewer expects candidates to articulate why `quantity`, `total_amount`, `transaction_date` matter for correctness before touching the keyboard.

> **Trick to Solving**
>
> The prompt mentions ties or "include all". This signals `DENSE_RANK()` instead of `ROW_NUMBER()` or `LIMIT`. `DENSE_RANK` assigns the same rank to tied values without gaps.
> 
> 1. Spot the tie-inclusion language
> 2. Use `DENSE_RANK()` over `ROW_NUMBER()`
> 3. Filter on rank in a wrapping CTE

---

### Break down the requirements

#### Step 1: Join tables

Join `transactions` with `products` on `product_id` to build the working set.

#### Step 2: Aggregate by b.product_name

`GROUP BY` with `SUM(total_amount)` computes the metric at the correct grain before ranking.

#### Step 3: Rank with DENSE_RANK

`DENSE_RANK() OVER (ORDER BY SUM(total_amount) DESC)` assigns ranks. DENSE_RANK preserves ties without gaps.

#### Step 4: Filter to the target rank

Wrap in a CTE and filter `WHERE rnk <= 10` to extract the desired position.

---

### The solution

**Dense-rank for best-selling reps month**

```sql
WITH ranked AS (
    SELECT
        b.product_name,
        SUM(total_amount) AS total_total_amount,
        DENSE_RANK() OVER (ORDER BY SUM(total_amount) DESC) AS rnk
    FROM transactions a
JOIN products b ON a.product_id = b.product_id
    GROUP BY b.product_name
)
SELECT *
FROM ranked
WHERE rnk <= 10
ORDER BY rnk
```

> **Cost Analysis**
>
> The main table has 150M rows (38 GB). Partitioned on `transaction_date`, so queries filtering on that column skip most partitions. The window function runs after grouping, so it operates on the reduced result set. The smaller dimension table keeps the join selective.

> **Interviewers Watch For**
>
> Interviewers check whether the candidate chooses the right window function variant and can articulate why.

> **Common Pitfall**
>
> Using `ROW_NUMBER()` or `LIMIT` instead of `DENSE_RANK()` silently drops tied rows, producing incorrect results when ties exist.

---

## Common follow-up questions

- The `rating` column in `products` has roughly 5% NULLs. How does your query handle those rows, and would the result change if NULLs were replaced with zeros? _(Tests whether the candidate understands how NULLs propagate through aggregation functions and whether their WHERE/JOIN conditions implicitly filter them out.)_
- You used DENSE_RANK here. What changes in the output if you switch to ROW_NUMBER, and which is correct for this problem? _(Tests understanding of tie-handling semantics: DENSE_RANK preserves ties while ROW_NUMBER breaks them arbitrarily.)_
- `transaction_id` in `transactions` has ~150M distinct values. What index strategy keeps your query from doing a full table scan? _(Tests whether the candidate can design indexes for high-cardinality columns and understands selectivity.)_
- Could you express this same logic as a single query without CTEs or subqueries? What readability trade-off does that introduce? _(Tests whether the candidate can flatten nested logic and understands when decomposition aids maintainability.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/best_selling_reps_each_month)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.