# Best Selling Product by Month

> Every month has a winner.

Canonical URL: <https://datadriven.io/problems/best_selling_product_by_month>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

The merchandising team needs a monthly leaderboard of top-grossing products. Revenue is calculated as quantity times the product's listed price. Exclude any transaction where total_amount is negative, as those represent refunds. If two products tie in a given month, include both. Return the month, product name, and total revenue.

## Worked solution and explanation

### Why this problem exists in real interviews

Built around `transactions`, `products`, this problem requires dense ranking to handle ties combined with date extraction and truncation. Interviewers watch whether candidates handle the relationship between `quantity`, `total_amount`, `transaction_date` correctly under grouping pressure.

> **Trick to Solving**
>
> The prompt mentions ties or "include all". This signals `DENSE_RANK()` instead of `ROW_NUMBER()` or `LIMIT`. `DENSE_RANK` assigns the same rank to tied values without gaps.
> 
> 1. Spot the tie-inclusion language
> 2. Use `DENSE_RANK()` over `ROW_NUMBER()`
> 3. Filter on rank in a wrapping CTE

---

### Break down the requirements

#### Step 1: Join tables

Join `transactions` with `products` on `product_id` to build the working set.

#### Step 2: Aggregate by b.product_name

`GROUP BY` with `SUM(quantity)` computes the metric at the correct grain before ranking.

#### Step 3: Rank with DENSE_RANK

`DENSE_RANK() OVER (ORDER BY SUM(quantity) DESC)` assigns ranks. DENSE_RANK preserves ties without gaps.

#### Step 4: Filter to the target rank

Wrap in a CTE and filter `WHERE rnk <= 10` to extract the desired position.

---

### The solution

**Dense-rank for best selling product month**

```sql
WITH ranked AS (
    SELECT
        STRFTIME('%Y-%m', transaction_date) AS month, b.product_name,
        SUM(quantity) AS total_quantity,
        DENSE_RANK() OVER (PARTITION BY STRFTIME('%Y-%m', transaction_date) ORDER BY SUM(quantity) DESC) AS rnk
    FROM transactions a
JOIN products b ON a.product_id = b.product_id
WHERE total_amount >= 0
    GROUP BY STRFTIME('%Y-%m', transaction_date), b.product_name
)
SELECT *
FROM ranked
WHERE rnk <= 10
ORDER BY month, rnk
```

> **Cost Analysis**
>
> The main table has 300M rows (77 GB). Partitioned on `transaction_date`, so queries filtering on that column skip most partitions. The window function runs after grouping, so it operates on the reduced result set. The smaller dimension table keeps the join selective.

> **Interviewers Watch For**
>
> Interviewers check whether the candidate chooses the right window function variant and can articulate why.

> **Common Pitfall**
>
> Using `ROW_NUMBER()` or `LIMIT` instead of `DENSE_RANK()` silently drops tied rows, producing incorrect results when ties exist.

---

## Common follow-up questions

- The `rating` column in `products` has roughly 4% NULLs. How does your query handle those rows, and would the result change if NULLs were replaced with zeros? _(Tests whether the candidate understands how NULLs propagate through aggregation functions and whether their WHERE/JOIN conditions implicitly filter them out.)_
- You used DENSE_RANK here. What changes in the output if you switch to ROW_NUMBER, and which is correct for this problem? _(Tests understanding of tie-handling semantics: DENSE_RANK preserves ties while ROW_NUMBER breaks them arbitrarily.)_
- `transaction_id` in `transactions` has ~300M distinct values. What index strategy keeps your query from doing a full table scan? _(Tests whether the candidate can design indexes for high-cardinality columns and understands selectivity.)_
- If the join key between `transactions` and `products` had orphaned references (child rows with no parent), how would that change your results? _(Tests understanding of referential integrity assumptions baked into the query logic.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/best_selling_product_by_month)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.