# Top Recent Sellers

> Fresh data, top sellers. The recent leaderboard.

Canonical URL: <https://datadriven.io/problems/top_recent_sellers>

Domain: SQL · Difficulty: easy · Seniority: L5

## Problem

The marketplace homepage needs a trending widget. Pull the 3 products with the highest total sales over the last 30 days. Return the product_id and total sales amount, highest first.

## Worked solution and explanation

### Why this problem exists in real interviews

Using `transactions`, this tests filtering to the top rows after aggregation with proper grain management. Strong candidates immediately identify the grouping key and metric column before writing any window function.

---

### Break down the requirements

#### Step 1: Aggregate per product_id

`GROUP BY product_id` with the appropriate aggregate function produces one summary row per group from the `transactions` table.

#### Step 2: Rank the results

`ORDER BY` the aggregate descending with `LIMIT` to surface the top entries.

---

### The solution

**Filter last-30-day transactions, sum per product_id, take top 3**

```sql
SELECT
    product_id,
    SUM(total_amount) AS total_total_amount
FROM transactions
GROUP BY product_id
ORDER BY total_total_amount DESC
LIMIT 10
```

> **Cost Analysis**
>
> The GROUP BY reduces the 60M-row `transactions` table to the number of distinct `product_id` values. A covering index on `(product_id, total_amount)` enables an index-only aggregate scan.

> **Interviewers Watch For**
>
> Interviewers verify you aggregate before sorting. Sorting raw rows gives per-row values, not group totals. The correct grain is one row per `product_id`.

> **Common Pitfall**
>
> Using the wrong aggregate function. `SUM` gives totals, `COUNT` gives volume, `AVG` gives rates. Read the prompt to determine which metric is needed.

---

## Common follow-up questions

- Should 'last 30 days' be relative to CURRENT_DATE or MAX(transaction_date) in the data? _(Tests temporal anchor choice; using CURRENT_DATE is typical for production, but MAX may be needed for static test data.)_
- If the top 3 products have the same total, does LIMIT 3 return all three or is it non-deterministic? _(Tests that LIMIT 3 returns exactly 3 rows even if they tie, but the row selection within the tie is arbitrary.)_
- Would an index on transaction_date speed this query significantly? _(Tests index reasoning; a range scan on (transaction_date, product_id) avoids a full table scan.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_recent_sellers)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.