# Top 3 Revenue Months

> The three best months on record.

Canonical URL: <https://datadriven.io/problems/top_3_revenue_months>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Find the three highest-grossing months by total transaction amount, formatted as YYYY-MM.

## Worked solution and explanation

### Why this problem exists in real interviews

120M rows in `transactions`, partitioned by `transaction_date`. Two real decisions before SQL: pick a year-plus-month bucket so January 2023 doesn't merge with January 2024, and decide whether ties at rank 3 surface or get dropped. Both are 30-second conversations the interviewer is waiting for.

---

### Break down the requirements

#### Step 1: Bucket by year-month

Use `strftime('%Y-%m', transaction_date)` so February 2023 and February 2024 stay separate. The prompt explicitly asks for `YYYY-MM`, which doubles as the safe bucket key.

#### Step 2: Aggregate revenue

`SUM(total_amount)` per bucket in a single `GROUP BY year_month`. Do not pre-filter by `quantity`; `total_amount` is already the line revenue.

#### Step 3: Rank and cut

`ORDER BY SUM(total_amount) DESC LIMIT 3`. If the interviewer wants ties preserved, swap `LIMIT` for `DENSE_RANK()` and filter `rnk <= 3`.

#### Step 4: Format the output

Return `year_month` as the label and a numeric `total_sales`. Aliasing the bucket lets `GROUP BY year_month` work in SQLite and keeps the result self-documenting.

---

### The solution

**TOP 3 REVENUE MONTHS**

```sql
SELECT
  strftime('%Y-%m', transaction_date) AS year_month,
  SUM(total_amount) AS total_sales
FROM transactions
GROUP BY year_month
ORDER BY total_sales DESC
LIMIT 3;
```

> **Cost Analysis**
>
> 120M rows, partitioned by `transaction_date`. With no `WHERE`, the planner scans every partition and hash-aggregates by month bucket. Adding a trailing window filter (say, last 24 months) prunes most partitions and cuts the scan by an order of magnitude. Worth proposing out loud.

> **Interviewers Watch For**
>
> Ask: 'all-time or trailing window?' and 'if months 3 and 4 tie, both?' Both are real product calls, both change the query, and naming them is faster than the SQL.

> **Common Pitfall**
>
> Writing `strftime('%m', transaction_date)` collapses January 2023 with January 2024 into one bucket. The result looks plausible (12 rows, sorted) and slips past code review. Always include the year component when ranking calendar buckets.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you return ties so a 4-way tie at rank 3 surfaces all four months? _(Probes whether you reach for `DENSE_RANK()` over `LIMIT` and understand the difference.)_
- Now return the top 3 months per year. _(Tests partitioned ranking: `DENSE_RANK() OVER (PARTITION BY year ORDER BY total_sales DESC)`.)_
- What changes if `total_amount` can be NULL or negative (refunds)? _(Checks NULL handling in `SUM` and whether you treat refunds as month-of-refund or month-of-original-sale.)_
- How would you make this incremental so it updates daily without rescanning 120M rows? _(Opens the door to a `monthly_revenue` rollup table updated by a daily job keyed on `transaction_date`.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_3_revenue_months)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.