# Top Lessons Each Month

> Rank items within time periods and keep top 3

Canonical URL: <https://datadriven.io/problems/top_lessons_each_month>

Domain: SQL · Difficulty: medium · Seniority: L5

## Problem

From the lesson progress data, surface the three most-completed lessons each month. The content team uses this to guide investment decisions for the next quarter.

## Worked solution and explanation

### Why this problem exists in real interviews

Notice the gap: the prompt asks about lesson completions, the table is `page_views`. A row here is a view, not a completion. In a real review you would push for a `lesson_completions` event source. For this exercise, `page_url` stands in as the lesson identifier and every row counts as a completion, which is the simplification the query bakes in.

---

### Break down the requirements

#### Step 1: Bucket views into months

`strftime('%Y-%m', viewed_at)` returns a sortable text label per month. Aggregating to `(month, page_url)` collapses 500M rows into a small set keyed by lesson and month.

#### Step 2: Rank within each month

`RANK() OVER (PARTITION BY month ORDER BY completion_count DESC)` resets per month. Two lessons tied for second both get rank 2; rank 3 still exists. The next position is 4, not 3.

#### Step 3: Filter to the top three and order for the consumer

`WHERE rnk <= 3` keeps the top tier. The outer `ORDER BY month, rnk` produces the calendar walk the content team expects to read.

---

### The solution

**TOP LESSONS EACH MONTH**

```sql
WITH monthly AS (
  SELECT strftime('%Y-%m', viewed_at) AS month,
         page_url,
         COUNT(*) AS completion_count
  FROM page_views
  GROUP BY 1, 2
),
ranked AS (
  SELECT month,
         page_url,
         completion_count,
         RANK() OVER (PARTITION BY month ORDER BY completion_count DESC) AS rnk
  FROM monthly
)
SELECT month, page_url, completion_count, rnk AS rank
FROM ranked
WHERE rnk <= 3
ORDER BY month, rnk
```

> **Cost Analysis**
>
> `page_views` is partitioned by `viewed_at`, so the planner reads every partition since the query has no date filter. The expensive step is the group-by, not the window; the window runs on the already-collapsed monthly grain, which is tiny by comparison.

> **Interviewers Watch For**
>
> Whether you call out that a pageview is not a completion. Also whether you justify `RANK` over `DENSE_RANK` or `ROW_NUMBER`: ties are kept and rank positions are skipped, so a month with a four-way tie at the top returns four rows, not three.

> **Common Pitfall**
>
> Using `ROW_NUMBER()` quietly drops tied lessons and gives a wrong picture of what was most-viewed. Using `strftime('%Y-%m', viewed_at)` is SQLite-specific; on Postgres swap in `to_char(viewed_at, 'YYYY-MM')` or `date_trunc('month', viewed_at)`.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you change the query to count only views where `dur_ms` exceeds a completion threshold? _(Tests moving from a proxy metric to a defensible engagement filter.)_
- If the content team only cares about the last six months, what changes in the query plan? _(Probes partition pruning on `viewed_at` and whether the predicate belongs in the CTE or outside.)_
- How would you return exactly three rows per month even on ties? _(Forces the `RANK` vs `ROW_NUMBER` tradeoff and a discussion of fair tiebreakers.)_

> **Real World**
>
> Content teams usually want share of completions, not raw counts, because monthly traffic swings. Layering `completion_count * 1.0 / SUM(completion_count) OVER (PARTITION BY month)` next to the rank gives them a stable signal across a growing user base.

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_lessons_each_month)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.