# Top Duration Content Items

> The content that held the number-one spot.

Canonical URL: <https://datadriven.io/problems/top_duration_content_items>

Domain: SQL · Difficulty: easy · Seniority: L4

## Problem

Find all content items whose duration is the highest among all items sharing the same content type, published since 2026. Show the unique content IDs and titles.

## Worked solution and explanation

### What this is really asking

`MAX(duration_seconds)` is per `content_type`, but the `publish_date >= 'YEAR-01-01'` filter applies to rows you return, not rows you compare against. That gap is the whole problem.

---

### Break down the requirements

#### Step 1: Per-type maximum

For each `content_type`, find the largest `duration_seconds` across the full table. Correlated subquery or windowed MAX both work.

#### Step 2: Filter the output

Keep rows where `publish_date >= 'YEAR-01-01'` AND `duration_seconds` equals that per-type max. Ties allowed.

#### Step 3: Distinct id + title

`DISTINCT content_id, title` guards against duplicates. Order by `content_id` for stability.

---

### The solution

**PER-TYPE MAX VIA CORRELATED SUBQUERY**

```sql
SELECT DISTINCT ci.content_id, ci.title
FROM content_items ci
WHERE ci.duration_seconds = (
    SELECT MAX(ci2.duration_seconds)
    FROM content_items ci2
    WHERE ci2.content_type = ci.content_type
)
AND ci.publish_date >= '2026-01-01'
ORDER BY ci.content_id
```

> **Cost Analysis**
>
> An index on `(content_type, duration_seconds)` turns each lookup into a seek. `MAX() OVER (PARTITION BY content_type)` is one pass and usually faster at 4M rows.

> **Interviewers Watch For**
>
> Whether you put the date filter inside the subquery (wrong: max becomes per-type-since-YEAR) or outside (right). Articulating that split is the move.

> **Common Pitfall**
>
> `HAVING duration_seconds = MAX(duration_seconds)` after `GROUP BY content_type` collapses to one row per type and loses ties. You need the max projected back, not aggregated.

> **The False Start**
>
> First instinct is `GROUP BY content_type` with `MAX(duration_seconds)`, then join back for `content_id, title`. If you also push the date filter into that grouped CTE, you silently redefine max as max-since-YEAR. Pivot: keep the per-type max global.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- Rewrite this with a window function. _(`MAX(duration_seconds) OVER (PARTITION BY content_type)` in a CTE, then filter on equality plus date outside.)_
- What if the prompt wanted the top item per type since YEAR, not the all-time max filtered to YEAR? _(Move the date filter into the partitioned aggregate. Numbers differ when a pre-YEAR row is the type's longest.)_
- How would you index this table? _(Composite index on `(content_type, duration_seconds DESC)` plus `publish_date`, or a covering index on all three.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_duration_content_items)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.