# 2nd Most Common Content Type

> Everyone talks about number one. What about number three?

Canonical URL: <https://datadriven.io/problems/2nd_most_common_content_type>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

The content platform team is analyzing what formats registered creators prefer. Only content from creators with a registered user account should be considered. Determine the second most popular content type by volume, and if multiple types tie for that rank, include all of them. Return the full content item records for every item belonging to the qualifying type or types.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes **window functions** combined with **ranking with DENSE_RANK** at production scale. L5+ interviewers expect candidates to reason about correctness and performance simultaneously.

> **Trick to Solving**
>
> The prompt mentions ties or "include all". This signals `DENSE_RANK()` instead of `ROW_NUMBER()` or `LIMIT`. `DENSE_RANK` assigns the same rank to tied values without gaps.
> 
> 1. Spot the tie-inclusion language
> 2. Use `DENSE_RANK()` over `ROW_NUMBER()`
> 3. Filter on rank in a wrapping CTE

---

### Break down the requirements

#### Step 1: Join tables

Join `content_items` with `users` on `creator_id` to build the working set.

#### Step 2: Aggregate by a.content_type

`GROUP BY` with `COUNT(*)` computes the metric at the correct grain before ranking.

#### Step 3: Rank with DENSE_RANK

`DENSE_RANK() OVER (ORDER BY COUNT(*) DESC)` assigns ranks. DENSE_RANK preserves ties without gaps.

#### Step 4: Filter to the target rank

Wrap in a CTE and filter `WHERE rnk = 2` to extract the desired position.

---

### The solution

**Aggregate then rank with DENSE_RANK**

```sql
WITH ranked AS (
    SELECT
        a.content_type,
        COUNT(*) AS cnt,
        DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
    FROM content_items a
JOIN users b ON a.creator_id = b.user_id
    GROUP BY a.content_type
)
SELECT *
FROM ranked
WHERE rnk = 2
ORDER BY rnk
```

> **Cost Analysis**
>
> The main table has 8M rows (4 GB). The window function runs after grouping, so it operates on the reduced result set.

> **Interviewers Watch For**
>
> Interviewers check whether the candidate chooses the right window function variant and can articulate why.

> **Common Pitfall**
>
> Using `ROW_NUMBER()` or `LIMIT` instead of `DENSE_RANK()` silently drops tied rows, producing incorrect results when ties exist.

---

## Common follow-up questions

- How would you handle this if the data was sharded across multiple databases? _(Tests distributed query reasoning: where to push down filters, how to merge partial results.)_
- What if a partition went missing or arrived late? _(Tests data reliability awareness: idempotency, backfill strategies, and SLA implications.)_
- How would you monitor this query in production for correctness drift? _(Tests operational maturity: data quality checks, anomaly detection on output metrics.)_
- What is the difference between ROWS and RANGE in a window frame? _(ROWS is physical (row count), RANGE is logical (value-based). They differ when duplicates exist.)_
- What if a foreign key had orphaned references? _(Tests referential integrity awareness and how join behavior changes with missing parent rows.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/2nd_most_common_content_type)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.