# Content Mix

> One content format to bet the quarter on.

Canonical URL: <https://datadriven.io/problems/content_mix>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The editorial team is deciding next quarter's production budget and wants to compare content formats head-to-head. For each content type, show the number of published items, the average duration in seconds, and the share of items that have no duration on file. Round the share to one decimal place and present the formats from most published to least.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests multi-metric aggregation with NULL handling. Computing the share of items with no duration probes whether you can express NULL checks inside conditional aggregation.

---

### Break down the requirements

#### Step 1: Group by content type

`GROUP BY content_type` produces one row per format.

#### Step 2: Compute three metrics

`COUNT(*)` for items, `AVG(duration_seconds)` for average duration, and `ROUND(100.0 * SUM(CASE WHEN duration_seconds IS NULL THEN 1 ELSE 0 END) / COUNT(*), 1)` for the no-duration share.

#### Step 3: Sort by count descending

`ORDER BY COUNT(*) DESC` lists most published first.

---

### The solution

**Multi-metric aggregation with NULL share**

```sql
SELECT
    content_type,
    COUNT(*) AS item_count,
    AVG(duration_seconds) AS avg_duration,
    ROUND(100.0 * SUM(CASE WHEN duration_seconds IS NULL THEN 1 ELSE 0 END) / COUNT(*), 1) AS no_duration_pct
FROM content_items
GROUP BY content_type
ORDER BY item_count DESC
```

> **Cost Analysis**
>
> Single scan of 500K rows. Low cardinality GROUP BY (handful of content types). Trivially fast.

> **Interviewers Watch For**
>
> Whether the candidate handles the NULL check correctly. `AVG(duration_seconds)` already ignores NULLs, so the average is only over items with a duration. The NULL share metric requires explicit CASE WHEN logic.

> **Common Pitfall**
>
> Using `COUNT(duration_seconds)` instead of `COUNT(*)` for the denominator of the NULL share would undercount, since COUNT(column) skips NULLs.

---

## Common follow-up questions

- What is the difference between COUNT(*) and COUNT(duration_seconds)? _(COUNT(*) counts all rows; COUNT(column) counts non-NULL values. Fundamental SQL concept.)_
- How would you also show the median duration per type? _(Tests PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY duration_seconds).)_
- What if you needed to impute missing durations with the category average? _(Tests COALESCE with a window function: COALESCE(duration, AVG(duration) OVER (PARTITION BY content_type)).)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/content_mix)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.