# Session-Fit Content

> Content that fits the session length.

Canonical URL: <https://datadriven.io/problems/session_fit_content>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

For user 197, find content items whose duration is less than or equal to that user's average session length. Show the user ID, content ID, and content duration.

## Worked solution and explanation

### Why this problem exists in real interviews

Drawn from a session analysis domain, this question centers on filtering and projection over the `content_items` and `user_sessions` tables. The tricky part is handling columns like `content_id`, `duration_seconds`, and `user_id` correctly under the given constraints.

---

### Break down the requirements

#### Step 1: Apply the range filter

The WHERE clause restricts rows to the target range. Applying this filter early reduces the volume flowing into downstream operations.

#### Step 2: Use a subquery to find the reference value

The scalar subquery computes a single value (like the maximum) that the outer query filters against. This avoids a self-join.

---

### The solution

**Apply the range filter to find session-fit content**

```sql
SELECT 197 AS user_id, ci.content_id, ci.duration_seconds
FROM content_items ci
WHERE ci.duration_seconds <= (
    SELECT AVG(session_duration_sec)
    FROM user_sessions
    WHERE user_id = 197
)
```

> **Cost Analysis**
>
> With ~53M rows, the query performs a single sequential scan. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you use a subquery or self-join, and can explain the tradeoffs.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- What result would you get if every value in `content_items.duration_seconds` were NULL? Would your query return an empty set or something unexpected? _(Tests extreme NULL scenarios and whether the candidate guards against edge cases in `duration_seconds`.)_
- `user_sessions.device_id` has roughly 6,000,000 distinct values. What index strategy would you use to avoid a full scan on `user_sessions`? _(Tests indexing knowledge specific to the high-cardinality `device_id` column in `user_sessions`.)_
- `content_items.content_type` only has 10 distinct values. If a new category were added, would your query automatically include it? _(Tests whether the query hard-codes values or dynamically adapts to `content_type` changes.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/session_fit_content)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.