# Pages Viewed by Session Duration

> Longer sessions, more pages? Check.

Canonical URL: <https://datadriven.io/problems/pages_viewed_by_session_duration>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Sessions are bucketed by their duration. For each bucket, show the number of sessions and the average pages viewed, ranked from highest average pages down.

## Worked solution and explanation

### Why this problem exists in real interviews

Extracting insights from user_sessions.session_start grouped by session_duration_sec via grouping and averaging is the central task. It is used in mid-level screens to test whether you pick the right aggregation function and partition boundary on the first attempt.

---

### Break down the requirements

#### Step 1: Define duration buckets

Use `CASE WHEN session_duration_sec < 60 THEN 'under_1min' WHEN session_duration_sec < 300 THEN '1_to_5min' ... END` to create meaningful buckets.

#### Step 2: Group by bucket and aggregate

`GROUP BY bucket` with `COUNT(*)` for session count and `AVG(pages_viewed)` for average pages.

#### Step 3: Order by average pages descending

`ORDER BY AVG(pages_viewed) DESC` surfaces the most engaged buckets first.

---

### The solution

**CASE bucketing then group and sort**

```sql
SELECT
    CASE
        WHEN session_duration_sec < 60 THEN 'under_1min'
        WHEN session_duration_sec < 300 THEN '1_to_5min'
        WHEN session_duration_sec < 900 THEN '5_to_15min'
        WHEN session_duration_sec < 1800 THEN '15_to_30min'
        ELSE 'over_30min'
    END AS duration_bucket,
    COUNT(*) AS session_count,
    AVG(pages_viewed) AS avg_pages
FROM user_sessions
GROUP BY duration_bucket
ORDER BY avg_pages DESC
```

> **Cost Analysis**
>
> With `user_sessions` (50,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If user_sessions.session_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in session_id.)_
- How would you verify that your aggregation on user_sessions.session_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in user_sessions.session_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like session_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/pages_viewed_by_session_duration)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.