# Average Session Duration

> How long do users actually stay?

Canonical URL: <https://datadriven.io/problems/average_session_duration>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The engagement team wants each user's average session duration, defined as their total time in sessions divided by their total number of sessions. Exclude any sessions missing a recorded duration. Return the user ID and their average.

## Worked solution and explanation

### Why this problem exists in real interviews

Engagement metrics probe whether you know the difference between session-weighted and user-weighted averages. `AVG(duration) GROUP BY user_id` weights power users with 200 sessions vastly more than light users with 2. The interviewer wants you to ask 'per-user average across all sessions, or average of daily averages?' before writing, because they're different numbers.

---

### Break down the requirements

#### Step 1: Filter NULL durations first

`WHERE session_duration_sec IS NOT NULL` in the inner CTE. If you skip the filter, `COUNT(*)` counts those rows while `SUM` ignores them, and your denominator inflates. Counts and sums must stay aligned.

#### Step 2: Inner CTE: daily roll-up per user

`GROUP BY user_id, date(session_start)` with `SUM(session_duration_sec) AS total_duration` and `COUNT(*) AS session_count`. Daily grain trims cardinality before the outer aggregate and aligns with the `session_start` partition.

#### Step 3: Outer CTE: divide sums, not average averages

`SUM(total_duration) / SUM(session_count) GROUP BY user_id` is the correct per-user mean. `AVG(daily_avg)` would be a flat average of days, weighting a one-session day equal to a fifty-session day.

#### Step 4: Cast for real division

`CAST(SUM(total_duration) AS REAL) / SUM(session_count)`. Integer division silently floors the result in SQLite and Postgres. Cast the numerator.

---

### The solution

**USER-WEIGHTED AVERAGE SESSION DURATION**

```sql
WITH daily_sessions AS (
    SELECT user_id, date(session_start) AS day,
        SUM(session_duration_sec) AS total_duration,
        COUNT(*) AS session_count
    FROM user_sessions
    WHERE session_duration_sec IS NOT NULL
    GROUP BY user_id, date(session_start)
),
user_avg AS (
    SELECT user_id,
        CAST(SUM(total_duration) AS REAL) / SUM(session_count) AS avg_session_duration
    FROM daily_sessions
    GROUP BY user_id
)
SELECT user_id, avg_session_duration
FROM user_avg;
```

> **Cost Analysis**
>
> 70M rows partitioned by `session_start` means the inner scan reads every partition since there is no date filter. Hash-aggregate on `(user_id, day)` is the hot step (tens of millions of groups). If asked for last-30-day average, add `WHERE session_start >= current_date - 30` to prune partitions and collapse memory.

> **Interviewers Watch For**
>
> Say out loud: 'Per-user average across all sessions, or average of daily averages?' Then: 'NULL duration rows: drop them or treat as zero?' Those two questions are the entire signal. The SQL is mechanical once the answer is fixed.

> **Common Pitfall**
>
> Writing `AVG(AVG(session_duration_sec))` across the two levels. That averages daily averages, so a user with one fifty-minute session on Monday and ten five-minute sessions on Tuesday gets 27.5 minutes instead of the correct 9.1.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- Why divide `SUM(total_duration) / SUM(session_count)` instead of `AVG(daily_avg)`? _(Probes whether you can explain weighted vs unweighted means under aggregation.)_
- Return only users with at least 10 sessions in the window. _(Forces `HAVING SUM(session_count) >= 10` on the outer CTE, testing `HAVING` vs `WHERE` ordering.)_
- Now compute the company-wide average of these per-user averages. _(Layered aggregation; tests whether you wrap `user_avg` in a third stage rather than collapsing back to raw sessions.)_
- How would you make this incremental, refreshing only yesterday's slice? _(Opens partitioned incremental ETL: the daily CTE is already keyed for an `INSERT ... ON CONFLICT` upsert into a `user_daily_sessions` table.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/average_session_duration)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.