# Session Duration by Account Status

> Average session duration broken down by user account status

Canonical URL: <https://datadriven.io/problems/session_duration_by_account_status>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The product team suspects that account tier correlates with session engagement. For each account status, show the average session duration in seconds.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use this session analysis scenario to test self-join against the `users` and `user_sessions` tables. The focus is on how you handle the `account_status` column when building the result.

---

### Break down the requirements

#### Step 1: Join `users` to `user_sessions`

The join connects the two tables on their shared key. This brings the columns needed for filtering and aggregation into a single row set.

#### Step 2: Aggregate by `u.account_status`

`GROUP BY u.account_status` collapses rows to one per group. The aggregate functions (`SUM`, `COUNT`, `AVG`, etc.) compute the metric for each group.

---

### The solution

**Join `users` to `user_sessions` to find session duration by account...**

```sql
SELECT u.account_status, AVG(us.session_duration_sec) AS avg_session_duration
FROM users u
INNER
JOIN user_sessions us ON u.user_id = us.user_id
GROUP BY u.account_status
```

> **Cost Analysis**
>
> With ~124M rows, the GROUP BY reduces the working set before any downstream operations; the join cost depends on the smaller table's cardinality. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether the query returns exactly the columns and ordering the prompt specifies; how quickly you identify the core operation and write clean, minimal code.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- What result would you get if every value in `users.age_bucket` were NULL? Would your query return an empty set or something unexpected? _(Tests extreme NULL scenarios and whether the candidate guards against edge cases in `age_bucket`.)_
- With 4,000,000 distinct values in `users.user_id`, how would a composite index on the GROUP BY columns change the execution plan? _(Probes understanding of how cardinality in `user_id` affects grouping and sort operations.)_
- How would you modify this query if the business logic required grouping by both `user_id` and `username` instead of just one? _(Tests ability to adapt the query structure to changing requirements.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/session_duration_by_account_status)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.