# Session Overview

> Full engagement picture, even for the ones who never showed up.

Canonical URL: <https://datadriven.io/problems/session_overview>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Product wants a full engagement picture that still includes inactive accounts. For every user, count their sessions and find their longest session_duration_sec, keeping users with no sessions in the output. Return the username, session count, and max duration.

## Worked solution and explanation

### Why this problem exists in real interviews

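This session analysis problem joins the `users` and `user_sessions` tables and tests whether you can combine an outer join (`LEFT JOIN`) with aggregation. Watch how the grouping column, `username`, and the aggregated `user_sessions` columns behave for users who have no sessions.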

---

### Break down the requirements

#### Step 1: Left join to preserve all base rows

A `LEFT JOIN` from `users` to `user_sessions` keeps every user in the output, even when no session matches. For those users, every `user_sessions` column comes back as NULL.
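
To make the NULL padding concrete, here is a minimal sketch that runs without any existing tables. The three users and three sessions are hypothetical sample rows defined inline, and the CTE names simply mirror the tables from the problem; the real schema may contain more columns.

```sql
-- Hypothetical sample data, defined inline so the query is self-contained.
WITH users (user_id, username) AS (
    VALUES (1, 'ana'), (2, 'ben'), (3, 'cy')
),
user_sessions (session_id, user_id, session_duration_sec) AS (
    VALUES (10, 1, 120), (11, 1, 300), (12, 2, 45)
)
-- Rows before any aggregation: one row per matching session,
-- plus one NULL-padded row for each user without sessions.
SELECT u.username, s.session_id, s.session_duration_sec
FROM users u
LEFT JOIN user_sessions s ON u.user_id = s.user_id
ORDER BY u.username, s.session_id;
-- 'cy' has no sessions, so 'cy' appears exactly once,
-- with NULL in both session_id and session_duration_sec.
```

Swapping `LEFT JOIN` for `JOIN` in this sketch removes the `cy` row entirely.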

#### Step 2: Aggregate by `u.username`

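`GROUP BY u.username` collapses the joined rows to one row per username. `COUNT(s.session_id)` counts only non-NULL session IDs, so a user with no sessions gets 0 rather than 1, and `MAX(s.session_duration_sec)` returns NULL for that user. If usernames are not guaranteed to be unique, add `u.user_id` to the `GROUP BY` so that two users sharing a name are not merged.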
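
The choice of aggregate matters once the join has produced NULL-padded rows. The sketch below compares `COUNT(*)` with `COUNT(s.session_id)` on hypothetical inline sample data (the rows and the column aliases are assumptions for illustration, not part of the original problem).

```sql
-- Hypothetical sample data, defined inline so the query is self-contained.
WITH users (user_id, username) AS (
    VALUES (1, 'ana'), (2, 'ben'), (3, 'cy')
),
user_sessions (session_id, user_id, session_duration_sec) AS (
    VALUES (10, 1, 120), (11, 1, 300), (12, 2, 45)
)
SELECT
    u.username,
    COUNT(*)                    AS count_star,       -- counts the NULL-padded row too
    COUNT(s.session_id)         AS session_count,    -- skips NULLs
    MAX(s.session_duration_sec) AS max_duration_sec  -- NULL when there is nothing to compare
FROM users u
LEFT JOIN user_sessions s ON u.user_id = s.user_id
GROUP BY u.username
ORDER BY u.username;
-- For 'cy' (no sessions): count_star = 1, session_count = 0, max_duration_sec IS NULL.
```

If the consumer of the report prefers 0 over NULL for the maximum, wrapping it as `COALESCE(MAX(s.session_duration_sec), 0)` is one option, at the cost of no longer distinguishing "no sessions" from "a zero-second session".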

---

### The solution

**Left join plus aggregation to build the session overview**

```sql
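SELECT
    u.username,
    COUNT(s.session_id)         AS session_count,    -- 0 for users with no sessions
    MAX(s.session_duration_sec) AS max_duration_sec  -- NULL for users with no sessions
FROM users u
LEFT JOIN user_sessions s ON u.user_id = s.user_id   -- keep every user
GROUP BY u.username;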
```

> **Cost Analysis**
>
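> With ~72M rows, the join and the aggregation dominate the cost. The query has no filter, so every session row has to be read once; the `GROUP BY` then reduces the working set to one row per user before any downstream operations. An index on the join column, `user_sessions.user_id`, can help the planner pick a cheaper join or grouping strategy, but without a filter it cannot by itself turn the full read into a seek.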
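
As a sketch of the indexing idea in the cost analysis, the statements below create a plain index on the join column and ask the planner for its plan. The table definitions are a hypothetical minimal schema added only so the sketch runs on its own, and the index name is an assumption; the syntax targets Postgres.

```sql
-- Hypothetical minimal schema; skipped if the tables already exist.
CREATE TABLE IF NOT EXISTS users (
    user_id  INT PRIMARY KEY,
    username TEXT
);
CREATE TABLE IF NOT EXISTS user_sessions (
    session_id           INT PRIMARY KEY,
    user_id              INT,
    session_duration_sec INT
);

-- Assumed index name; a plain b-tree index on the join key.
CREATE INDEX IF NOT EXISTS idx_user_sessions_user_id
    ON user_sessions (user_id);

-- Inspect the plan rather than assuming the index is used.
EXPLAIN
SELECT u.username, COUNT(s.session_id), MAX(s.session_duration_sec)
FROM users u
LEFT JOIN user_sessions s ON u.user_id = s.user_id
GROUP BY u.username;
```

Because the query touches every session, the planner may still choose a sequential scan with a hash join. The index is more likely to pay off once a filter (for example, a single user or a date range) narrows the rows.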

> **Interviewers Watch For**
>
> Interviewers watch for whether you choose the correct join type to avoid silently dropping rows.

> **Common Pitfall**
>
> Using `INNER JOIN` instead of `LEFT JOIN` drops users with no matching session, producing an incomplete result. Prompts usually hint at this with phrases such as 'all' or 'even if no'; here the cue is "keeping users with no sessions".

---

## Common follow-up questions

- What would happen to your result if `user_sessions.session_duration_sec` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `session_duration_sec` and uses DISTINCT or deduplication where needed.)_
- `user_sessions.user_id` has roughly 4,000,000 distinct values. What index strategy would you use to avoid a full scan on `user_sessions`? _(Tests indexing knowledge specific to the high-cardinality `user_id` column in `user_sessions`.)_
- If this query ran as a scheduled job, how would you add monitoring to detect when the result set is suspiciously empty? _(Tests operational awareness around scheduled query jobs.)_

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.