# Overlapping User Sessions

> Two sessions, one user, same clock. Something overlaps.

Canonical URL: <https://datadriven.io/problems/overlapping_user_sessions>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The user_sessions table records session windows with a start and end time. Some users have sessions that overlap in time. Find all pairs of overlapping sessions for the same user. Two sessions overlap if one starts before the other ends and vice versa. Return the user_id, the session_ids of both overlapping sessions, and their respective start times. Do not return a session paired with itself.

## Worked solution and explanation

### Why this problem exists in real interviews

Extracting insights from user_sessions.session_start grouped by session_duration_sec via self-join and date extraction is the central task. It is used in mid-level screens to test whether you pick the right aggregation function and partition boundary on the first attempt.

> **Trick to Solving**
>
> The overlap condition for two intervals [A_start, A_end) and [B_start, B_end) is `A_start < B_end AND B_start < A_end`. Spot it when the prompt mentions sessions, reservations, or bookings that "overlap in time."
> 
> 1. Derive each session's end time as `session_start + session_duration_sec`
> 2. Self-join the table on `user_id`
> 3. Apply the overlap predicate
> 4. Exclude self-pairs with `a.session_id < b.session_id`

---

### Break down the requirements

#### Step 1: Derive session end times

Compute end time as `datetime(session_start, '+' || session_duration_sec || ' seconds')` for each session.

#### Step 2: Self-join on user_id

`JOIN user_sessions b ON a.user_id = b.user_id AND a.session_id < b.session_id` pairs sessions for the same user without self-pairing.

#### Step 3: Apply overlap condition

`WHERE a.start < b.end AND b.start < a.end` keeps only overlapping pairs.

#### Step 4: Return the requested columns

Select `user_id`, both `session_id` values, and both start times.

---

### The solution

**Self-join with interval overlap condition**

```sql
SELECT
    a.user_id,
    a.session_id AS session_id_1,
    b.session_id AS session_id_2,
    a.session_start AS start_1,
    b.session_start AS start_2
FROM user_sessions a
JOIN user_sessions b
  ON a.user_id = b.user_id
  AND a.session_id < b.session_id
WHERE a.session_start < datetime(b.session_start, '+' || b.session_duration_sec || ' seconds')
  AND b.session_start < datetime(a.session_start, '+' || a.session_duration_sec || ' seconds')
```

> **Cost Analysis**
>
> With `user_sessions` (70,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Strong candidates immediately derive end times and state the overlap condition before writing code. Interviewers watch whether you handle the self-pair exclusion (`a.session_id < b.session_id`) without being prompted.

> **Common Pitfall**
>
> Forgetting to exclude self-pairs (`a.session_id < b.session_id`) returns every session paired with itself and every pair twice ((A,B) and (B,A)).

---

## Common follow-up questions

- If user_sessions.session_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in session_id.)_
- How do you prevent user_sessions from joining a row to itself in the self-join? _(Tests whether the candidate adds an inequality condition to exclude identity matches.)_
- With millions of distinct values in user_sessions.session_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like session_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/overlapping_user_sessions)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.