# Latest Session Per User

> Everyone has a most recent session.

Canonical URL: <https://datadriven.io/problems/latest_session_per_user>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The retention team is building a recency model and needs each user's most recent session start date alongside their user ID.

## Worked solution and explanation

### Why this problem exists in real interviews

The `user_sessions` table holds many rows per user, and the task is to reduce it to one row per user: the row with the latest `session_start`. It appears as a fundamentals check to probe whether you reason about the correct grain (one row per `user_id`) before writing any window or `GROUP BY` clause.

> **Trick to Solving**
>
> "Latest per group" means selecting one row per user: the one with the greatest `session_start`. The trick is ranking rows within each user with `ROW_NUMBER()` and keeping rank 1.
>
> 1. Partition by `user_id`
> 2. Order by `session_start` descending
> 3. Keep `rn = 1`, and handle ties if the prompt requires it

---

### Break down the requirements

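#### Step 1: Confirm the working set

The prompt specifies no filter, so every row of `user_sessions` feeds the ranking. If a follow-up adds one, apply the `WHERE` filter in the inner query, before the window function runs. Filtering early reduces the number of rows that downstream operations process.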
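As an illustration of filtering before ranking, here is a minimal sketch that assumes a follow-up restricts the model to recent sessions. The cutoff date is purely hypothetical and not part of the prompt; the point is only that the `WHERE` clause sits inside the subquery, so the window function ranks fewer rows.

```sql
SELECT user_id, session_start
FROM (
    SELECT user_id, session_start,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY session_start DESC) AS rn
    FROM user_sessions
    WHERE session_start >= DATE '2024-01-01'  -- hypothetical cutoff, not part of the prompt
) sub
WHERE rn = 1;
```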

#### Step 2: Assign row numbers for deduplication

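`ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY session_start DESC)` numbers each user's sessions from newest to oldest, so the most recent session in every partition gets `rn = 1`. A window function cannot be referenced in the `WHERE` clause of the query level that computes it, so the ranking lives in a subquery and the outer query filters to `rn = 1`.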
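To see the numbering before the outer filter removes anything, the sketch below runs the same window function over three made-up rows. The CTE name `sample_sessions` and its values are hypothetical stand-ins for `user_sessions`.

```sql
-- Hypothetical sample rows standing in for user_sessions.
WITH sample_sessions (session_id, user_id, session_start) AS (
    VALUES
        (1, 101, TIMESTAMP '2024-01-05 09:00:00'),
        (2, 101, TIMESTAMP '2024-01-09 18:30:00'),
        (3, 202, TIMESTAMP '2024-01-07 12:15:00')
)
SELECT session_id, user_id, session_start,
    -- Numbering restarts at 1 for every user_id, newest session first.
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY session_start DESC) AS rn
FROM sample_sessions;
```

For user 101, session 2 receives `rn = 1` and session 1 receives `rn = 2`; user 202 has a single session, which receives `rn = 1`.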

#### Step 3: Order the final output

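The prompt does not specify an output order and the solution has no final `ORDER BY`, so the row sequence is not guaranteed. If a specific sequence is expected, add `ORDER BY` to the outer query. When two sessions for the same user can share a `session_start`, add a secondary sort column to the window's `ORDER BY` so the row that receives `rn = 1` is deterministic.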
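Both kinds of ordering can be sketched together. This variant assumes `session_id` is unique and uses it as the tie-breaker, and it assumes the output should be sorted by `user_id`; neither choice comes from the prompt.

```sql
SELECT user_id, session_start
FROM (
    SELECT user_id, session_start,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY session_start DESC, session_id DESC  -- tie-breaker, assumes session_id is unique
        ) AS rn
    FROM user_sessions
) sub
WHERE rn = 1
ORDER BY user_id;  -- assumed output order; the prompt does not specify one
```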

---

### The solution

**ROW_NUMBER deduplication by user**

```sql
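-- Keep each user's most recent session: the row ranked first by session_start.
SELECT session_id, user_id, device_id, session_start, session_duration_sec
FROM (
    SELECT session_id, user_id, device_id, session_start, session_duration_sec,
        -- rn = 1 marks the latest session within each user_id
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY session_start DESC) AS rn
    FROM user_sessions
) sub
WHERE rn = 1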
```
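The prompt asks only for the user ID and the most recent session start, so a plain aggregate may be enough when no other columns from the winning row are needed. The following is a minimal sketch of that alternative, not the reference solution; the alias `latest_session_start` is an assumed name.

```sql
-- One row per user with the greatest session_start.
SELECT user_id, MAX(session_start) AS latest_session_start
FROM user_sessions
GROUP BY user_id;
```

The `ROW_NUMBER` form is the better fit once the output must also carry columns such as `session_id` or `device_id` from that same latest row, because `MAX` alone cannot say which row the value came from.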

> **Cost Analysis**
>
> With no `WHERE` clause, the query scans all 50M rows of `user_sessions` and ranks them within each `user_id` partition.

> **Interviewers Watch For**
>
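> Interviewers expect you to articulate why you partition by `user_id` and order by `session_start DESC`, and what happens when two sessions tie. Explaining why `ROW_NUMBER` is preferred over `DISTINCT` for deduplication shows you understand the difference between collapsing and selecting: `DISTINCT` collapses rows that are identical across the selected columns, while `ROW_NUMBER` selects one specific row per group.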

> **Common Pitfall**
>
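> Forgetting that a JOIN can multiply rows. A common alternative joins `user_sessions` back to a per-user `MAX(session_start)` subquery, and that join returns more than one row per user whenever two sessions share the latest `session_start`. `ROW_NUMBER` avoids this because exactly one row per partition receives `rn = 1`.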

---

## Common follow-up questions

- If user_sessions.session_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in session_id.)_
- If two rows in user_sessions have identical values in the ORDER BY columns, how does your ranking handle the tie? _(Tests understanding of RANK vs DENSE_RANK vs ROW_NUMBER tie-breaking behavior.)_
- With millions of distinct values in user_sessions.session_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like session_id.)_
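For the indexing question, the columns that matter to this particular query are the ones in the window clause rather than `session_id`. One possible starting point is a composite index matching the partition and sort order. This is a sketch only: the index name is hypothetical, and whether the Postgres planner actually uses the index depends on the data distribution and should be checked with `EXPLAIN`.

```sql
-- Hypothetical index name; column order mirrors PARTITION BY user_id ORDER BY session_start DESC.
CREATE INDEX idx_user_sessions_user_start
    ON user_sessions (user_id, session_start DESC);
```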

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/latest_session_per_user)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.