# Session Rank

> Longest sessions rise to the top. Within each user, a pecking order.

Canonical URL: <https://datadriven.io/problems/session_rank>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Analytics wants to see each user's sessions ranked by engagement. Within each user, number their sessions from longest to shortest by session_duration_sec. Skip rows where session_duration_sec is NULL. Return the user_id, duration, and position number.

## Worked solution and explanation

### Why this problem exists in real interviews

The core skill being tested is ROW_NUMBER for sequential numbering, applied to the `user_sessions` table in a session analysis context. Getting the `session_duration_sec` column right is where most candidates slip.

---

### Break down the requirements

#### Step 1: Filter out null values

Exclude rows where `session_duration_sec` is NULL. This prevents nulls from polluting aggregations or creating phantom groups.

#### Step 2: Sort the final output

The `ORDER BY` clause ensures the result appears in the expected sequence. Interviewers check that the sort direction matches the prompt.

---

### The solution

**Row-number for session rank**

```sql
SELECT user_id, session_duration_sec, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY session_duration_sec DESC)
FROM user_sessions
WHERE session_duration_sec IS NOT NULL
```

> **Cost Analysis**
>
> With ~50M rows, the window function runs on the reduced set after filtering and grouping. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for how you handle NULL values and whether you account for them in filters and aggregations.

> **Common Pitfall**
>
> Forgetting to filter NULLs creates phantom groups or inflated counts. Always check `null_fraction` in the schema before assuming columns are clean.

---

## Common follow-up questions

- What would happen to your result if `user_sessions.session_duration_sec` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `session_duration_sec` and uses DISTINCT or deduplication where needed.)_
- If `user_sessions` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `device_id`? _(Tests ability to identify performance hotspots related to `user_sessions.device_id` at scale.)_
- What is the default window frame for your window function, and would explicitly setting ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW change anything? _(Tests knowledge of implicit vs explicit window frame specifications.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/session_rank)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.