# Oldest and Newest User Sessions

> The extremes of the user base.

Canonical URL: <https://datadriven.io/problems/oldest_and_newest_user_sessions>

Domain: SQL · Difficulty: easy · Seniority: L4

## Problem

Pull all session details for the user with the earliest signup date and the user with the latest signup date. Only consider users who have at least one session.

## Worked solution and explanation

### Why this problem exists in real interviews

By linking user_sessions.session_start against users, this problem requires top-N selection and CTE composition on a multi-table result set. Interviewers deploy it as a fundamentals check to see if you handle the intermediate cardinality correctly.

---

### Break down the requirements

#### Step 1: Find the user with the earliest signup

`SELECT user_id FROM users ORDER BY signup_date ASC LIMIT 1` identifies the oldest user.

#### Step 2: Find the user with the latest signup

`SELECT user_id FROM users ORDER BY signup_date DESC LIMIT 1` identifies the newest user.

#### Step 3: Retrieve sessions for both users

Use `WHERE user_id IN (oldest_id, newest_id)` or `UNION ALL` two filtered queries to get all session records for both users. Only include users who have sessions.

---

### The solution

**UNION for extremum users, then join sessions**

```sql
WITH extremes AS (
    SELECT user_id FROM users
    WHERE user_id IN (SELECT user_id FROM user_sessions)
    ORDER BY signup_date ASC LIMIT 1
    UNION
    SELECT user_id FROM users
    WHERE user_id IN (SELECT user_id FROM user_sessions)
    ORDER BY signup_date DESC LIMIT 1
)
SELECT us.*
FROM user_sessions us
JOIN extremes e ON us.user_id = e.user_id
```

> **Cost Analysis**
>
> With `user_sessions` (60,000,000 rows), `users` (15,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If user_sessions.session_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in session_id.)_
- If the join between user_sessions and users produces a fan-out, how does that affect your aggregate? _(Tests awareness of join cardinality and its impact on SUM, COUNT, and AVG results.)_
- With millions of distinct values in user_sessions.session_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like session_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/oldest_and_newest_user_sessions)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.