# First-Day Session Retention

> Day one retention. The first test.

Canonical URL: <https://datadriven.io/problems/first_day_session_retention>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

A user is 'retained' if they have a session exactly 1 calendar day after their very first session. Calculate the proportion of retained users out of all users. Return a single retention rate.

## Worked solution and explanation

### Why this problem exists in real interviews

Day-1 retention is a critical product metric. This tests self-join or window function skills to identify each user's first session, then check for a session exactly one day later.

> **Trick to Solving**
>
> First find each user's first session date, then check if a session exists on date + 1. Use a CTE for the first session, then LEFT JOIN for the day-1 check.
> 
> 1. Find each user's first session date with MIN()
> 2. LEFT JOIN back to sessions for the next-day session
> 3. Compute the retention rate as AVG of a binary flag

---

### Break down the requirements

#### Step 1: Find each user's first session date

`GROUP BY user_id` with `MIN(DATE(session_start))` gives the first session date per user.

#### Step 2: Check for day-1 session

LEFT JOIN `user_sessions` where the session date equals `first_date + 1 day`.

#### Step 3: Compute retention rate

`AVG(CASE WHEN day1_session IS NOT NULL THEN 1.0 ELSE 0.0 END) * 100` gives the percentage.

---

### The solution

**CTE for first session then day-1 retention check**

```sql
WITH first_session AS (
    SELECT user_id, MIN(DATE(session_start)) AS first_date
    FROM user_sessions
    GROUP BY user_id
)
SELECT ROUND(
    AVG(CASE WHEN s2.user_id IS NOT NULL THEN 1.0 ELSE 0.0 END) * 100, 2
) AS day1_retention_pct
FROM first_session fs
LEFT JOIN user_sessions s2
    ON fs.user_id = s2.user_id
    AND DATE(s2.session_start) = DATE(fs.first_date, '+1 day')
```

> **Cost Analysis**
>
> The CTE scans for per-user minimums. The LEFT JOIN checks for a day-1 session. Indexes on `(user_id, session_start)` are essential.

> **Interviewers Watch For**
>
> LEFT JOIN is mandatory. INNER JOIN would exclude non-retained users, making the rate 100% by construction.

> **Common Pitfall**
>
> Using INNER JOIN instead of LEFT JOIN excludes all non-retained users, making the retention calculation meaningless.

---

## Common follow-up questions

- How would you compute day-7 and day-30 retention alongside day-1? _(Tests extending with additional LEFT JOINs or conditional aggregation.)_
- How would you break retention by signup cohort? _(Tests adding GROUP BY on the first session date.)_
- What if a user has multiple sessions on day 1? _(Tests ensuring users are counted once.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/first_day_session_retention)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.