# Longest Visit Streaks

> Day after day after day. Who kept coming back?

Canonical URL: <https://datadriven.io/problems/longest_visit_streaks>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

Retention wants the most loyal visitors of all time, defined as the users with the longest streaks of consecutive calendar days with at least one session, ending on or before 2026-08-10. Surface the top three streak lengths and every user who achieved them. If multiple users tie for one of those top streak lengths, include all of them.

## Worked solution and explanation

### Why this problem exists in real interviews

Streak detection is the textbook gaps-and-islands problem. The interviewer is checking that you can collapse multiple sessions per day to one day per user, recognize that consecutive days share a constant `(date - rownumber)`, and remember to use `DENSE_RANK` so ties at the third-longest streak all appear in the result.

> **Trick to Solving**
>
> The classic island trick: for each user, sort their distinct visit days and number them. The expression `date - rownumber` is **constant** within a streak and changes when there's a gap. Group by that constant to get the streak length.
> 
> 1. Distinct (user, day) tuples up to the cutoff
> 2. ROW_NUMBER per user ordered by day
> 3. `date(day, '-' || rn || ' days')` collapses each streak into one group
> 4. COUNT per (user, group) is the streak length
> 5. Take each user's MAX streak, then DENSE_RANK to keep the top three lengths (ties included)

---

### Break down the requirements

#### Step 1: Distinct visit days up to the cutoff

Build a `daily` CTE: `SELECT DISTINCT user_id, date(session_start) AS session_day FROM user_sessions WHERE date(session_start) <= '2026-08-10'`. Multiple sessions on the same day collapse to one row.

#### Step 2: Number each user's days in order

In `numbered`, attach `ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY session_day) AS rn`. Each user's days now carry their position in the user's chronological day list.

#### Step 3: Collapse streaks via the rownumber trick

In `streaks`, compute `date(session_day, '-' || rn || ' days') AS grp` and `COUNT(*) AS streak_len` grouped by `(user_id, grp)`. Consecutive days share the same `grp` value because subtracting an increasing rownumber from an increasing date returns a constant. A one-day gap shifts the constant and starts a new streak.

#### Step 4: Per-user max streak, ranked top three with ties

In `max_streaks`, take `MAX(streak_len)` per user. In `ranked`, attach `DENSE_RANK() OVER (ORDER BY streak_length DESC)`. Filter to `rnk <= 3` so ties at the third-longest length all appear, then sort the final output by `streak_length DESC`.

---

### The solution

**Gaps-and-islands streak detection with DENSE_RANK top three**

```sql
WITH daily AS (
    SELECT DISTINCT user_id, date(session_start) AS session_day
    FROM user_sessions
    WHERE date(session_start) <= '2026-08-10'
),
numbered AS (
    SELECT user_id, session_day,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY session_day) AS rn
    FROM daily
),
streaks AS (
    SELECT user_id,
        date(session_day, '-' || rn || ' days') AS grp,
        COUNT(*) AS streak_len
    FROM numbered
    GROUP BY user_id, grp
),
max_streaks AS (
    SELECT user_id, MAX(streak_len) AS streak_length
    FROM streaks
    GROUP BY user_id
),
ranked AS (
    SELECT user_id, streak_length,
        DENSE_RANK() OVER (ORDER BY streak_length DESC) AS rnk
    FROM max_streaks
)
SELECT user_id, streak_length
FROM ranked
WHERE rnk <= 3
ORDER BY streak_length DESC
```

> **Time and Space Complexity**
>
> **Time:** O(n log n) where n is the distinct (user, day) count after the dedup. Each window function is one ordered pass within its partition; the dominant cost is the initial 200M-row scan and dedup.
> 
> **Space:** O(distinct days) for the daily CTE, then progressively smaller as each CTE narrows.

> **Interviewers Watch For**
>
> Strong candidates name the rownumber trick out loud ("date minus rownumber is constant within a streak") and reach for `DENSE_RANK <= 3` instead of `LIMIT 3` so the prompt's tie-inclusion requirement is honored.

> **Common Pitfall**
>
> Using `LIMIT 3` after sorting. That returns exactly three rows even if a fourth user matches the third-place streak length. The prompt explicitly asks for ties to be included, which is what `DENSE_RANK` over the per-user max gives you.

---

## Common follow-up questions

- If users visit at midnight UTC, a session that starts late at night in their local timezone might fall on the next UTC date. How would you protect against that? _(Tests handling multi-session days and time zones.)_
- How would you change the query if a one-day gap is still considered an unbroken streak (forgive a single missed day)? _(Tests adapting the same shape to a different consecutive-day definition.)_
- How would you generalize `rnk <= 3` to take an arbitrary N from a parameter without rewriting the CTE chain? _(Tests extending the rank step to N parameterized.)_
- `user_sessions` is partitioned by `session_start`. Does the cutoff filter benefit from partition pruning, and how can you tell from the plan? _(Tests partition pruning for the cutoff filter.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/longest_visit_streaks)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.