# The Day-7 Retention Cohort

> Day one was promising. Day seven tells the truth.

Canonical URL: <https://datadriven.io/problems/the_day_7_retention_cohort>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

For each weekly signup cohort, compute what percentage of users came back and performed an action on day 7 or later. Show the cohort week, total signups, retained users, and retention rate.

## Worked solution and explanation

### Why this problem exists in real interviews

Growth teams report cohort retention by signup week as their primary activation metric. Interviewers use this prompt to see whether you can compute a cohort key, scope to mature cohorts (at least 7 days old), express 'returned on day 7 or later' with julianday math, and divide two distinct counts safely.

---

### Break down the requirements

#### Step 1: Bucket signups by ISO-style week with strftime

strftime('%Y-W%W', signup_date) produces labels like '2026-W12' that sort chronologically. This is the cohort key. Use strftime, not DATE_TRUNC, because the harness is SQLite.

#### Step 2: Scope to mature cohorts with julianday('now')

julianday('now') - julianday(signup_date) >= 7 keeps only users whose 7-day window has fully elapsed. Without this filter, recent cohorts look artificially low because their day-7 has not happened yet.

#### Step 3: Count retained as users with a view at gap >= 7

LEFT JOIN page_views and use COUNT(DISTINCT CASE WHEN julianday(a.viewed_at) - julianday(u.signup_date) >= 7 THEN a.user_id END). The DISTINCT and the conditional inside the CASE both matter: a user with multiple late views still counts once, and users with only early views do not count at all.

---

### The solution

**Cohort key, mature filter, conditional distinct count, percent**

```sql
SELECT strftime('%Y-W%W', u.signup_date) AS cohort_week,
       COUNT(DISTINCT u.user_id) AS total_signups,
       COUNT(DISTINCT CASE WHEN julianday(a.viewed_at) - julianday(u.signup_date) >= 7 THEN a.user_id END) AS retained_users,
       ROUND(CAST(COUNT(DISTINCT CASE WHEN julianday(a.viewed_at) - julianday(u.signup_date) >= 7 THEN a.user_id END) AS DOUBLE) / COUNT(DISTINCT u.user_id) * 100, 1) AS retention_pct
FROM users u
LEFT JOIN page_views a ON u.user_id = a.user_id
WHERE julianday('now') - julianday(u.signup_date) >= 7
GROUP BY cohort_week
ORDER BY cohort_week
```

> **Cost Analysis**
>
> users is 10M rows, page_views is 500M. The LEFT JOIN on user_id will fan out heavily; an index on page_views(user_id, viewed_at) makes the join streaming. The COUNT(DISTINCT) is the expensive part because SQLite sorts to dedupe; if you can guarantee one row per (user, day), you can drop DISTINCT.

> **Interviewers Watch For**
>
> Did you use strftime('%Y-W%W', ...) for the cohort key, julianday differences (not DATEDIFF or EXTRACT(EPOCH)), CAST to DOUBLE before the percent division, and LEFT JOIN (so cohorts with zero retained users still appear)? Candidates who INNER JOIN drop empty cohorts silently.

> **Common Pitfall**
>
> Two pitfalls collide here: forgetting to CAST one side of the division to DOUBLE gives integer division and you get 0 for almost every cohort; and using >= 7 versus = 7 changes the question (the prompt says 'at least one view at day 7 or later', so >= 7 is correct).

---

## Common follow-up questions

- How would you compute Day-N for arbitrary N? _(Parameterize the 7 in both the WHERE and the CASE. The structure is identical; you just shift the maturity boundary.)_
- Why %Y-W%W instead of %Y-%W? _(%Y-W%W produces the human-friendly '2026-W12' label that sorts correctly. %Y-%W gives '2026-12' which collides with month-of-year keys downstream.)_
- How do you handle users with zero page_views? _(The LEFT JOIN keeps them; the CASE returns NULL for the conditional and they do not contribute to retained_users. Their cohort still gets credit in total_signups, which is the correct denominator.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_day_7_retention_cohort)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.