# 7-Day Token Retention

> Premium tokens, day by day.

Canonical URL: <https://datadriven.io/problems/7_day_token_retention>

Domain: SQL · Difficulty: medium · Seniority: L5

## Problem

The developer platform team is worried about token churn and wants a quick retention curve. For each issuance date, they need two numbers side by side: how many tokens were active and generating traffic that day, and how many of those same owners still had an active, traffic-generating token 7 days later. Tokens with no expiration date should be treated as still valid. Limit the output to the first 7 dates chronologically.

## Worked solution and explanation

### Why this problem exists in real interviews

Developer platform teams hand you this to see if you treat `expires IS NULL` as a real branch instead of dropping those rows. They also watch whether you reach for LEFT JOIN to keep day-0 cohorts whose tokens churned, versus an INNER JOIN that silently collapses the denominator. The probe is whether you'll say the cohort definition out loud before writing.

---

### Break down the requirements

#### Step 1: Pin the day-0 cohort

Self-join `api_tokens` aliased `a` as the anchor. Filter `a.status = 'active'` and `a.requests > 0` in the outer `WHERE` so each row represents a token actively generating traffic on its issue date.

#### Step 2: Project the same owner forward 7 days

LEFT JOIN `api_tokens b` on `a.owner_id = b.owner_id`. The owner is the retention unit, not the token, so a fresh `token_id` for the same owner still counts as retained.

#### Step 3: Encode the day-7 window

Add `DATE(b.issued) <= DATE(a.issued, '+7 days')` and `(b.expires IS NULL OR DATE(b.expires) >= DATE(a.issued, '+7 days'))`. The null branch is the trap: treating `NULL` as expired drops permanent tokens.

#### Step 4: Require traffic on the right side too

Push `b.status = 'active' AND b.requests > 0` into the ON clause, not WHERE. In WHERE it converts the LEFT JOIN to INNER and zeros-out cohorts that fully churned.

#### Step 5: Aggregate, order, cap

`COUNT(DISTINCT a.token_id)` for day-0, `COUNT(DISTINCT b.token_id)` for day-7, `GROUP BY a.issued`, `ORDER BY a.issued`, `LIMIT 7`. DISTINCT is mandatory because owners with multiple tokens fan out.

---

### The solution

**TOKEN RETENTION SELF JOIN**

```sql
SELECT
  a.issued AS the_date,
  COUNT(DISTINCT a.token_id) AS active_day0,
  COUNT(DISTINCT b.token_id) AS active_day7
FROM api_tokens a
LEFT JOIN api_tokens b
  ON a.owner_id = b.owner_id
  AND b.status = 'active'
  AND b.requests > 0
  AND DATE(b.issued) <= DATE(a.issued, '+7 days')
  AND (b.expires IS NULL OR DATE(b.expires) >= DATE(a.issued, '+7 days'))
WHERE a.status = 'active' AND a.requests > 0
GROUP BY a.issued
ORDER BY a.issued
LIMIT 7
```

> **Cost Analysis**
>
> Self-join on 200k rows fans out by owner. Without an index on `(owner_id, issued)` the planner hashes the full table twice. With one, the day-7 side becomes a bounded range probe. The `LIMIT 7` does not push down past the GROUP BY, so the full aggregation runs first.

> **Interviewers Watch For**
>
> Ask out loud: is retention measured per token or per owner? The brief says same owners still had an active token, so `owner_id` is the join key. Also confirm whether `expires IS NULL` means perpetual or unknown. The prompt says perpetual, code it explicitly.

> **Common Pitfall**
>
> Putting `b.status = 'active'` in WHERE instead of ON. The LEFT JOIN preserves cohorts with zero day-7 survivors; a WHERE predicate on the right table filters those nulls back out and you report inflated retention because empty cohorts vanish from the denominator.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you extend this to a full 1, 7, 30 day retention curve in one query? _(Probes whether you reach for conditional aggregation with multiple windowed predicates or three separate self-joins.)_
- What changes if retention is defined per scope rather than per owner? _(Tests whether you understand the join key drives the cohort grain and that `scope` would join alongside `owner_id`.)_
- How would you handle tokens reissued the same day to the same owner? _(Probes deduplication strategy and whether `COUNT(DISTINCT owner_id)` is the right denominator instead of `token_id`.)_
- Rewrite this without a self-join using window functions. _(Tests whether you can express the 7-day look-ahead as a range frame over `issued` partitioned by `owner_id`.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/7_day_token_retention)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.