# Peak Concurrent Tokens

> How many tokens were alive at the same time?

Canonical URL: <https://datadriven.io/problems/peak_concurrent_tokens>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

For every API token that has ever been issued, find the greatest number of concurrently active tokens during its lifetime and the first date that peak was reached. A token is active from its issued date until the day before its expiration; tokens with no expiration are considered still active. Return the token ID, peak concurrent count, and peak date.

## Worked solution and explanation

### Why this problem exists in real interviews

Identity and platform teams care about how many credentials are simultaneously valid because that drives revocation policy and audit log volume. Interviewers use this prompt to check that the candidate handles NULL expires (token never expires), uses DATE() to bucket timestamps so 'active on this day' has clear semantics, and chooses the correct strict vs non strict inequality.

---

### Break down the requirements

#### Step 1: Define 'active on date D' precisely

Token B is active on date D when DATE(B.issued) <= D AND (B.expires IS NULL OR DATE(B.expires) > D). The strict greater than on expires matches the prompt: a token expiring on D is no longer active that day. The OR IS NULL clause covers tokens with no expiry, which would otherwise be excluded by the comparison.

#### Step 2: Self join api_tokens to itself

For each row a in api_tokens, count rows b that are active on a.issued. Group by (a.token_id, a.issued) so the count attaches to the right anchor row. COUNT(*) is correct here because each row in b represents one distinct token (token_id is unique).

#### Step 3: Reduce to one row per token

Wrap in a final GROUP BY token_id with MAX(concurrent_active). The prompt asks for 'peak' but every token has only one issued date, so MAX collapses identical values. Returning issued AS peak_date relies on every group having one issued; if a token had multiple issued dates, this would need MAX(issued) or another tie break.

---

### The solution

**Self join on date overlap, aggregate twice**

```sql
WITH daily_counts AS (
  SELECT a.token_id, a.issued, COUNT(*) AS concurrent_active
  FROM api_tokens a
  JOIN api_tokens b
    ON DATE(b.issued) <= DATE(a.issued)
   AND (b.expires IS NULL OR DATE(b.expires) > DATE(a.issued))
  GROUP BY a.token_id, a.issued
)
SELECT token_id, MAX(concurrent_active) AS peak_concurrent, issued AS peak_date
FROM daily_counts
GROUP BY token_id
ORDER BY token_id
```

> **Cost Analysis**
>
> The self join is the cost driver. With 600K tokens, a naive nested loop is 360B comparisons and would not finish; the join predicate on DATE(b.issued) <= DATE(a.issued) is a range condition the planner can satisfy with a scan plus index lookup if api_tokens has a covering index on issued. A sweepline approach (events at issued become +1, expires becomes negative one, running SUM) is the production answer.

> **Interviewers Watch For**
>
> Whether you remembered B.expires IS NULL OR DATE(B.expires) > D (NULL handling is the most common bug), whether the inequality on expires is strict (the prompt says > D), and whether you wrapped both sides in DATE() so the timestamp comparison reduces to a date comparison.

> **Common Pitfall**
>
> Writing DATE(b.expires) >= DATE(a.issued) instead of strict greater than overcounts tokens by one on their expiry day. Forgetting the IS NULL OR clause silently drops every long lived token from the count, which can halve the peak in a real fleet.

---

## Common follow-up questions

- Rewrite this with the sweepline pattern. What does the SQL look like? _(Tests scalability awareness. UNION ALL of (issued, +1) and (expires, negative one) events with a running SUM window is the standard production technique.)_
- What if a token can be revoked before its expires timestamp? _(Tests data model awareness. The candidate should ask about a status column or revoked_at and adjust the active predicate to use the earliest of expires and revoked_at.)_
- Why DATE() on both sides of the comparison rather than direct timestamp comparison? _(Tests semantic precision. The prompt asks 'active on that token's issued date', not 'active at that exact second', so bucketing to date is required.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/peak_concurrent_tokens)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.