# The Token Census

> How many tokens are out there?

Canonical URL: <https://datadriven.io/problems/the_token_census>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The platform team needs a headcount of how many token owners issued at least one API token during 2026. Return a single count of unique owners.

## Worked solution and explanation

### Why this problem exists in real interviews

This looks like a warm-up, but the platform team is checking two reflexes. One: do you ask 'distinct owners or distinct tokens?' before writing, since `api_tokens` has both `token_id` and `owner_id` and the prompt mixes the words. Two: do you notice that wrapping `issued` in `strftime` kills any index or partition pruning on that column. Silence on either is a downgrade signal.

---

### Break down the requirements

#### Step 1: Force the grain question

'Headcount of owners who issued at least one token' means `COUNT(DISTINCT owner_id)`, not `COUNT(*)` and not `COUNT(DISTINCT token_id)`. Say the cohort out loud: distinct `owner_id` from `api_tokens` filtered to the year.

#### Step 2: Filter on issuance, not expiry

The prompt says 'issued ... during 2026', so the predicate is on `issued`, not `expires` or `last_used`. Owners whose tokens were merely active in the year don't count.

#### Step 3: Pick the year extractor

On SQLite, `strftime('%Y', issued) = '2026'` is the literal form. Call out that you'd rewrite to `issued >= '2026-01-01' AND issued < '{{NEXT_YEAR}}-01-01'` in production for a sargable predicate.

#### Step 4: Return one scalar

One `SELECT`, one `COUNT(DISTINCT owner_id)`, aliased `distinct_owners`. No `GROUP BY`, no `HAVING`. The output is one row, one column.

---

### The solution

**DISTINCT OWNERS ISSUED IN YEAR**

```sql
SELECT COUNT(DISTINCT owner_id) AS distinct_owners
FROM api_tokens
WHERE strftime('%Y', issued) = '2026'
```

> **Cost Analysis**
>
> 150,000 rows is a full scan either way, but `strftime('%Y', issued)` is non-sargable: it disables any btree on `issued` and any time-based partition pruning. In production rewrite as `issued >= '2026-01-01' AND issued < date('2026-01-01', '+1 year')` so the range scan kicks in.

> **Interviewers Watch For**
>
> Before writing, ask: 'Distinct owners or distinct tokens?' The schema has both. Also ask whether revoked tokens (`status`) still count, because the prompt only mentions 'issued ... at least one'. The interviewer wants the grain pinned before code.

> **Common Pitfall**
>
> Writing `COUNT(*)` or `COUNT(token_id)`: that returns tokens, not owners, and inflates the number by every owner with multiple tokens. The other classic miss is filtering on `last_used` because it 'feels like activity'; the prompt asks about issuance.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- Rewrite the predicate to be sargable and explain why it matters at 100M rows. _(Tests whether you can convert function-on-column into a half-open range and articulate the index implication.)_
- How would you return monthly distinct owners for the same year in one query? _(Probes `GROUP BY strftime('%Y-%m', issued)` and whether you understand that `COUNT(DISTINCT)` per group does not equal sum of per-month distincts.)_
- An owner can have tokens under multiple `scope` values. How does that affect the count? _(Checks whether you recognize that distinct on `owner_id` already collapses scopes, but a per-scope breakdown would not.)_
- How would you exclude owners whose only token in the year was revoked? _(Forces you to add a `status` predicate and decide whether to filter pre-aggregation or via `HAVING`.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_token_census)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.