# Active Token Owners in 2026

> Active token owners this year.

Canonical URL: <https://datadriven.io/problems/mDcX4kEhc4w_OVdm0Ho_9A>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The platform team is auditing the API access granted in 2026 and needs the size of that year's token-holder base. Find how many different owners were issued a token during the year.

## Worked solution and explanation

### What this really is

Strip off the audit framing and this is a count of unique keys inside a one-year slice. The whole problem hinges on one word: owners, not tokens. One person can hold several tokens (the sample seed happens to be a clean one-per-owner set, but production never is), so the moment you reach for COUNT(*) or COUNT(owner_id) you are counting rows. A single owner holding three tokens issued this year would inflate your headcount by two, and nobody reviewing the query would catch it until the number looked too big.

> **Trick to solving**
>
> COUNT(DISTINCT owner_id) collapses the duplicates and counts in one pass. You do not need a subquery that pulls the unique owners first and counts them second. The engine folds the dedup and the tally into a single aggregate, so the readable form is also the fast form.

#### Step 1: Pin the window to the issue date

The year filter rides on issued, not on expires or last_used. A token issued in a prior year that is still valid this year does not belong to this year's grant cohort, and a token used this year but issued earlier does not either. strftime('%Y', issued) extracts the four-digit year as text, so compare it against the year as a string literal, not a number.

#### Step 2: Decide the grain before you count

The ask is the size of the token-holder base, which is a count of people. That is the owner_id grain. Tokens are the rows; owners are what you report. Naming this out loud is the difference between a correct answer and a plausible-looking wrong one.

#### Step 3: Count the uniques

Apply COUNT(DISTINCT owner_id) over the filtered rows. The result is a single integer, so alias it clearly and return one row.

**Unique owners issued a token this year**

```sql
SELECT COUNT(DISTINCT owner_id) AS distinct_owners
FROM api_tokens
WHERE strftime('%Y', issued) = '2026'
```

*One scan: filter to the year on issued, then dedup and count owners in a single aggregate.*

> **Common pitfall**
>
> The status column is bait. The values are a deliberate case-soup of Active, active, REVOKED, revoked, and expired, and the title says 'Active Token Owners'. The instinct is to add WHERE status = 'active'. Do not. 'Active token owners' here means owners who were granted an active token during the year, not the literal status flag. Adding a status filter drops owners and changes the answer. The NULL values in expires and last_used are the same kind of red herring: you never touch those columns, so they cannot trip you.

**Counts rows (wrong)**

COUNT(owner_id) returns the number of tokens issued this year. With repeat holders it overshoots the headcount, and the bug is invisible because the number still looks reasonable.

**Counts owners (correct)**

COUNT(DISTINCT owner_id) returns the number of distinct people, which is the token-holder base the audit asked for.

> **Interviewers watch for**
>
> Two tells separate seniority on this easy-looking question. First, do you ask 'can one owner hold more than one token?' before you write anything. Second, do you filter on the issue date rather than expiry or last use. Reaching straight for COUNT(*) signals you read 'count' and stopped reading the noun after it.

> **Performance insight**
>
> This is a single full scan of api_tokens with a cheap distinct-count aggregate, no join and no subquery. At millions of rows it stays a streaming aggregate. The one real cost is the strftime call per row; on a hot path you would store issued as a real date and range-filter (issued >= '2026-01-01' AND issued < next year) so an index on issued can be used instead of computing the year for every row.

## Common follow-up questions

- Break the same year's grants down by scope, returning the unique owner count per scope. _(Tests moving from a single scalar to a grouped distinct count.)_
- How would you find owners who were issued more than one token during the year? _(Forces a grouped HAVING on the same grain and shows whether they understood the dedup.)_
- If issued were a real timestamp column with an index, how would you rewrite the year filter to keep it index-friendly? _(Probes the sargability point from the performance note.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/mDcX4kEhc4w_OVdm0Ho_9A)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.