# Revoked Tokens by Scope

> Banned tokens, sorted by what they had access to.

Canonical URL: <https://datadriven.io/problems/revoked_tokens_by_scope>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The trust and safety pipeline tracks revoked API tokens by scope. For December 2026, count revoked tokens per scope. Include tokens revoked before December whose expiration extends into at least part of the month. Tokens with no expiration are considered permanently revoked. Match status case-insensitively. Return the scope and blocked count.

## Worked solution and explanation

### Why this problem exists in real interviews

The core skill being tested is grouped COUNT aggregation, applied to the `api_tokens` table in a data pipeline context. Getting the `scope` column right is where most candidates slip.

---

### Break down the requirements

#### Step 1: Apply the WHERE filter

Filter rows before any aggregation. This ensures only qualifying data enters the computation, keeping the result correct and the scan minimal.

#### Step 2: Aggregate by `scope`

`GROUP BY scope` collapses rows to one per group. The aggregate functions (`SUM`, `COUNT`, `AVG`, etc.) compute the metric for each group.

#### Step 3: Sort the final output

The `ORDER BY` clause ensures the result appears in the expected sequence. Interviewers check that the sort direction matches the prompt.

---

### The solution

**Apply the where filter to find revoked tokens by scope**

```sql
SELECT scope, COUNT(*) AS blocked_count
FROM api_tokens
WHERE LOWER(status) = 'revoked' AND issued <= '2026-12-31' AND (expires IS NULL OR expires >= '2026-12-01')
GROUP BY scope
ORDER BY scope
```

> **Cost Analysis**
>
> With ~400K rows, the GROUP BY reduces the working set before any downstream operations. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for how you handle NULL values and whether you account for them in filters and aggregations.

> **Common Pitfall**
>
> Forgetting to filter NULLs creates phantom groups or inflated counts. Always check `null_fraction` in the schema before assuming columns are clean.

---

## Common follow-up questions

- What result would you get if every value in `api_tokens.expires` were NULL? Would your query return an empty set or something unexpected? _(Tests extreme NULL scenarios and whether the candidate guards against edge cases in `expires`.)_
- `api_tokens.last_used` has roughly 280,000 distinct values. What index strategy would you use to avoid a full scan on `api_tokens`? _(Tests indexing knowledge specific to the high-cardinality `last_used` column in `api_tokens`.)_
- `api_tokens.scope` only has 8 distinct values. If a new category were added, would your query automatically include it? _(Tests whether the query hard-codes values or dynamically adapts to `scope` changes.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/revoked_tokens_by_scope)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.