# Extreme API Token Usage

> Outlier tokens. Suspiciously busy.

Canonical URL: <https://datadriven.io/problems/extreme_api_token_usage>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Surface the API tokens with the highest and lowest total request counts, excluding any tokens that have never been used. Show each token and its request count, from most to fewest requests.

## Worked solution and explanation

### Why this problem exists in real interviews

Platform teams run extreme value reports to find both abusive tokens and dormant tokens that still hold scopes. Interviewers use this prompt to see whether you can return both extremes in one result set, handle ties at either end, and exclude never-used tokens (last_used IS NULL) without accidentally dropping legitimate low-traffic tokens.

---

### Break down the requirements

#### Step 1: Define the candidate population

Only tokens with last_used IS NOT NULL count. Apply that filter inside the MIN and MAX subqueries as well as the outer query, otherwise the MIN comes back as the smallest requests value among never-used tokens.

#### Step 2: Match every row at the extremes, not just one

Comparing requests to (SELECT MAX(requests) ...) and (SELECT MIN(requests) ...) returns all tied rows automatically. ORDER BY requests DESC + LIMIT 1 would silently drop ties, which is the most common wrong answer.

#### Step 3: Project token_id and requests, sorted high to low

The output is a flat two-column list ordered by requests DESC, so highest-traffic tokens come first and the lowest extreme appears at the bottom.

---

### The solution

**Two scalar subqueries plus an OR for both extremes**

```sql
SELECT token_id, requests FROM api_tokens WHERE last_used IS NOT NULL AND (requests = (SELECT MAX(requests) FROM api_tokens WHERE last_used IS NOT NULL) OR requests = (SELECT MIN(requests) FROM api_tokens WHERE last_used IS NOT NULL)) ORDER BY requests DESC
```

> **Cost Analysis**
>
> api_tokens has about 250,000 rows. Each MIN and MAX scalar subquery is one full scan, but SQLite can satisfy them with an index on requests if one exists. Without an index it is three scans total (MIN, MAX, outer). At this size the cost is negligible. Keep the last_used IS NOT NULL predicate in all three places so the planner does not pick a wrong extreme.

> **Interviewers Watch For**
>
> They want to see that you preserve ties at both ends and that the NULL filter is applied to the subqueries, not just the outer query. A strong candidate also notes that this can be expressed with a single window function pass (MIN OVER (), MAX OVER ()) and discusses the trade-off.

> **Common Pitfall**
>
> Forgetting the last_used IS NOT NULL filter inside the (SELECT MIN(requests) FROM api_tokens) subquery is the classic bug. If never-used tokens have requests = 0 stored, the MIN extreme becomes 0 and the result is dominated by tokens that should have been excluded.

---

## Common follow-up questions

- How would you rewrite this with window functions instead of three subqueries? _(Tests fluency with MIN(requests) OVER () and MAX(requests) OVER () as a single-scan alternative, and the candidate should compare the cost on a 250k-row table.)_
- How would the query change if you also needed to return the owner_id for each extreme token? _(Trivial column addition, but the candidate should note the OR semantics still hold and there is no risk of duplicating rows.)_
- What if the spec changed to include the top 5 and bottom 5 instead of just the extremes? _(Forces a switch to ORDER BY + LIMIT or to two ranked CTEs unioned together, and the candidate should mention DENSE_RANK to handle ties at the boundary.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/extreme_api_token_usage)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.