# Active API Tokens

> Tokens that have actually been used.

Canonical URL: <https://datadriven.io/problems/active_api_tokens>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The developer platform team needs to audit token usage. Pull all fields for every API token that has actually been used, meaning it has a recorded last-used timestamp.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use this pattern to test whether candidates know when NOT to over-engineer. The correct answer is a two-clause query. Adding GROUP BY or aggregations here is a trap.

---

### Break down the requirements

#### Step 1: Select all columns

The prompt says "every column" - use `SELECT *` rather than listing individual columns.

#### Step 2: Filter to rows with a last_used date

`last_used IS NOT NULL` filters out tokens that have never been used. This is the only condition.

---

### The solution

**Filter active API tokens**

```sql
SELECT *
FROM api_tokens
WHERE last_used IS NOT NULL
```

> **Cost Analysis**
>
> With 100K rows and 25% NULL in `last_used`, this query returns ~75K rows. A full scan is unavoidable without an index. If queries like this are frequent, an index on `last_used` lets the engine skip NULL rows entirely.

> **Interviewers Watch For**
>
> Strong candidates write the simple two-clause answer without hesitation, then mention the NULL fraction or index opportunity unprompted. Overcomplicating with GROUP BY is a red flag.

> **Common Pitfall**
>
> Writing `last_used != NULL` or `last_used <> NULL` always returns zero rows - NULL comparisons with `=` or `!=` are undefined in SQL. Always use `IS NOT NULL`.

---

## Common follow-up questions

- If this query runs millions of times a day, how would you index `api_tokens` to make it fast, and what are the write overhead tradeoffs? _(Tests understanding of partial indexes and when they outperform full-column indexes.)_
- What happens if you write `WHERE last_used != NULL` instead of `WHERE last_used IS NOT NULL`? Why does it return zero rows? _(Tests whether the candidate understands that NULL comparisons with = or != always return NULL (treated as false), not true/false.)_
- About 25% of rows have NULL in `last_used`. How many rows do you expect this query to return from a 100K-row table, and how would you verify that? _(Tests awareness of the output row count relative to table size and how NULL fraction affects it.)_
- The requirements change: now return only tokens where `last_used` is within the last 30 days and `status` is 'active'. How does your query change? _(Tests whether the candidate can extend the query sensibly under a changed requirement.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/active_api_tokens)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.