# Most Efficient Region by Token Usage

> Some regions squeeze more out of every token.

Canonical URL: <https://datadriven.io/problems/most_efficient_region_by_token_usage>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

Our API tokens have issued and expiration dates plus a request count. For each region, compute the average token lifetime in days (expires minus issued) and the average number of requests. Then calculate the ratio of average requests to average lifetime, ranked highest first. Return the region, average lifetime in days, average requests, and the requests-per-day ratio.

## Worked solution and explanation

### Why this problem exists in real interviews

Multiple aggregates in one pass over `api_tokens`, plus date arithmetic on `issued` and `expires`. The interviewer is checking whether you reach for a single GROUP BY with `JULIANDAY` math instead of stitching two subqueries together, and whether you flag `expires IS NULL` before the query touches a keyboard.

---

### Break down the requirements

#### Step 1: Cohort: pick a region column

`scope` aliases to `region`. GROUP BY `scope`. Low cardinality so hash aggregation handles it cheaply.

#### Step 2: Lifetime in days

`AVG(JULIANDAY(expires) - JULIANDAY(issued))`. SQLite returns a float, but wrap in `CAST(... AS REAL)` so other engines do not silently floor it.

#### Step 3: Average requests

`AVG(requests)` in the same SELECT. Same pass over the table, no second scan, no join back to a derived aggregate.

#### Step 4: Ratio and order

`AVG(requests) / AVG(lifetime)` as `requests_per_day_ratio`. ORDER BY that ratio DESC. Never-expires rows go in WHERE: `expires IS NOT NULL`.

---

### The solution

**REGION TOKEN EFFICIENCY**

```sql
SELECT
  scope AS region,
  CAST(AVG(JULIANDAY(expires) - JULIANDAY(issued)) AS REAL) AS avg_lifetime_days,
  AVG(requests) AS avg_requests,
  CAST(AVG(requests) AS REAL)
    / CAST(AVG(JULIANDAY(expires) - JULIANDAY(issued)) AS REAL)
    AS requests_per_day_ratio
FROM api_tokens
WHERE expires IS NOT NULL
GROUP BY scope
ORDER BY requests_per_day_ratio DESC;
```

> **Cost Analysis**
>
> 500k rows, one sequential scan, hash aggregate keyed by `scope` (a handful of distinct values). No index needed; GROUP BY on low cardinality stays in memory. Adding a covering index on `(scope, issued, expires, requests)` only helps if this query runs hot, otherwise the scan wins.

> **Interviewers Watch For**
>
> Before you write anything, ask: how do we treat never-expires tokens? Three valid answers (exclude via `WHERE expires IS NOT NULL`, treat as infinite via `COALESCE`, report separately). Picking one and naming it out loud is the signal. Silently letting `AVG` drop them is the trap.

> **Common Pitfall**
>
> Writing `AVG(requests) / AVG(JULIANDAY(expires) - JULIANDAY(issued))` without `CAST(... AS REAL)`. On engines that integer-divide, the ratio collapses to 0 for most regions. Also: computing the ratio as `AVG(requests / lifetime)`, which is a different statistic.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you include never-expires tokens as infinite-lifetime? _(Probes `COALESCE(expires, DATE('now', '+10 years'))` and whether you flag that the answer becomes business-defined.)_
- Add a filter so a region needs at least 1000 tokens to qualify. _(Tests HAVING vs WHERE placement on an aggregate condition.)_
- Rank regions per status (active vs revoked) in one query. _(Tests conditional aggregation with `CASE WHEN` inside `AVG`, still one pass.)_
- What changes if `requests` can be NULL? _(Probes `AVG` ignoring NULL vs `COALESCE(requests, 0)` and the different denominators that result.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/most_efficient_region_by_token_usage)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.