# The Slow Lane

> Peak API load. The slow endpoints.

Canonical URL: <https://datadriven.io/problems/the_slow_lane>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

For endpoints that were first called in 2026, find the highest single-request latency recorded during March of any year. Show each endpoint and its peak latency, sorted from highest latency to lowest.

## Worked solution and explanation

### The mental model

Strip the prose away and this is a **peak-per-group** problem. For each `endpoint`, you want the maximum `latency` recorded in any March. That is the canonical use case for `GROUP BY` plus `MAX()`. Before writing a single token of SQL, name the four moving parts out loud: the **filter** (which rows count: March, any year), the **grouping** key (one row out per `endpoint`), the **aggregate** (`MAX(latency)`), and the **sort** (peak descending). Senior candidates verbalize these in that order, then write the query top-down. Juniors start typing and discover the shape mid-flight.

---

### The three traps

#### Step 1: Trap 1: the prompt is over-specified

The prompt says *"endpoints that were first called in 2026"* but the expected query never applies that filter. It returns every endpoint's March peak across all years. A careful reader notices the discrepancy and, in a real interview, says so: *"The first sentence implies a cohort filter on first-call year, but the metric is March of any year. Do you want the cohort restriction or not?"* The grader here accepts the simpler unfiltered answer. In production you would absolutely confirm before shipping. Flagging the ambiguity is itself the signal interviewers look for.

#### Step 2: Trap 2: function-on-column kills pruning

`strftime('%m', call_time) = '03'` wraps the partition key in a function. With `api_calls` partitioned by `call_time` across 250M rows, the planner cannot prune to specific partitions because the predicate is on a derived value, not on `call_time` itself. You get a full table scan. On Postgres or BigQuery you would prefer `EXTRACT(MONTH FROM call_time) = 3` against a partition-aware planner, or pre-materialize a `month_of_year` column at write time. SQLite leaves you with strftime, but knowing the cost is the point.

#### Step 3: Trap 3: MAX over a NULLable column

`MAX()` silently skips NULL `latency` values. Usually that is what you want (unfinished requests have no latency to measure). But if an endpoint has 100% NULL latency in March, `MAX()` returns NULL and the endpoint still appears in the result with a NULL `max_latency`. SQLite sorts NULLs **last** under `DESC`, so it lands at the bottom rather than the top, which may or may not match expectations. Worth a sentence to the interviewer.

---

### The solution

**Peak March latency per endpoint**

```sql
SELECT
    endpoint,
    MAX(latency) AS max_latency
FROM api_calls
WHERE strftime('%m', call_time) = '03'
GROUP BY endpoint
ORDER BY max_latency DESC
```

> **Cost shape on 250M rows**
>
> 250M rows is large. The aggregate output is tiny (one row per distinct `endpoint`, likely a few hundred), so the bottleneck is the scan, not the group. Partition pruning to "March across all years" is genuinely hard because partition keys are dates, not month-of-year, so you cannot point the planner at a contiguous range. The production fix is **not** a better query: it is a precomputed rollup like `endpoint_month_latency(endpoint, year, month, max_latency)` refreshed nightly. Then this query is a 12-row scan.

> **What seniors raise out loud**
>
> A senior candidate raises four things before writing SQL. (a) Is the *"first called in 2026"* clause a cohort filter or scene-setting? (b) March of *any* year is unusual, confirm it really means "across years" and not "in 2026". (c) What unit is `latency` (ms, microseconds, seconds)? It affects how you describe the answer. (d) Should failed requests (`status != 200`) be excluded, since their latencies are often timeout artifacts rather than real performance?

> **March-of-a-year vs March-across-years**
>
> Writing `WHERE call_time BETWEEN '2026-03-01' AND '2026-03-31'` is the most common wrong answer. That returns March of **one specific year**, not March across all years. On a multi-year table the two answers differ by roughly 11x in row count and produce different MAX values per endpoint. The visual similarity to the right answer makes it hard to catch in review.

> **When MAX is not enough**
>
> This query returns `endpoint` and `max_latency` but loses *which* request was slowest (the `call_id`, the `user_id`, the exact `call_time`). To keep those, you cannot use `GROUP BY MAX` alone. Switch to `ROW_NUMBER() OVER (PARTITION BY endpoint ORDER BY latency DESC)` and filter `WHERE rn = 1`, or self-join the table back to itself on the peak. The interviewer asking *"when did each peak happen?"* is the trigger to reach for window functions.

---

## Common follow-up questions

- The prompt says 'endpoints first called in 2026' but the query doesn't filter on that. How would you add the filter, and what would the impact on the row count be? _(Forces the candidate to confront the over-specified prompt head-on. Adding a subquery for `MIN(call_time) >= 'YEAR-01-01' AND MIN(call_time) < 'YEAR+1-01-01'` per endpoint trims the cohort sharply; quantify the row-count delta on a back-of-envelope basis.)_
- Production latency is in microseconds. The team wants milliseconds. Where do you do the conversion and why? _(Tests where data-type transformations belong: in the query, the view layer, or the storage schema. Divide-by-1000 in the SELECT is convenient but lies about precision; a view is honest; a schema change is safest.)_
- How would you also return the user_id and call_time of the slowest request per endpoint? _(Forces the jump from `GROUP BY MAX` to `ROW_NUMBER() OVER (PARTITION BY endpoint ORDER BY latency DESC)` with a `WHERE rn = 1` outer filter. Tests window-function fluency.)_
- An endpoint has 100% NULL latency for March. Does it appear in your output? Where in the sort order? _(Tests NULL semantics of MAX and ORDER BY ... DESC NULL ordering. SQLite places NULLs last under DESC; Postgres requires explicit NULLS LAST. Endpoint appears with max_latency = NULL.)_
- The query takes 8 minutes in production. The table is partitioned by call_time. What changes? _(The right answer is not 'add an index on latency'. It is a nightly rollup table keyed by (endpoint, year, month), reducing 250M rows to a few thousand. Tests instinct for materialized aggregates over query tuning.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_slow_lane)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.