# Latency Gap to 10th Fastest

> One server. Compared to the 10th fastest.

Canonical URL: <https://datadriven.io/problems/latency_gap_to_10th_fastest>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

During an SRE review, the VP of Infrastructure asked how server 'web-prod-01' compares to the 10th-fastest server by average response time. Calculate the absolute difference in their average response times. Return a single latency gap value.

## Worked solution and explanation

### Why this problem exists in real interviews

On-call SREs constantly compare a single host to a peer benchmark: 'how does web-prod-07 stack up against the tenth-fastest box in the fleet?' Interviewers use this prompt to see whether you can compute per-group averages, rank them, and then subtract two specific scalar values out of that ranked set. The trick is that the answer is one number, not a table, so the candidate has to lift two scalars from the same CTE and combine them with `ABS`.

---

### Break down the requirements

#### Step 1: Average response time per server

`SELECT server_name, AVG(response_time_ms) FROM server_logs GROUP BY server_name`. With 100M log rows and only 30 distinct `server_name` values, this is a simple hash aggregate that collapses to 30 rows. `AVG` ignores NULL `response_time_ms` automatically, which matches the prompt's intent.

#### Step 2: Rank servers ascending with no gaps on ties

`DENSE_RANK() OVER (ORDER BY AVG(response_time_ms) ASC)` is the right ranking function: tied averages share a rank and the next value takes the immediately following rank, with no gap. Using `RANK()` would skip rank 10 if two servers tied at rank 9, and `ROW_NUMBER()` would silently break ties in an order the prompt does not endorse.

#### Step 3: Pull two scalars and subtract them

Wrap the ranked output in a CTE, then use two scalar subqueries inside `ABS()`: one filtered to `server_name = 'web-prod-07'`, the other to `rnk = 10` with `LIMIT 1` (in case multiple servers share rank 10, any one of their averages works because they are all equal). The outer `SELECT` returns one column called `latency_gap`.

---

### The solution

**Average per server, dense-rank, then absolute scalar gap**

```sql
WITH server_ranks AS (
  SELECT server_name, AVG(response_time_ms) AS avg_rt,
         DENSE_RANK() OVER (ORDER BY AVG(response_time_ms) ASC) AS rnk
  FROM server_logs GROUP BY server_name
)
SELECT ABS(
  (SELECT avg_rt FROM server_ranks WHERE server_name = 'web-prod-07')
  - (SELECT avg_rt FROM server_ranks WHERE rnk = 10 LIMIT 1)
) AS latency_gap
```

> **Cost Analysis**
>
> `server_logs` holds 100M rows partitioned daily across 365 days. The aggregate must touch every partition since the prompt does not narrow time. After the GROUP BY there are only 30 rows, so the window function and the two scalar subqueries are essentially free. The dominant cost is the single full scan plus the hash aggregate.

> **Interviewers Watch For**
>
> Interviewers watch whether you pick `DENSE_RANK` (matches the 'no gaps on ties' phrasing) over `RANK` or `ROW_NUMBER`, whether you remember `ABS()` because the prompt asks for an absolute difference, and whether you handle the multi-row case at rank 10 with `LIMIT 1` so the scalar subquery cannot raise an error if two servers tie.

> **Common Pitfall**
>
> Writing `(SELECT avg_rt FROM server_ranks WHERE rnk = 10)` without `LIMIT 1` raises 'subquery returned more than one row' the moment two servers share rank 10. The other classic miss is omitting `ABS`: if web-prod-07 is faster than the rank-10 server, the gap comes back negative and the autograder rejects the row.

---

## Common follow-up questions

- What would change in your CTE if `response_time_ms` were 100% NULL for one of the 30 servers? _(Tests NULL behavior in `AVG` and rank assignment. `AVG` returns NULL for that group, and DENSE_RANK places NULL at the start (SQLite, Postgres) or end (some other engines), shifting which server lands at rank 10.)_
- How would you generalize this to compute the gap for every server against the rank-10 baseline in one query? _(Tests whether the candidate replaces the second scalar subquery with a CROSS JOIN to a one-row CTE holding the rank-10 average, so every server row in `server_ranks` can subtract from it without rescanning.)_
- How would you scope this to only the last 7 days of `log_timestamp` without breaking partition pruning? _(Tests partition-key awareness. `server_logs` is partitioned by `log_timestamp`. A range predicate `log_timestamp >= date('now', '-7 days')` lets the planner prune; wrapping `log_timestamp` in `strftime` disables it.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/latency_gap_to_10th_fastest)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.