# Latency Quartiles Per Endpoint

> Quartile breakdowns. Endpoint by endpoint.

Canonical URL: <https://datadriven.io/problems/latency_quartiles_per_endpoint>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

For the API performance review, divide each endpoint's calls into four equal-sized buckets by latency, then for each endpoint-bucket pair show the minimum latency, maximum latency, and average latency. Results should appear ordered by endpoint name, then by bucket number.

## Worked solution and explanation

### Why this problem exists in real interviews

This is a deliberate probe of whether you can combine aggregation with window functions without conflating grain. It appears in at L4/L5+ rounds because it separates candidates who memorize syntax from those who can reason about data transformations under pressure.

> **Trick to Solving**
>
> Pair-generation from a single table requires a self-join. The key constraint `a.id < b.id` avoids duplicates and self-pairs.
> 
> 1. Join the table to itself on the pairing condition
> 2. Use an inequality on the ID column to ensure each pair appears once
> 3. Apply any additional filters on the pair

---

### Break down the requirements

#### Step 1: Self-join the table

Join `api_calls` to itself to compare or pair rows within the same table. Use an inequality condition to avoid duplicate pairs.

#### Step 2: Compute the median with PERCENTILE_CONT

`PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col)` returns the interpolated median. This is the cleanest approach in engines that support ordered-set aggregates.

#### Step 3: Aggregate with the aggregate

Group by the output grain and apply `COUNT()` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.

#### Step 4: Order the final output

Apply `ORDER BY` as specified to produce the expected row sequence. When tied values exist, add a secondary sort column for determinism.

---

### The solution

**PERCENTILE_CONT for quartile boundaries**

```sql
SELECT endpoint,
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY latency) AS p25,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY latency) AS p50,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY latency) AS p75
FROM api_calls
GROUP BY endpoint
ORDER BY p50 DESC
```

> **Cost Analysis**
>
> The query scans 500M rows from `api_calls`. The window function requires a sort, which is O(n log n). Pre-aggregating reduces the sort input. The aggregation reduces the row count before any downstream processing, which is the key performance lever.

> **Interviewers Watch For**
>
> Strong candidates explain their choice of window function (`ROW_NUMBER` vs `RANK` vs `DENSE_RANK`) and why it matches the tie semantics.

> **Common Pitfall**
>
> Applying a window function before aggregating inverts the grain. Always aggregate first, then rank or compare across groups.

---

## Common follow-up questions

- How would you implement this if the data was sharded across multiple databases? _(Tests understanding of distributed query patterns: scatter-gather and merge.)_
- What if one partition has 99% of the rows (extreme skew)? _(Tests awareness of data skew impact on sort-based window operations.)_
- How would you make this query incremental instead of full-refresh? _(Tests knowledge of watermarks, CDC, or delta processing.)_
- If this powered a dashboard refreshing every 5 minutes, how would you optimize? _(Tests ability to design for latency: materialized views, pre-aggregation, caching.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/latency_quartiles_per_endpoint)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.