# 80th Percentile API Latency

> The 80th percentile tells the real story.

Canonical URL: <https://datadriven.io/problems/80th_percentile_api_latency>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The performance engineering team is setting SLO thresholds for the API gateway. They need the latency value at the 80th percentile across all recorded API calls. Return a single value representing that threshold.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge targets percentile computation over ordered sets combined with NTILE bucketing against `api_calls`. Getting the grouping wrong on `endpoint`, `method`, `status` produces silently incorrect counts, which is exactly the trap interviewers set.

---

### Break down the requirements

#### Step 1: Choose the right percentile function

`PERCENTILE_CONT(0.8)` interpolates between values for continuous distributions. `PERCENTILE_DISC` would return an actual value from the dataset.

#### Step 2: Group by `method`

Each group gets its own percentile calculation.

#### Step 3: Order by the percentile value

Sort descending to highlight the groups with the highest threshold values.

---

### The solution

**Percentile for 80th percentile api latency**

```sql
SELECT
    method,
    PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY latency) AS p80
FROM api_calls
GROUP BY method
ORDER BY p80 DESC
```

> **Cost Analysis**
>
> The main table has 2.0B rows (512 GB). Partitioned on `call_time`, so queries filtering on that column skip most partitions. The window function runs after grouping, so it operates on the reduced result set.

> **Interviewers Watch For**
>
> Interviewers check whether the candidate chooses the right window function variant and can articulate why.

> **Common Pitfall**
>
> Returning extra columns not asked for, or missing a required column, are both common mistakes that fail automated grading.

---

## Common follow-up questions

- The `latency` column in `api_calls` has roughly 0% NULLs. How does your query handle those rows, and would the result change if NULLs were replaced with zeros? _(Tests whether the candidate understands how NULLs propagate through aggregation functions and whether their WHERE/JOIN conditions implicitly filter them out.)_
- If the dataset has only 3 rows in a group, how does the percentile interpolation behave, and is that still meaningful? _(Tests understanding of small-sample percentile behavior and statistical validity.)_
- `call_id` in `api_calls` has ~2000M distinct values. What index strategy keeps your query from doing a full table scan? _(Tests whether the candidate can design indexes for high-cardinality columns and understands selectivity.)_
- Could you express this same logic as a single query without CTEs or subqueries? What readability trade-off does that introduce? _(Tests whether the candidate can flatten nested logic and understands when decomposition aids maintainability.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/80th_percentile_api_latency)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.