# Rarest Latency Value

> A latency value that appeared exactly once.

Canonical URL: <https://datadriven.io/problems/rarest_latency_value>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

An SRE flagged a latency outlier. Find the highest latency value that appears at most twice across all API calls. First identify latency values that occur once or twice, then find the maximum among those.

## Worked solution and explanation

### Why this problem exists in real interviews

Drawn from a reliability engineering domain, this question centers on HAVING for post-aggregation filtering over the `api_calls` table. The tricky part is handling the `latency` column correctly under the given constraints.

---

### Break down the requirements

#### Step 1: Find latency values appearing exactly once

`GROUP BY latency HAVING COUNT(*) = 1` isolates unique-occurrence values.

#### Step 2: Select the maximum

`SELECT MAX(latency)` from the filtered set returns the highest single-occurrence latency.

---

### The solution

**Having filter for rarest latency value**

```sql
SELECT MAX(latency) AS rarest_highest
FROM (
    SELECT latency
    FROM api_calls
    GROUP BY latency
    HAVING COUNT(*) = 1
) unique_vals
```

> **Cost Analysis**
>
> At `api_calls` (500,000,000 rows), a full table scan is expensive. Partition pruning (if the table is partitioned on the filter column) is the first optimization. A covering index on the `GROUP BY` + filter columns eliminates random I/O. Consider a materialized view for repeated dashboard queries.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If `err_msg` in `api_calls` is NULL for some rows, how would your aggregation or join logic be affected? _(Probes understanding of NULL propagation through joins and aggregate functions on `api_calls.err_msg`.)_
- If `api_calls` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `call_time`? _(Tests ability to identify performance hotspots related to `api_calls.call_time` at scale.)_
- If the HAVING threshold in your query changed from a fixed number to a percentile, how would you restructure the query? _(Tests ability to replace static HAVING filters with dynamic subquery-based thresholds.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/rarest_latency_value)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.