# Pairwise Latency Maximum

> Every pair compared.

Canonical URL: <https://datadriven.io/problems/pairwise_latency_maximum>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Take every combination of latency values from api_calls (with replacement) and, for each pair, show both values and whichever is larger. Limit to the first 100 rows when both values are in ascending sequence. Return the two latency values and the maximum of the pair.

## Worked solution and explanation

### Why this problem exists in real interviews

Extracting insights from api_calls.endpoint grouped by method via self-join and top-N selection is the central task. It is used in mid-level screens to test whether you pick the right aggregation function and partition boundary on the first attempt.

---

### Break down the requirements

#### Step 1: Cross join for all pairs

`FROM api_calls a CROSS JOIN api_calls b` creates every combination. With 150M rows this is conceptual; the `LIMIT 100` makes it practical.

#### Step 2: Compute the maximum of each pair

`MAX(a.latency, b.latency)` or `CASE WHEN a.latency > b.latency THEN a.latency ELSE b.latency END` picks the larger value.

#### Step 3: Order and limit

`ORDER BY a.latency, b.latency LIMIT 100` restricts to the first 100 rows in ascending sequence.

---

### The solution

**CROSS JOIN with scalar MAX and LIMIT**

```sql
SELECT
    a.latency AS latency_1,
    b.latency AS latency_2,
    MAX(a.latency, b.latency) AS pair_max
FROM api_calls a
CROSS JOIN api_calls b
ORDER BY a.latency, b.latency
LIMIT 100
```

> **Cost Analysis**
>
> At `api_calls` (150,000,000 rows), a full table scan is expensive. Partition pruning (if the table is partitioned on the filter column) is the first optimization. A covering index on the `GROUP BY` + filter columns eliminates random I/O. Consider a materialized view for repeated dashboard queries.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What happens to your result if api_calls.err_msg contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on err_msg.)_
- How do you prevent api_calls from joining a row to itself in the self-join? _(Tests whether the candidate adds an inequality condition to exclude identity matches.)_
- With millions of distinct values in api_calls.call_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like call_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/pairwise_latency_maximum)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.