# Slow Failures

> SRE is hunting for the endpoints that fail slowly enough to burn timeouts.

Canonical URL: <https://datadriven.io/problems/slow_failures>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The SRE team is investigating timeout-related incidents and needs to find API calls that both failed and took an unusually long time. Pull the endpoint, HTTP status code, latency, and the error message for every call that returned a client or server error and had a latency above two hundred milliseconds. Show the slowest failures first.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge asks you to apply filtering and projection to the `api_calls` table, simulating a real reliability engineering workflow. Pay attention to columns like `endpoint`, `status`, and `latency` as they drive the aggregation and output.

---

### Break down the requirements

#### Step 1: Apply the range filter

The WHERE clause restricts rows to the target range. Applying this filter early reduces the volume flowing into downstream operations.

#### Step 2: Sort the final output

The `ORDER BY` clause ensures the result appears in the expected sequence. Interviewers check that the sort direction matches the prompt.

---

### The solution

**Apply the range filter to find slow failures**

```sql
SELECT endpoint, status, latency, err_msg
FROM api_calls
WHERE status >= 400 AND latency > 200
ORDER BY latency DESC
```

> **Cost Analysis**
>
> With ~50M rows, the query performs a single sequential scan. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether the query returns exactly the columns and ordering the prompt specifies; how quickly you identify the core operation and write clean, minimal code.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- What result would you get if every value in `api_calls.err_msg` were NULL? Would your query return an empty set or something unexpected? _(Tests extreme NULL scenarios and whether the candidate guards against edge cases in `err_msg`.)_
- `api_calls.call_time` has roughly 45,000,000 distinct values. What index strategy would you use to avoid a full scan on `api_calls`? _(Tests indexing knowledge specific to the high-cardinality `call_time` column in `api_calls`.)_
- `api_calls.method` only has 5 distinct values. If a new category were added, would your query automatically include it? _(Tests whether the query hard-codes values or dynamically adapts to `method` changes.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/slow_failures)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.