# Response Buckets

> Fast, normal, or slow. Every API call gets a verdict.

Canonical URL: <https://datadriven.io/problems/response_buckets>

Domain: SQL · Difficulty: medium · Seniority: L5

## Problem

The performance team wants API calls bucketed by latency. For each row in api_calls with a non-NULL latency, label it 'fast' when latency is under 100, 'normal' when latency is between 100 and 500 inclusive, and 'slow' when latency is above 500. Return the endpoint, latency, and that label.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge asks you to apply filtering and projection to the `api_calls` table, simulating a real latency monitoring workflow. Pay attention to the `latency` column as they drive the aggregation and output.

---

### Break down the requirements

#### Step 1: Filter out null values

Exclude rows where `latency` is NULL. This prevents nulls from polluting aggregations or creating phantom groups.

#### Step 2: Classify rows with a CASE expression

The `CASE` expression evaluates conditions top to bottom and returns the first match. Order matters: put the most restrictive condition first to avoid misclassification.

---

### The solution

**Filter out null values to find response buckets**

```sql
SELECT endpoint, latency, CASE WHEN latency < 100 THEN 'fast' WHEN latency <= 500 THEN 'normal' ELSE 'slow' END
FROM api_calls
WHERE latency IS NOT NULL
```

> **Cost Analysis**
>
> With ~200M rows, the query performs a single sequential scan. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for how you handle NULL values and whether you account for them in filters and aggregations.

> **Common Pitfall**
>
> Forgetting to filter NULLs creates phantom groups or inflated counts. Always check `null_fraction` in the schema before assuming columns are clean.

---

## Common follow-up questions

- The `err_msg` column in `api_calls` has a 93% null rate. How does your query handle rows where `err_msg` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `api_calls.err_msg` and understands how aggregates skip NULL values.)_
- `api_calls.call_id` has roughly 200,000,000 distinct values. What index strategy would you use to avoid a full scan on `api_calls`? _(Tests indexing knowledge specific to the high-cardinality `call_id` column in `api_calls`.)_
- Your conditional CASE logic assumes the categories are exhaustive. What happens if a row in `api_calls` falls into none of the branches? _(Tests awareness of the implicit ELSE NULL in CASE expressions.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/response_buckets)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.