# Rush Hour API Latency

> Rush hour hits the API differently.

Canonical URL: <https://datadriven.io/problems/rush_hour_api_latency>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

For capacity planning in the 'us-east' region, calculate the average latency per hour for API calls made between 15:00 and 17:59 inclusive. Return the hour and average latency.

## Worked solution and explanation

### Why this problem exists in real interviews

This latency monitoring problem uses the `api_calls` table to evaluate time extraction for hourly bucketing. Watch how the `call_time` column interacts with both the filtering and the grouping logic.

---

### Break down the requirements

#### Step 1: Apply the range filter

The `WHERE` clause restricts rows to calls whose extracted hour is 15, 16, or 17, which covers 15:00 through 17:59 inclusive. Applying this filter early reduces the volume flowing into the aggregation.
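
To check the boundaries of that filter, here is a minimal sketch using Python's built-in `sqlite3` module, since `strftime` is SQLite's date function. The sample rows and the storage of `call_time` as ISO-8601 text are assumptions made for illustration, not details from the original dataset.

```python
import sqlite3

# In-memory database with a pared-down, hypothetical api_calls table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_calls (call_time TEXT, latency REAL)")
conn.executemany(
    "INSERT INTO api_calls VALUES (?, ?)",
    [
        ("2024-03-01 14:59:59", 100.0),  # one second before the window
        ("2024-03-01 15:00:00", 120.0),  # first instant inside the window
        ("2024-03-01 17:59:59", 180.0),  # last second inside the window
        ("2024-03-01 18:00:00", 200.0),  # first instant after the window
    ],
)

# Filter on the extracted hour; BETWEEN is inclusive on both ends.
kept = [
    row[0]
    for row in conn.execute(
        """
        SELECT call_time
        FROM api_calls
        WHERE CAST(strftime('%H', call_time) AS INTEGER) BETWEEN 15 AND 17
        ORDER BY call_time
        """
    )
]
print(kept)
```

Only the 15:00:00 and 17:59:59 rows survive, so filtering on the hour component handles the inclusive 17:59 upper bound without any minute arithmetic.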

#### Step 2: Aggregate by the extracted hour

`GROUP BY CAST(strftime('%H', call_time) AS INTEGER)` collapses rows to one per hour. `AVG(latency)` then computes the mean latency within each hour bucket.
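
The grouping step can be sketched the same way with Python's `sqlite3`. The rows below are made up for illustration, and the extra `COUNT(*)` column is added only to show how many rows land in each bucket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_calls (call_time TEXT, latency REAL)")
conn.executemany(
    "INSERT INTO api_calls VALUES (?, ?)",
    [
        ("2024-03-01 15:10:00", 100.0),
        ("2024-03-02 15:40:00", 200.0),  # different day, same hour bucket
        ("2024-03-01 16:05:00", 300.0),
    ],
)

# One output row per extracted hour; AVG is computed within each bucket.
rows = conn.execute(
    """
    SELECT CAST(strftime('%H', call_time) AS INTEGER) AS hour,
           COUNT(*) AS calls,
           AVG(latency) AS avg_latency
    FROM api_calls
    GROUP BY CAST(strftime('%H', call_time) AS INTEGER)
    ORDER BY hour
    """
).fetchall()
print(rows)
```

Both 15:xx calls fall into the same bucket even though they occurred on different days, because the grouping key keeps only the hour of day.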

#### Step 3: Sort the final output

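`ORDER BY hour` returns the buckets in ascending order (15, 16, 17), so the result reads chronologically. Interviewers check that the sort direction matches the prompt; when the prompt does not state one, ascending by hour is the safest reading.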

---

### The solution

**Apply the range filter to find rush hour API latency**

```sql
SELECT CAST(strftime('%H', call_time) AS INTEGER) AS hour,
       AVG(latency) AS avg_latency
FROM api_calls
-- The prompt limits the result to 'us-east'; the column name `region` is assumed here.
WHERE region = 'us-east'
  AND CAST(strftime('%H', call_time) AS INTEGER) BETWEEN 15 AND 17
GROUP BY CAST(strftime('%H', call_time) AS INTEGER)
ORDER BY hour
```
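
As an end-to-end check, the sketch below runs a version of the query against a tiny in-memory SQLite table using Python's `sqlite3`. Because the prompt scopes the result to 'us-east', it includes a region predicate. The column name `region`, the helper `rush_hour_latency`, and the sample rows are assumptions for illustration only.

```python
import sqlite3

# `region` is an assumed column name; the source does not show the table schema.
QUERY = """
SELECT CAST(strftime('%H', call_time) AS INTEGER) AS hour,
       AVG(latency) AS avg_latency
FROM api_calls
WHERE region = ?
  AND CAST(strftime('%H', call_time) AS INTEGER) BETWEEN 15 AND 17
GROUP BY CAST(strftime('%H', call_time) AS INTEGER)
ORDER BY hour
"""


def rush_hour_latency(conn, region):
    """Return (hour, avg_latency) rows for calls between 15:00 and 17:59."""
    return conn.execute(QUERY, (region,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_calls (call_time TEXT, region TEXT, latency REAL)")
conn.executemany(
    "INSERT INTO api_calls VALUES (?, ?, ?)",
    [
        ("2024-03-01 14:59:00", "us-east", 1.0),     # before the window
        ("2024-03-01 15:05:00", "us-east", 100.0),
        ("2024-03-01 15:45:00", "us-east", 200.0),
        ("2024-03-01 16:30:00", "us-east", 300.0),
        ("2024-03-01 17:59:59", "us-east", 400.0),   # last second of the window
        ("2024-03-01 18:00:00", "us-east", 999.0),   # after the window
        ("2024-03-01 15:10:00", "us-west", 5000.0),  # wrong region
    ],
)

result = rush_hour_latency(conn, "us-east")
print(result)
```

The out-of-window rows and the 'us-west' row do not affect any average, which is a quick way to confirm both predicates are doing their job.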

> **Cost Analysis**
>
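> With ~250M rows, the `WHERE` filter reduces the working set before the `GROUP BY` runs. Because the filter wraps `call_time` in a function, a plain index on `call_time` cannot serve it; an expression index on the extracted hour, where the engine supports one, could reduce the scan to a seek.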

> **Interviewers Watch For**
>
> Interviewers watch for how you extract the hour from a timestamp and whether you account for edge cases such as the inclusive 17:59 upper bound.

> **Common Pitfall**
>
> Integer division truncates the result silently. If you compute the average by hand as `SUM(latency) / COUNT(*)` over an integer column, cast at least one operand to DOUBLE before dividing to get a decimal result.
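
The pitfall can be reproduced in SQLite with Python's `sqlite3`. This sketch assumes, purely for illustration, that `latency` is stored as an integer. It uses `REAL`, SQLite's floating-point type, where other engines would use `DOUBLE` or `DOUBLE PRECISION`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption for this demo: latency is an INTEGER column.
conn.execute("CREATE TABLE api_calls (call_time TEXT, latency INTEGER)")
conn.executemany(
    "INSERT INTO api_calls VALUES (?, ?)",
    [("2024-03-01 15:00:00", 100), ("2024-03-01 15:30:00", 101)],
)

truncated, exact, builtin = conn.execute(
    """
    SELECT SUM(latency) / COUNT(*),                -- integer / integer truncates
           CAST(SUM(latency) AS REAL) / COUNT(*),  -- one operand cast to float
           AVG(latency)                            -- AVG returns a float in SQLite
    FROM api_calls
    """
).fetchone()
print(truncated, exact, builtin)
```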

---

## Common follow-up questions

- What result would you get if every value in `api_calls.err_msg` were NULL? Would your query return an empty set or something unexpected? _(Tests extreme NULL scenarios and whether the candidate guards against edge cases in `err_msg`.)_
- `api_calls.latency` has roughly 1,800,000 distinct values. What index strategy would you use to avoid a full scan on `api_calls`? _(Tests indexing knowledge specific to the high-cardinality `latency` column in `api_calls`.)_
- If the date column in `api_calls` spans multiple years, does your date extraction logic still produce correct time buckets? _(Tests whether the candidate accounts for year boundaries in date bucketing.)_
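
For the multi-year follow-up, the sketch below contrasts hour-of-day buckets with date-plus-hour buckets, again using Python's `sqlite3`. The sample rows and the `'%Y-%m-%d %H'` bucket format are illustrative choices, not part of the original problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_calls (call_time TEXT, latency REAL)")
conn.executemany(
    "INSERT INTO api_calls VALUES (?, ?)",
    [
        ("2023-06-01 15:10:00", 100.0),
        ("2024-06-01 15:20:00", 300.0),  # same hour of day, one year later
    ],
)

# Hour-of-day bucket: both years pool into a single row for hour 15.
by_hour = conn.execute(
    """
    SELECT CAST(strftime('%H', call_time) AS INTEGER) AS hour, AVG(latency)
    FROM api_calls
    GROUP BY CAST(strftime('%H', call_time) AS INTEGER)
    ORDER BY hour
    """
).fetchall()

# Date-plus-hour bucket: each calendar hour keeps its own row.
by_date_hour = conn.execute(
    """
    SELECT strftime('%Y-%m-%d %H', call_time) AS bucket, AVG(latency)
    FROM api_calls
    GROUP BY strftime('%Y-%m-%d %H', call_time)
    ORDER BY bucket
    """
).fetchall()

print(by_hour)
print(by_date_hour)
```

Pooling across years is what the problem asks for, since it wants one row per hour of day. The second query is the variant to reach for if the interviewer asks for a per-day trend instead.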

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/rush_hour_api_latency)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.