# Peak API Hour

> The hour when traffic peaks.

Canonical URL: <https://datadriven.io/problems/peak_api_hour>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

We need to identify the peak traffic hour of the day. First tally API calls per day and hour, then average across days. Which single hour has the highest average call volume? Return the hour and its average call count.

## Worked solution and explanation

### Why this problem exists in real interviews

The api_calls table contains endpoint and method values that must be processed with grouping and top-N selection. This appears in mid-level screens to probe whether you reason about the correct aggregation grain before writing any window or GROUP BY clause.

---

### Break down the requirements

#### Step 1: Count calls per day and hour

In a CTE, `GROUP BY DATE(call_time), strftime('%H', call_time)` tallies calls per day-hour combination.

#### Step 2: Average across days per hour

In the outer query, `GROUP BY hour` with `AVG(daily_count)` collapses multiple days into a single average per hour.

#### Step 3: Select the peak hour

`ORDER BY avg_count DESC LIMIT 1` returns the single highest-traffic hour.

---

### The solution

**Two-level aggregation: day-hour then hour average**

```sql
WITH hourly AS (
    SELECT
        DATE(call_time) AS call_date,
        CAST(strftime('%H', call_time) AS INTEGER) AS call_hour,
        COUNT(*) AS call_count
    FROM api_calls
    GROUP BY call_date, call_hour
)
SELECT call_hour, AVG(call_count) AS avg_count
FROM hourly
GROUP BY call_hour
ORDER BY avg_count DESC
LIMIT 1
```

> **Cost Analysis**
>
> At `api_calls` (300,000,000 rows), a full table scan is expensive. Partition pruning (if the table is partitioned on the filter column) is the first optimization. A covering index on the `GROUP BY` + filter columns eliminates random I/O. Consider a materialized view for repeated dashboard queries.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What happens to your result if api_calls.err_msg contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on err_msg.)_
- How would you verify that your aggregation on api_calls.call_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in api_calls.call_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like call_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/peak_api_hour)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.