# Top 2 Busiest API Slots

> Two time slots per week. The busiest.

Canonical URL: <https://datadriven.io/problems/top_2_busiest_api_slots>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Segment each day's API traffic into 'Morning' (before 12:00), 'Early Afternoon' (12:00 to 15:00), and 'Late Afternoon' (after 15:00). Find the top 2 day-plus-time-segment combinations by total API calls; if there's a tie, include all tied results. Exclude records with missing timestamps.

## Worked solution and explanation

### The mental model

This is **bucketize-then-rank-with-ties**. Two distinct skills wired in series. First, project every `api_calls` row into a low-cardinality bucket: `(day_of_week, time_segment)`. Seven days times three segments caps the universe at 21 buckets, regardless of whether the table holds 250M rows or 2.5B. Second, rank those 21 buckets and slice the top tiers.

The phrase "if there"s a tie, include all tied results" is the load-bearing clue. It tells you which ranking function to reach for. `DENSE_RANK` assigns the same rank to tied buckets and does not skip the next rank, so filtering `rnk <= 2` returns every row inside the first two tiers, possibly more than two output rows. `ROW_NUMBER` would break ties arbitrarily and emit exactly two rows, silently dropping a tied bucket and failing the prompt.

---

### The three traps

#### Step 1: Off-by-one on segment boundaries

The middle segment, "12:00 to 15:00", spans hours 12-15 inclusive. The CASE must read `<= 15`, not `< 15`. Write `< 15` and hour 15 falls through to the ELSE, lands in "Late Afternoon", and shifts roughly one twelfth of afternoon traffic into the wrong bucket. The Morning boundary is easy (`< 12` correctly excludes hour 12); the Early Afternoon upper bound is where people slip.

#### Step 2: DENSE_RANK vs RANK vs ROW_NUMBER

All three express "top 2" but answer different questions.

- `ROW_NUMBER`: exactly two rows. Ties broken arbitrarily. Wrong here.
- `RANK`: a three-way tie for first jumps straight to rank 4, so `rnk <= 2` returns only the tied first-place rows and no second place.
- `DENSE_RANK`: includes all ties without skipping. `rnk <= 2` reliably means "everything in the top two tiers".

"Include all tied results" is DENSE_RANK semantics.

#### Step 3: `WHERE call_time IS NOT NULL` is not redundant

`strftime('%w', NULL)` returns NULL, and `GROUP BY` collapses all NULLs into a single group. Skip the WHERE clause and you get a phantom `(NULL, NULL)` bucket in your output that does not correspond to any real time slot, and worse, it can rank into your top two if NULL volume is heavy enough. Filter explicitly at the source rather than trusting GROUP BY to make the right call about NULL.

---

### The solution

**Bucket, dense-rank, slice**

```sql
WITH segmented AS (
  SELECT
    CAST(strftime('%w', call_time) AS INTEGER) AS day_of_week,
    CASE
      WHEN CAST(strftime('%H', call_time) AS INTEGER) < 12 THEN 'Morning'
      WHEN CAST(strftime('%H', call_time) AS INTEGER) <= 15 THEN 'Early Afternoon'
      ELSE 'Late Afternoon'
    END AS time_segment,
    COUNT(*) AS call_count
  FROM api_calls
  WHERE call_time IS NOT NULL
  GROUP BY day_of_week, time_segment
),
ranked AS (
  SELECT *, DENSE_RANK() OVER (ORDER BY call_count DESC) AS rnk
  FROM segmented
)
SELECT day_of_week, time_segment, call_count
FROM ranked
WHERE rnk <= 2
```

> **Cost shape on 250M rows**
>
> Single full scan of `api_calls` (partition-pruned on `call_time` only if you add a date range), hash aggregation to at most 21 groups, then a window over 21 rows. The window step is free. Bottleneck is the scan plus two `strftime` calls per row, a quarter-billion times. At hourly cadence, materialize `(day_of_week INT, hour_of_day INT)` as generated columns and index them; the scan becomes index-only.

> **What a senior candidate raises out loud**
>
> Three clarifications worth raising:
> 
> 1. **Day-of-week numbering.** `strftime('%w')` returns 0=Sunday in SQLite. Other engines disagree (MySQL `DAYOFWEEK` is 1=Sunday). Confirm before writing.
> 2. **Timezone.** Are segments defined in server UTC or each caller's local time? Morning in Tokyo is night in NYC. Flag it.
> 3. **Tie semantics.** Restate "include all tied results" as "DENSE_RANK, not ROW_NUMBER" so the interviewer knows you read the prompt.

> **ROW_NUMBER quietly drops tied buckets**
>
> Replace `DENSE_RANK` with `ROW_NUMBER` and the query still runs, still returns two rows, and still looks reasonable. But if three buckets tie for first, ROW_NUMBER picks two of them by engine-dependent tiebreaker and silently discards the third. The grader rejects, and you cannot reproduce the bug locally if your sample data happens to have no ties. The "include all tied results" line in the prompt is the only signal that catches this in advance.

> **When the CASE ladder stops scaling**
>
> Three segments fit a readable if-ladder. For finer-grained bucketing (quarter-hour slots, ten-minute slots) the ladder becomes a wall of CASE arms. Switch to arithmetic: `CAST(strftime('%H', call_time) AS INTEGER) * 4 + CAST(strftime('%M', call_time) AS INTEGER) / 15` produces a 0-95 quarter-hour bucket id with one expression. Trade off readability against terseness; for three buckets the ladder wins.

---

## Common follow-up questions

- If you swapped DENSE_RANK for ROW_NUMBER, on what input would the output differ? _(Probes whether the candidate can construct a tie case. Answer: any input where two or more buckets have identical `call_count` near the top. ROW_NUMBER returns two rows; DENSE_RANK returns however many are tied.)_
- Hour 15 falls into which segment per the prompt, and what does the expected query actually do? _(Forces the candidate to re-read the boundaries. Answer: ''12:00 to 15:00'' inclusive puts hour 15 in Early Afternoon, and the `<= 15` check confirms it. A `< 15` check would be wrong.)_
- Rewrite this for the top 3 buckets per day-of-week, so each day gets its own leaderboard. _(Tests partitioned window functions. Answer: `DENSE_RANK() OVER (PARTITION BY day_of_week ORDER BY call_count DESC)` and filter `rnk <= 3`. Returns up to 21 rows instead of a global top 2.)_
- Across all days combined, which single time segment is busiest? How does the query change? _(Tests that the candidate can collapse a dimension. Answer: drop `day_of_week` from the GROUP BY and the SELECT, leaving three rows; order by `call_count DESC` and LIMIT 1.)_
- If 5% of call_time values are NULL, where do those rows end up if you remove the WHERE clause? _(Tests NULL semantics in GROUP BY. Answer: they collapse into a single `(NULL, 'Late Afternoon')` group, because the CASE ELSE branch catches NULL after the two numeric comparisons return UNKNOWN. That bucket can rank into the top 2 if NULL volume is heavy enough.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_2_busiest_api_slots)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.