# CDN Traffic by Day and Hour

> CDN traffic, hour by hour.

Canonical URL: <https://datadriven.io/problems/cdn_traffic_by_day_and_hour>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The infra team needs to plan CDN capacity by time of day. Calculate average net bytes per request (gross bytes minus 5% overhead) broken down by day of the week (as text, e.g. 'Monday') and hour. Round to 2 decimal places. Return the weekday name, hour, and average net bytes.

## Worked solution and explanation

### Why this problem exists in real interviews

CDN capacity planning lives or dies on weekday-by-hour heatmaps. Interviewers use this prompt to see whether you can extract weekday and hour from a TEXT timestamp in SQLite (no DATE_PART, no TO_CHAR), translate the numeric weekday into its English name with a CASE, and apply a percentage adjustment to bytes inside the average.

---

### Break down the requirements

#### Step 1: Map numeric weekday to its name

strftime('%w', served_at) returns '0' for Sunday through '6' for Saturday as text. SQLite has no day-of-week name function, so the canonical pattern is a CASE expression with seven branches that returns the English name.

#### Step 2: Extract hour and apply the 5 percent overhead deduction

CAST(strftime('%H', served_at) AS INTEGER) gives an integer hour 0 to 23. For net bytes, multiply bytes by 0.95 inside AVG so the average reflects net throughput per request, not gross.

#### Step 3: Group by raw weekday number and hour, then order by them

GROUP BY strftime('%w', served_at), strftime('%H', served_at) keeps the grouping cheap (no CASE in the GROUP BY). Order by the raw '%w' so days come out Sunday through Saturday, then by hour ascending.

---

### The solution

**Weekday CASE plus hour extraction with overhead-adjusted AVG**

```sql
SELECT
  CASE WHEN strftime('%w', served_at) = '0' THEN 'Sunday'
       WHEN strftime('%w', served_at) = '1' THEN 'Monday'
       WHEN strftime('%w', served_at) = '2' THEN 'Tuesday'
       WHEN strftime('%w', served_at) = '3' THEN 'Wednesday'
       WHEN strftime('%w', served_at) = '4' THEN 'Thursday'
       WHEN strftime('%w', served_at) = '5' THEN 'Friday'
       WHEN strftime('%w', served_at) = '6' THEN 'Saturday'
  END AS weekday,
  CAST(strftime('%H', served_at) AS INTEGER) AS hour,
  ROUND(CAST(AVG(bytes * 0.95) AS DOUBLE), 2) AS avg_net_bytes
FROM cdn_logs
GROUP BY strftime('%w', served_at), strftime('%H', served_at)
ORDER BY strftime('%w', served_at), hour
```

> **Cost Analysis**
>
> cdn_logs is 1,000,000,000 rows. This is the table where partitioning matters: at a billion rows, a full scan to compute strftime per row is the entire cost. Production would partition by served_at (day) and pre-compute hour and weekday as columns. The aggregate produces only 7 * 24 = 168 result rows, so the GROUP BY state is trivial; the bottleneck is the scan and the per-row strftime calls.

> **Interviewers Watch For**
>
> They want strftime and a CASE for the weekday name (not Postgres TO_CHAR(served_at, 'Day')), they want the 0.95 multiplier inside AVG so it is applied per row before averaging, and they want ORDER BY the raw '%w' so days come out in calendar order rather than alphabetical.

> **Common Pitfall**
>
> Sorting by the weekday name puts Friday first and Wednesday last, which is alphabetical garbage for a heatmap. Computing AVG(bytes) * 0.95 gives the same number here because of linearity of expectation, but the moment you change the formula (say, deducting a fixed overhead per request) the order of operations stops being equivalent. Forgetting CAST AS INTEGER on the hour leaves it as text, and downstream tooling may sort '10' before '2'.

---

## Common follow-up questions

- How would you adapt this to compute peak hour per weekday instead of every hour? _(Forces a window function: ROW_NUMBER OVER (PARTITION BY weekday ORDER BY avg_net_bytes DESC) and filter to rn = 1. The candidate should mention this would shrink the result from 168 rows to 7.)_
- How would you handle a billion-row scan in production? _(Partition cdn_logs by served_at day or week, store a precomputed weekday/hour column, and run the aggregation as a daily rollup so this query reads from a small summary table instead of the raw logs.)_
- What if the overhead were 5 percent of bytes plus 200 fixed bytes per request? _(Now the formula is non-linear in the average. The candidate should compute net bytes per row (bytes * 0.95 - 200) and then AVG that expression, and explicitly call out why pulling constants outside the aggregate no longer works.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/cdn_traffic_by_day_and_hour)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.