# Cheapest CDN Route

> The cheapest path across regions.

Canonical URL: <https://datadriven.io/problems/cheapest_cdn_route>

Domain: SQL · Difficulty: easy · Seniority: L5

## Problem

The CDN team wants the cheapest delivery from the SFO edge. Among successful (status 200) responses served from SFO, what is the smallest byte payload? Return a single value.

## Worked solution and explanation

### What this is really asking

`cdn_logs` is 800M rows partitioned by `served_at`, and the answer is a single scalar: `MIN(bytes)` filtered to SFO edge + status 200. The whole problem is recognizing that the word `route` is flavor text and the SQL is one line.

---

### Break down the requirements

#### Step 1: Filter the cohort

Two predicates on `cdn_logs`: `LOWER(edge_loc) = 'sfo'` defends against casing variants in free-text geo codes, and `status = 200` keeps only successful responses.

#### Step 2: Aggregate to one number

`MIN(bytes)` over the filtered set. No GROUP BY, no ORDER BY, no LIMIT. The prompt says return a single value.

---

### The solution

**SCALAR MIN WITH TWO FILTERS**

```sql
SELECT MIN(bytes) AS cheapest_cost
FROM cdn_logs
WHERE LOWER(edge_loc) = 'sfo'
  AND status = 200
```

> **Cost Analysis**
>
> 800M rows partitioned by `served_at`, but the query never filters on `served_at`, so every partition reads. `LOWER(edge_loc)` blocks any index on `edge_loc`. At this scale, propose adding a `served_at` predicate or materializing `edge_loc_lower` as a generated column.

> **Interviewers Watch For**
>
> Whether you ask about case in `edge_loc`. CDN edges are often free-text and dirty (`SFO`, `sfo`, `SFO-1`). Also whether you propose a `served_at` window before scanning 800M rows lifetime.

> **Common Pitfall**
>
> Reading `cheapest route` as a graph shortest-path problem and reaching for self-joins or recursive CTEs. Look at the schema: there is no source-destination structure, just per-request log rows. `route` is flavor; the SQL is one filter and one MIN.

> **The False Start**
>
> First instinct on hearing `cheapest route` is graph traversal: build edges, self-join hops, sum costs. That fits the words but not the schema. Pivot to scalar aggregation once you realize `cdn_logs` is a flat fact table without a graph. The prompt's verb is `is`, not `from-to`.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- Return the cheapest payload per edge location. _(Adds `GROUP BY LOWER(edge_loc)` and lifts the SFO filter; the scalar becomes a per-edge leaderboard.)_
- How would partition pruning help, and what predicate would unlock it? _(Tests whether you reach for a `served_at >= ...` filter and explain that without it every partition reads.)_
- What if `bytes` can be NULL for cache misses logged with no body? _(`MIN` skips NULLs, so the answer is still defined, but you might want `WHERE bytes IS NOT NULL` for clarity. Senior candidates flag the assumption.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/cheapest_cdn_route)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.