# Largest CDN Response

> One edge location served something massive.

Canonical URL: <https://datadriven.io/problems/largest_cdn_response>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

The CDN team wants to know which edge location served the largest single response in 2026. For that response, show the edge location, the size in numeric bytes (the bytes column may have a trailing 'KB' suffix that needs to be stripped), and the request path.

## Worked solution and explanation

### Why this problem exists in real interviews

This is a deliberate probe of date arithmetic and temporal grouping. It appears in at L4+ rounds because it separates candidates who memorize syntax from those who can reason about data transformations under pressure.

> **Trick to Solving**
>
> Read the prompt carefully for implicit constraints. The phrase structure hints at the grain of the output: what each row represents.
> 
> 1. Identify the output grain from the prompt (one row per what?)
> 2. Work backward from the desired output columns
> 3. Build the query inside-out: innermost subquery first, then layer on filters and aggregates

---

### Break down the requirements

#### Step 1: Read from `cdn_logs`

The query targets `cdn_logs` with 7 columns. Identify which columns are needed for the output.

#### Step 2: Order and limit the output

Sort by the target metric and apply `LIMIT` to return the requested number of rows. Ensure the sort is deterministic to produce reproducible results.

#### Step 3: Return the result set

Select the required columns with any necessary aliasing or formatting.

---

### The solution

**Full-table scan with ORDER BY DESC LIMIT 1**

```sql
SELECT log_id, edge_loc, req_path, status, bytes, cache_hit, served_at
FROM cdn_logs
ORDER BY bytes DESC
LIMIT 1
```

> **Cost Analysis**
>
> The query scans 1.0B rows from `cdn_logs`.

> **Interviewers Watch For**
>
> Candidates who verbalize their approach before typing, naming the output columns and expected row count, consistently perform better.

> **Common Pitfall**
>
> Comparing dates stored as TEXT without casting can produce lexicographic instead of chronological ordering. Always confirm the column type.

---

## Common follow-up questions

- How would you implement this if the data was sharded across multiple databases? _(Tests understanding of distributed query patterns: scatter-gather and merge.)_
- How would you make this query incremental instead of full-refresh? _(Tests knowledge of watermarks, CDC, or delta processing.)_
- If this powered a dashboard refreshing every 5 minutes, how would you optimize? _(Tests ability to design for latency: materialized views, pre-aggregation, caching.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/largest_cdn_response)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.