# Latency vs Regional Average

> Each service versus its region's average.

Canonical URL: <https://datadriven.io/problems/latency_vs_regional_average>

Domain: SQL · Difficulty: easy · Seniority: L4

## Problem

Engineers want to compare each service's latency against the regional baseline. For each health check record, show the service name, region, latency, and the average latency across all services in the same region.

## Worked solution and explanation

### Why this problem exists in real interviews

Extracting insights from svc_health.svc_name grouped by status via averaging is the central task. It is used as a fundamentals check to test whether you pick the right aggregation function and partition boundary on the first attempt.

---

### Break down the requirements

#### Step 1: Apply the window function

Use the window function with the correct `PARTITION BY` and `ORDER BY` to compute the required metric per group.

#### Step 2: Order the final output

Apply `ORDER BY` as specified to produce the expected row sequence. When tied values exist, add a secondary sort column for determinism.

---

### The solution

**Window AVG for per-row deviation from group mean**

```sql
SELECT svc_name, region, latency,
    AVG(latency) OVER (PARTITION BY region) AS regional_avg,
    latency - AVG(latency) OVER (PARTITION BY region) AS diff_from_avg
FROM svc_health
ORDER BY region, diff_from_avg DESC
```

> **Cost Analysis**
>
> The query scans 25M rows from `svc_health`. The aggregation reduces the row count before any downstream processing, which is the key performance lever.

> **Interviewers Watch For**
>
> Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax. Walking through comparison logic step by step, rather than writing it in one pass, demonstrates structured thinking.

> **Common Pitfall**
>
> Returning more columns than the prompt asks for can trigger a "wrong schema" failure in automated grading. Match the output specification exactly.

---

## Common follow-up questions

- What happens to your result if svc_health.latency contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on latency.)_
- How would you verify that your aggregation on svc_health.check_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in svc_health.check_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like check_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/latency_vs_regional_average)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.