# Fastest and Slowest Services by Region

> The fastest and slowest in every region.

Canonical URL: <https://datadriven.io/problems/fastest_and_slowest_services_by_region>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

For each region, surface the highest and lowest latency services, excluding any region whose name contains 'test'. If services tie for the top or bottom spot, include all of them.

## Worked solution and explanation

### Why this problem exists in real interviews

SRE teams build dashboards that highlight extremes within partitions: the worst latency service per region, the best uptime AZ per provider. Interviewers like this prompt because it tests a single technique applied twice (top and bottom in one pass), filter placement when the filter is on a string, and whether you respect ties without writing a UNION.

---

### Break down the requirements

#### Step 1: Aggregate average latency per region and service

AVG(latency) GROUP BY region, svc_name lives in its own CTE. Filter LOWER(region) NOT LIKE '%test%' in WHERE so synthetic regions are dropped before aggregation. Drop NULL latencies in WHERE rather than HAVING because AVG already ignores NULLs but explicit filtering documents intent and shrinks the input.

#### Step 2: DENSE_RANK twice in the same SELECT

Compute high_rank with ORDER BY avg_latency DESC and low_rank with ORDER BY avg_latency ASC, both partitioned by region. Doing both in one window pass avoids a UNION ALL of two separate queries and lets you tag each row with its label in a CASE.

#### Step 3: Keep ties, label, and sort

WHERE high_rank = 1 OR low_rank = 1 keeps every service tied for the extreme in either direction. CASE WHEN high_rank = 1 THEN 'highest' ELSE 'lowest' END handles the rare service that is both (single service in a region); defaulting to 'lowest' is acceptable because the prompt does not require a special label for that edge case.

---

### The solution

**Aggregate, double rank, filter to extremes**

```sql
WITH svc_avg AS (
  SELECT region, svc_name, AVG(latency) AS avg_latency
  FROM svc_health
  WHERE LOWER(region) NOT LIKE '%test%' AND latency IS NOT NULL
  GROUP BY region, svc_name
),
ranked AS (
  SELECT region, svc_name, avg_latency,
    DENSE_RANK() OVER (PARTITION BY region ORDER BY avg_latency DESC) AS high_rank,
    DENSE_RANK() OVER (PARTITION BY region ORDER BY avg_latency ASC) AS low_rank
  FROM svc_avg
)
SELECT region, svc_name, avg_latency,
  CASE WHEN high_rank = 1 THEN 'highest' ELSE 'lowest' END AS latency_type
FROM ranked
WHERE high_rank = 1 OR low_rank = 1
ORDER BY region
```

> **Cost Analysis**
>
> svc_avg collapses 50M rows to roughly 8 regions times 200 services equals 1600 rows. The double window scan over that small intermediate is essentially free. Filtering the test regions before aggregation keeps the GROUP BY input small. The single ORDER BY region at the end avoids a per partition sort because the row count is tiny.

> **Interviewers Watch For**
>
> Whether you used DENSE_RANK to keep ties (the prompt explicitly says 'tied'), whether you filtered region in WHERE (not HAVING), and whether the case insensitive LIKE actually catches 'TEST' and 'Test'. Some candidates forget that LIKE is case sensitive in SQLite by default for non ASCII text, which is why LOWER on both sides is safer than a bare LIKE.

> **Common Pitfall**
>
> Using ROW_NUMBER instead of DENSE_RANK silently drops services that tie at the extreme. A region with three services all averaging 100ms latency should return all three as 'highest' and all three as 'lowest'; ROW_NUMBER returns only one of each.

---

## Common follow-up questions

- How would you label a region's only service, which is both highest and lowest? _(Tests edge case thinking. The CASE collapses to 'lowest' when both ranks are 1; an interviewer might want 'both' or two output rows. The candidate should surface the ambiguity.)_
- Why filter region in WHERE rather than after the aggregation in HAVING? _(Tests filter pushdown. region is a row level column not an aggregate, so WHERE is correct and shrinks the input to GROUP BY. HAVING would still work but does extra grouping work.)_
- What changes if 'latency' could be negative due to clock skew? _(Tests whether the candidate considers data quality. AVG happily averages negative values and could rank a buggy service as fastest; a HAVING MIN(latency) >= 0 clause would defend against this.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/fastest_and_slowest_services_by_region)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.