# Regions by Alert Volume

> Some regions are quiet. Others never stop screaming.

Canonical URL: <https://datadriven.io/problems/regions_by_alert_volume>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

We want an incident heatmap by region. Count the number of alerts per region (based on the service's region), sorted from most incidents to least.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests whether a candidate can write a clean, correct join-and-aggregate query under time pressure. The pattern appears frequently in mid-level SQL rounds, where interviewers want to see structured thinking: pick the join key, choose the grouping column, then sort.

---

### Break down the requirements

#### Step 1: Join `alert_events` to `svc_health`

The join matches each alert in `alert_events` to its service's row in `svc_health` on the shared key, `svc_name`. This puts `region`, which belongs to the service, on the same row as each alert so it can be used for grouping.
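
To make the joined row set concrete, here is a minimal sketch on made-up data. Only `svc_name` and `region` come from the solution query. The `alert_id` column and all sample values are assumptions for illustration.

```sql
-- Hypothetical sample data; the real tables likely have more columns.
WITH alert_events (alert_id, svc_name) AS (
    VALUES (1, 'checkout'),
           (2, 'checkout'),
           (3, 'search'),
           (4, 'legacy-batch')      -- no matching row in svc_health
),
svc_health (svc_name, region) AS (
    VALUES ('checkout', 'us-east'),
           ('search',   'eu-west'),
           ('billing',  'eu-west')  -- service with no alerts
)
-- One output row per alert that has a matching service.
SELECT ae.alert_id, ae.svc_name, sh.region
FROM alert_events ae
INNER JOIN svc_health sh ON ae.svc_name = sh.svc_name
ORDER BY ae.alert_id;
```

On this data the join returns alerts 1 to 3 with their regions. Alert 4 is dropped because an inner join keeps only rows that match on both sides.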

#### Step 2: Aggregate by `sh.region`

`GROUP BY sh.region` collapses the joined rows to one row per region. `COUNT(*)` then counts the rows in each group, which is the number of alerts for that region.
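
A small sketch of the aggregation step in isolation, assuming the join has already produced one row per alert. The `joined` CTE and its values are hypothetical stand-ins for that row set.

```sql
-- Hypothetical post-join rows: one per alert, with the service's region attached.
WITH joined (alert_id, region) AS (
    VALUES (1, 'us-east'),
           (2, 'us-east'),
           (3, 'eu-west')
)
-- Each region becomes one output row; COUNT(*) counts the alerts in it.
SELECT region, COUNT(*) AS alert_count
FROM joined
GROUP BY region;
```

This yields a count of 2 for `us-east` and 1 for `eu-west`. Without an `ORDER BY`, the row order is not guaranteed.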

#### Step 3: Sort the final output

`ORDER BY alert_count DESC` puts the region with the most alerts first, matching the prompt's "most incidents to least". Interviewers check that the sort direction matches the prompt.
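
The prompt does not say how to order regions with equal counts. If a deterministic result is wanted, one option is a secondary sort key, sketched here on hypothetical counts. The tie-breaker is an assumption and not part of the expected answer.

```sql
-- Hypothetical per-region counts, two of them tied.
WITH region_counts (region, alert_count) AS (
    VALUES ('eu-west',  5),
           ('us-east',  9),
           ('ap-south', 5)
)
SELECT region, alert_count
FROM region_counts
-- Highest count first; ties broken alphabetically by region.
ORDER BY alert_count DESC, region ASC;
```

Here `us-east` comes first, followed by `ap-south` and then `eu-west`.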

---

### The solution

**Join and aggregate approach**

```sql
SELECT sh.region, COUNT(*) AS alert_count
FROM alert_events ae
INNER JOIN svc_health sh ON ae.svc_name = sh.svc_name
GROUP BY sh.region
ORDER BY alert_count DESC
```

> **Cost Analysis**
>
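> With ~50M rows in `alert_events`, the query has to read every alert because nothing filters the table, so an index alone cannot turn that scan into a seek. The join cost depends mostly on the size of `svc_health`: if it is small, the engine can typically build a hash table on it and probe once per alert. An index on `svc_name` mainly pays off if a filter (for example, a time window) is added or if the planner picks an index-based join.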
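
One way to check the cost claims rather than guess is to inspect the plan. The sketch below assumes Postgres, and the column types, the `alert_id` column, and the index names are hypothetical. On empty temp tables the plan is not representative, so treat this as the shape of the experiment and not a benchmark.

```sql
-- Hypothetical minimal schemas; the real tables likely differ.
CREATE TEMP TABLE svc_health (svc_name text, region text);
CREATE TEMP TABLE alert_events (alert_id bigint, svc_name text);

-- Candidate indexes on the join key (names are illustrative).
CREATE INDEX idx_svc_health_svc_name ON svc_health (svc_name);
CREATE INDEX idx_alert_events_svc_name ON alert_events (svc_name);

-- Show the plan the optimizer chooses for the solution query.
EXPLAIN
SELECT sh.region, COUNT(*) AS alert_count
FROM alert_events ae
INNER JOIN svc_health sh ON ae.svc_name = sh.svc_name
GROUP BY sh.region
ORDER BY alert_count DESC;
```

Because the query counts every alert, the planner may still choose a sequential scan with a hash join even when the indexes exist.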

> **Interviewers Watch For**
>
> Whether the query returns exactly the columns and ordering the prompt specifies, and how quickly you identify the core operation (a join followed by a grouped count) and write clean, minimal code.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- What if the data volume grew 10x? _(Tests whether the candidate thinks about scan cost, indexing, and materialized views.)_
- What if the join key had duplicates on both sides? _(Tests awareness of fan-out: a many-to-many join inflates row counts unexpectedly.)_
- How would you schedule this as a daily report? _(Tests production mindset: incremental loads, idempotency, monitoring.)_
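
For the duplicate-key follow-up, the sketch below shows one possible guard on made-up data: collapse `svc_health` to distinct `(svc_name, region)` pairs before joining. The `svc_region` CTE, the `alert_id` column, and the sample rows are hypothetical. This only handles exact duplicates. A service that appears in several different regions would still be counted once per region.

```sql
WITH alert_events (alert_id, svc_name) AS (
    VALUES (1, 'checkout'),
           (2, 'checkout'),
           (3, 'search')
),
svc_health (svc_name, region) AS (
    VALUES ('checkout', 'us-east'),
           ('checkout', 'us-east'),  -- duplicate row that would double-count
           ('search',   'eu-west')
),
-- Deduplicate the service-to-region mapping before the join.
svc_region AS (
    SELECT DISTINCT svc_name, region
    FROM svc_health
)
SELECT sr.region, COUNT(*) AS alert_count
FROM alert_events ae
INNER JOIN svc_region sr ON ae.svc_name = sr.svc_name
GROUP BY sr.region
ORDER BY alert_count DESC;
```

With the deduplication, `us-east` has 2 alerts and `eu-west` has 1. Joining directly to the duplicated `svc_health` rows would report 4 for `us-east`, because each `checkout` alert would match two rows.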

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/regions_by_alert_volume)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.