# Alert Response Breakdown

> An on-call postmortem asks which services are bleeding alerts nobody acknowledges.

Canonical URL: <https://datadriven.io/problems/alert_response_breakdown>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The incident response team is running a quarterly postmortem on alert handling. For every service, they need total alerts fired, how many were critical severity, how many were high severity, how many were never acknowledged, and the average number of alerts per status category. Normalize severity to lowercase before bucketing. Present services with the most total alerts first.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests whether you reach for **conditional aggregation** on instinct. The wide output (`critical_count`, `high_count`, `unacked_count` side by side) is the tell. A junior writes three subqueries or a self-join; you collapse it into one pass over `alert_events` with `SUM(CASE WHEN ...)` per bucket. The `LOWER(severity)` requirement and the per-status average are the two places people slip.

---

### Break down the requirements

#### Step 1: One pass, group by service

`GROUP BY svc_name`. Everything else, totals, severity buckets, unacked, per-status average, comes from a single scan over `alert_events`. No subqueries needed.

#### Step 2: Normalize severity in the CASE

`LOWER(severity) = 'critical'` inside each `SUM(CASE WHEN ...)`. Cheap because the comparison happens row by row during the aggregate, not in a WHERE that would prune data.

#### Step 3: Unacked count via NULL

`SUM(CASE WHEN ack_by IS NULL THEN 1 ELSE 0 END)`. Do not use `COUNT(ack_by)` and subtract from total; that works but obscures intent in a wide pivot.

#### Step 4: Avg alerts per status

`COUNT(*) * 1.0 / COUNT(DISTINCT status)` inside `ROUND(..., 2)`. The `1.0 *` forces float division so MySQL / Postgres do not truncate. `COUNT(DISTINCT)` is the second pass the planner needs.

#### Step 5: Order by volume

`ORDER BY total_alerts DESC`. The aggregate alias is fine to reference here in every major engine.

---

### The solution

**CONDITIONAL-AGG PIVOT IN ONE PASS**

```sql
SELECT
  svc_name,
  COUNT(*) AS total_alerts,
  SUM(CASE WHEN LOWER(severity) = 'critical' THEN 1 ELSE 0 END) AS critical_count,
  SUM(CASE WHEN LOWER(severity) = 'high'     THEN 1 ELSE 0 END) AS high_count,
  SUM(CASE WHEN ack_by IS NULL              THEN 1 ELSE 0 END) AS unacked_count,
  ROUND(1.0 * COUNT(*) / COUNT(DISTINCT status), 2) AS avg_per_status
FROM alert_events
GROUP BY svc_name
ORDER BY total_alerts DESC;
```

> **Cost Analysis**
>
> 50M rows, partitioned by `fired_at`. Without a date filter you scan every partition. The aggregate is one shuffle keyed on `svc_name`, cheap if cardinality is in the hundreds. The hidden cost is `COUNT(DISTINCT status)`: the planner cannot fold it into the same hash as the SUMs and typically spills a second per-group structure. If `status` has 5 known values, replace with a CASE-based bucket count for a meaningful speedup at scale.

> **Interviewers Watch For**
>
> Do you ask about the date window before scanning 50M rows? A staff candidate scopes the postmortem to a quarter via `WHERE fired_at >= ...` so partition pruning kicks in. Also: do you notice `LOWER()` is needed because severity is not normalized at ingest, and flag that as a data-quality issue worth fixing upstream?

> **Common Pitfall**
>
> Writing `WHERE severity = 'critical'` at the query level. That filters the whole result set to critical rows, so `total_alerts` no longer means total. The conditional has to live **inside** the SUM, not in the WHERE. Same trap with case sensitivity: `severity = 'critical'` misses `'Critical'` and `'CRITICAL'` on Postgres and most engines, so you under-count silently.

**Pivot via conditional agg (idiomatic)**

One pass, output already wide. `SUM(CASE WHEN LOWER(severity)='critical' ...)` per bucket, grouped only by `svc_name`. Hands the BI tool a ready table.

**GROUP BY (svc_name, severity) then reshape**

Long format: one row per (service, severity). Correct, but you need a `PIVOT` (Snowflake/SQL Server) or a wrapping conditional agg anyway to widen it. Two-step for no win.

> **Key Insight**
>
> If `total_alerts != critical_count + high_count + (other severities)` after running, you have NULL or unknown severities in the data. The CASE branches drop NULLs into the ELSE 0, but COUNT(*) still counts them, so the mismatch is a free data-quality probe.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you extend this to a per-week breakdown without rewriting the aggregate? _(Probes whether you add `DATE_TRUNC('week', fired_at)` to the GROUP BY and understand that the conditional-agg shape scales by adding grouping keys, not by rewriting.)_
- What changes if severity values were normalized at ingest? _(Tests whether you push data hygiene upstream rather than paying `LOWER()` on every read of a 50M-row table.)_
- How would you compute the median time-to-acknowledge per service in the same pass? _(Forces you to introduce `PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ack_at - fired_at)` and discuss why it cannot fold into a plain SUM/CASE shape.)_
- Why might `COUNT(DISTINCT status)` mislead the avg metric? _(Checks whether you spot that a service with one observed status gets `total / 1`, inflating the average versus a service that happens to see all 5 statuses.)_
- Rewrite this for a columnar warehouse where `severity` is dictionary-encoded. _(Surfaces awareness that `LOWER()` defeats dictionary pushdown; the fix is to compare against the encoded set or normalize at load.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/alert_response_breakdown)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.