# Not All Fires Are Equal

> The alert volume varies. So does what it means.

Canonical URL: <https://datadriven.io/problems/avg_alerts_by_severity>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The incident response team only trusts alert data from services that also appear in the health check system. Among those services, for any service-severity combination that fired more than once in 2026, first count the alerts per combination, then compute the average of those counts for each severity level, rounded to one decimal place.

## Worked solution and explanation

### Why this problem exists in real interviews

On-call telemetry questions like this probe whether you can hold two aggregation levels in your head without collapsing them. The interviewer is checking that you build per-(svc, severity) counts first, then average across services per severity, rather than producing a single number that looks reasonable but answers a different question.

---

### Break down the requirements

#### Step 1: Filter to the target year

Inside the subquery, restrict `alert_events` with `strftime('%Y', fired_at) = '2026'`. Do this before any aggregation so the per-combo counts only reflect that year.

#### Step 2: Count per (service, severity)

`GROUP BY svc_name, severity` with `COUNT(*) AS alert_count`. This materializes the per-combination numbers that you will average. Apply `HAVING COUNT(*) > 1` to drop one-off pairs.

#### Step 3: Restrict to known services

`INNER JOIN svc_health sh ON ae.svc_name = sh.svc_name` enforces the membership rule. Inner join in the outer scope keeps the join cheap because the subquery has already collapsed alerts to one row per combo.

#### Step 4: Average and format

Outer `GROUP BY ae.severity` with `CAST(ROUND(AVG(alert_count), 1) AS REAL)`. ROUND for display, CAST so engines that flatten whole-number averages to INT still return a float.

---

### The solution

**TWO-LEVEL AGGREGATION WITH MEMBERSHIP JOIN**

```sql
SELECT
  ae.severity,
  CAST(ROUND(AVG(alert_count), 1) AS REAL) AS avg_alert_count
FROM (
  SELECT
    svc_name,
    severity,
    COUNT(*) AS alert_count
  FROM alert_events
  WHERE strftime('%Y', fired_at) = '2026'
  GROUP BY svc_name, severity
  HAVING COUNT(*) > 1
) ae
INNER JOIN svc_health sh
  ON ae.svc_name = sh.svc_name
GROUP BY ae.severity
```

> **Cost Analysis**
>
> `alert_events` is 20M rows partitioned by `fired_at`, so the year predicate should prune to one partition. The inner `GROUP BY` collapses to roughly (n_services * n_severities) rows, typically a few thousand. The join into `svc_health` (30M rows) is the risk: without an index on `svc_health.svc_name` you get a hash join over 30M tuples. Pre-aggregating or using `EXISTS` against a distinct-services CTE both bound that blow-up.

> **Interviewers Watch For**
>
> Whether you ask if `svc_health` has duplicate `svc_name` rows. With 30M rows it almost certainly does (one row per health check). A naive inner join then multiplies each per-combo count by the number of health checks for that service, inflating the average. The fix is `INNER JOIN (SELECT DISTINCT svc_name FROM svc_health)` or `WHERE EXISTS`.

> **Common Pitfall**
>
> Writing `SELECT severity, COUNT(*) * 1.0 / COUNT(DISTINCT svc_name) FROM alert_events GROUP BY severity` looks clever and is wrong. Services that never fired a critical alert do not appear in the numerator OR denominator, so the average overstates. The two-level shape forces every (svc, severity) pair that exists to count explicitly, which is what the prompt asks for.

**Shortcut (wrong on sparse combos)**

`SELECT severity, COUNT(*) / COUNT(DISTINCT svc_name) FROM alert_events GROUP BY severity`. Denominator only counts services that fired at least one alert of that severity. A service with zero criticals is invisible, so the average is biased high. Also cannot apply the `HAVING COUNT(*) > 1` per-combo filter at all.

**Subquery then AVG (correct)**

Inner `GROUP BY svc_name, severity` materializes the per-entity counts explicitly. Outer `AVG(alert_count)` then averages exactly the population the prompt describes. The `HAVING` clause sits naturally at the inner level where it belongs.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would the result change if you used `LEFT JOIN` instead of `INNER JOIN` against `svc_health`? _(Tests whether you understand that the prompt's membership filter requires INNER; LEFT would re-include services with no health check via NULL rows and the AVG would be identical only by accident.)_
- Rewrite this so it works on a billion-row `alert_events` partitioned by date. _(Probes partition pruning awareness, whether the candidate replaces `strftime` with a sargable `fired_at >= '2026-01-01' AND fired_at < '{{NEXT_YEAR}}-01-01'` so the planner can actually skip partitions.)_
- How would you compute the median count per severity instead of the mean? _(Forces a switch to `PERCENTILE_CONT(0.5)` or a window-based approach over the same inner subquery. Reveals whether the candidate sees the inner result as a reusable building block.)_
- Why use `CAST(... AS REAL)` around `ROUND(AVG(...), 1)`? _(Checks understanding of engine-specific type promotion. In SQLite an AVG of all-integer counts can collapse to INTEGER when the mean lands on a whole number; the CAST guarantees a float for downstream consumers.)_
- If `svc_health.svc_name` has duplicates, how does that affect the answer and how do you fix it? _(Surfaces join-cardinality reasoning. The fix is `DISTINCT` in a derived table or `EXISTS`; the buggy version multiplies counts before averaging.)_

> **The one-liner**
>
> **Average of counts = two GROUP BYs.** Inner one defines the population you are averaging over; outer one collapses it. Trying to do both in a single aggregate is the bug.

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/avg_alerts_by_severity)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.