# The Quiet Alarms

> Low severity. High volume. Worth a look.

Canonical URL: <https://datadriven.io/problems/the_quiet_alarms>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Before tightening severity thresholds, the data quality team needs to know how much volume the 'low' severity tier accounted for in 2026.

## Worked solution and explanation

### Why this problem exists in real interviews

Compliance and observability teams routinely ask for a single-number scoped count: 'how many X events of type Y happened in year Z'. Interviewers use this to check three things: do you reach for `COUNT(*)` instead of `SUM(1)`, do you extract the year correctly from a TEXT timestamp without breaking on string comparison, and do you use exact string equality for the severity rather than a sloppy LIKE that catches 'lowest' or 'slow'.

---

### Break down the requirements

#### Step 1: Filter severity with exact equality

`severity = 'low'` (exact match). LIKE '%low%' would also match values like 'lowest' or 'flowing' if the schema ever drifted. Exact equality is the safer interview answer and matches the prompt's wording 'severity low'.

#### Step 2: Extract year from the TEXT timestamp

`run_at` is TEXT, not a real date type. Use `strftime('%Y', run_at) = '2026'` to pull the year out and compare to the literal string '2026'. Comparing TEXT year to TEXT year avoids implicit conversion issues that hit when one side is INTEGER.

#### Step 3: Count and project a single column

`COUNT(*)` over the filtered rows produces the single integer count the prompt asks for. Aliasing the result `AS low_severity_count` makes the column self-documenting.

---

### The solution

**Filter, then count**

```sql
SELECT COUNT(*) AS low_severity_count FROM dq_checks WHERE severity = 'low' AND strftime('%Y', run_at) = '2026'
```

> **Cost Analysis**
>
> `dq_checks` holds 600,000 rows totalling roughly 154 MB. With no index, this is a single sequential scan that applies both predicates inline. The `strftime` call is per-row but cheap. An index on `(severity, run_at)` would let SQLite skip non-'low' rows entirely, which is a big win because severity has only 3 distinct values and 'low' is the single_hot value.

> **Interviewers Watch For**
>
> Interviewers watch whether you handle the 3% NULL fraction in `severity` correctly (NULL = 'low' is UNKNOWN, so NULL rows are silently excluded, which is the desired behavior here). They also flag candidates who write `EXTRACT(YEAR FROM run_at)` (Postgres-only) when the engine is SQLite, or who wrap the count in DISTINCT for no reason.

> **Common Pitfall**
>
> Comparing `strftime('%Y', run_at) = 2026` (integer literal) instead of `= '2026'` (string literal). SQLite implicitly coerces, but other engines like Postgres will throw a type error. Sticking to TEXT-on-both-sides keeps the query portable.

---

## Common follow-up questions

- How would you rewrite the year filter so an index on `run_at` could be used, instead of the per-row `strftime` call? _(Tests range-predicate substitution. Replacing `strftime` with `run_at >= '2026-01-01' AND run_at < '2027-01-01'` is sargable and lets a B-tree index on `run_at` do range elimination, while `strftime(run_at)` is opaque to the planner.)_
- How would you adapt this to return a count for every severity in 2026, not just 'low'? _(Tests grouping. `GROUP BY severity` returns one row per severity. The candidate should also include `WHERE severity IS NOT NULL` if the 3% NULL fraction should be excluded from the breakdown.)_
- How would you extend this to also report how many of those 'low' severity checks failed (`passed = 0`)? _(Tests correlated counting and `passed`. `SUM(CASE WHEN passed = 0 THEN 1 ELSE 0 END)` over the filtered rows isolates the failing 'low'-severity checks.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_quiet_alarms)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.