# Error Severity Buckets

> Errors sorted by how much they hurt.

Canonical URL: <https://datadriven.io/problems/error_severity_buckets>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

On-call wants every recorded error tagged with a human severity label based on how often it has fired: 0 occurrences is NONE, 1 to 5 is LOW, 6 to 20 is MODERATE, 21 to 50 is HIGH, anything above 50 is CRITICAL. Skip rows that aren't attributed to a service. Show each error type alongside its severity.

## Worked solution and explanation

### Why this problem exists in real interviews

Bucketing a numeric column into named labels is the most common ad-hoc reporting request in production. The interviewer is checking that you can write a per-row `CASE` with non-overlapping ranges, get the boundary values right, and remember the filter on `svc_name`.

---

### Break down the requirements

#### Step 1: Filter out unattributed errors

Drop rows where `svc_name IS NULL`. The output is one row per qualifying record in `err_tracks`.

#### Step 2: Define the five buckets

Map `count` to a label with `CASE`: `0` is `NONE`, `1-5` is `LOW`, `6-20` is `MODERATE`, `21-50` is `HIGH`, anything above 50 is `CRITICAL`. Each WHEN must be checked in order so a row only matches one branch.

#### Step 3: Project two columns

Return only `err_type` and the labeled column. Don't carry through `count`, `err_id`, or `svc_name`.

---

### The solution

**Per-row CASE with the documented buckets**

```sql
SELECT
    err_type,
    CASE
        WHEN count = 0 THEN 'NONE'
        WHEN count BETWEEN 1 AND 5 THEN 'LOW'
        WHEN count BETWEEN 6 AND 20 THEN 'MODERATE'
        WHEN count BETWEEN 21 AND 50 THEN 'HIGH'
        ELSE 'CRITICAL'
    END AS severity_label
FROM err_tracks
WHERE svc_name IS NOT NULL
```

> **Time and Space Complexity**
>
> **Time:** O(n) single scan over `err_tracks`. The CASE evaluates branches top-down, so a typical row stops at branch 2 or 3.
> 
> **Space:** O(1) extra; the projection has the same row count as the filtered input.

> **Interviewers Watch For**
>
> Strong candidates state the bucket boundaries back to the interviewer before writing SQL ("so 5 is LOW, 6 is MODERATE, right?"). They also remember `BETWEEN` is inclusive on both ends.

> **Common Pitfall**
>
> Putting the `>50 CRITICAL` branch before the bounded ones, or writing `count > 0 THEN 'LOW'` before `count = 0 THEN 'NONE'`. CASE evaluates in order, so the first matching branch wins and zero would be classified as LOW.

---

## Common follow-up questions

- How would you count how many errors fall in each bucket? _(Tests wrapping with GROUP BY on the CASE expression.)_
- What if count could be NULL? Which branch would it hit and how would you fix that? _(Tests handling NULL gracefully in CASE.)_
- How would you make the thresholds configurable without changing the query each time? _(Tests using a lookup table joined on a range instead of hardcoded thresholds.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/error_severity_buckets)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.