# Non-Trivial Fatal Errors

> Short errors are noise. Long ones matter.

Canonical URL: <https://datadriven.io/problems/non_trivial_fatal_errors>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

We categorize error messages by length: 'short' (under 25 chars), 'mid' (25 to 35), and 'long' (over 35). Find errors that are not 'short' and have fatal severity (matching 'Fatal' or 'fatal' exactly). Show error ID, message, service name, and the length category, with one row per error.

## Worked solution and explanation

### Why this problem exists in real interviews

Extracting insights from err_tracks.err_type grouped by message via query construction is the central task. It is used in mid-level screens to test whether you pick the right aggregation function and partition boundary on the first attempt.

---

### Break down the requirements

#### Step 1: Filter to fatal severity

`WHERE severity IN ('Fatal', 'fatal')` restricts to exact case matches as specified.

#### Step 2: Compute length category

Use `CASE WHEN LENGTH(message) < 25 THEN 'short' WHEN LENGTH(message) <= 35 THEN 'mid' ELSE 'long' END` to classify each error message.

#### Step 3: Exclude short errors

Add `AND LENGTH(message) >= 25` to the `WHERE` clause, or wrap the `CASE` in a subquery and filter out 'short'.

#### Step 4: Return the required columns

Select `err_id`, `message`, `svc_name`, and the length category.

---

### The solution

**Length classification with severity filter**

```sql
SELECT
    err_id,
    message,
    svc_name,
    CASE
        WHEN LENGTH(message) <= 35 THEN 'mid'
        ELSE 'long'
    END AS length_category
FROM err_tracks
WHERE severity IN ('Fatal', 'fatal')
  AND LENGTH(message) >= 25
```

> **Cost Analysis**
>
> With `err_tracks` (25,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If err_tracks.err_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in err_id.)_
- How would you verify that your aggregation on err_tracks.err_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in err_tracks.err_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like err_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/non_trivial_fatal_errors)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.