# Recurring Error Types

> The same errors, recurring.

Canonical URL: <https://datadriven.io/problems/recurring_error_types>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

After the latest incident, the on-call SRE needs to identify recurring error types. Find all error types that appear more than once. Show just the error type.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem uses the `err_tracks` table to test whether you can filter on an aggregate with HAVING, here in a reliability engineering setting. The only column the solution needs is `err_type`: group by it, then keep the groups that contain more than one row.

---

### Break down the requirements

#### Step 1: Aggregate by `err_type`

`GROUP BY err_type` collapses the rows to one per distinct error type. An aggregate function (`SUM`, `COUNT`, `AVG`, etc.) then computes one value per group; here `COUNT(*)` gives the number of rows recorded for each error type.
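
To make the grouping step concrete, here is a minimal sketch that counts rows per error type. The sample rows are invented for illustration and inlined with `VALUES` so the query runs without the real `err_tracks` table; the `occurrences` alias is likewise an assumption.

```sql
-- Hypothetical sample data standing in for the real err_tracks table.
WITH err_tracks (err_type) AS (
    VALUES ('timeout'), ('timeout'), ('timeout'),
           ('oom'), ('oom'),
           ('disk_full')
)
SELECT err_type,
       COUNT(*) AS occurrences   -- one count per distinct err_type
FROM err_tracks
GROUP BY err_type
ORDER BY err_type;
-- disk_full | 1
-- oom       | 2
-- timeout   | 3
```

Each error type appears once in the output regardless of how many rows it had, which is the intermediate result HAVING filters in the next step.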

#### Step 2: Filter groups with HAVING

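HAVING is evaluated after GROUP BY and discards the groups that fail its condition; here `COUNT(*) > 1` keeps only the error types that occur more than once. The condition cannot go in WHERE, because WHERE filters individual rows before grouping, when no count exists yet.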
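
One way to see the ordering is to compute the counts in a subquery first; once the count exists as an ordinary column, an outer WHERE can filter on it. The sketch below uses invented sample rows and a hypothetical `counts` CTE, and is meant as an equivalent formulation for illustration, not a replacement for HAVING.

```sql
-- Hypothetical sample data standing in for the real err_tracks table.
WITH err_tracks (err_type) AS (
    VALUES ('timeout'), ('timeout'), ('timeout'),
           ('oom'), ('oom'),
           ('disk_full')
),
counts AS (
    -- The aggregate is computed here, per group.
    SELECT err_type, COUNT(*) AS occurrences
    FROM err_tracks
    GROUP BY err_type
)
SELECT err_type
FROM counts
WHERE occurrences > 1   -- legal: occurrences is now a plain column
ORDER BY err_type;
-- oom
-- timeout
```

HAVING expresses the same filter in a single query level, which is why it is the idiomatic choice for this problem.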

---

### The solution

**HAVING filter for recurring error types**

```sql
SELECT err_type
FROM err_tracks
GROUP BY err_type
HAVING COUNT(*) > 1
```

> **Cost Analysis**
>
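> With ~20M rows, the query has no WHERE clause or join, so it has to read every row (or every index entry) to count them; the GROUP BY then reduces the output to one row per error type. An index on `err_type` cannot turn this into a seek, but it may let the planner aggregate from the smaller index (for example with an index-only scan) instead of scanning the whole table.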

> **Interviewers Watch For**
>
> Whether you filter on the aggregate with HAVING (not WHERE), and whether you return only `err_type` as the problem asks.

> **Common Pitfall**
>
> Putting the aggregate condition in WHERE instead of HAVING makes the query fail; Postgres rejects it with `aggregate functions are not allowed in WHERE`. WHERE runs before GROUP BY; HAVING runs after.

---

## Common follow-up questions

- What would happen to your result if `err_tracks.svc_name` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `svc_name` and uses DISTINCT or deduplication where needed.)_
- With 4,000,000 distinct values in `err_tracks.message`, how would a composite index on the GROUP BY columns change the execution plan? _(Probes understanding of how cardinality in `message` affects grouping and sort operations.)_
- If the HAVING threshold in your query changed from a fixed number to a percentile, how would you restructure the query? _(Tests ability to replace static HAVING filters with dynamic subquery-based thresholds.)_
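
For the last follow-up, one possible shape is to compute the per-type counts, derive a cutoff from them, and compare against that cutoff in HAVING through a scalar subquery. This is a sketch under assumptions: the sample rows, the `counts` and `threshold` CTE names, and the 90th-percentile choice are all illustrative, and `percentile_cont ... WITHIN GROUP` is Postgres syntax.

```sql
-- Hypothetical sample data standing in for the real err_tracks table.
WITH err_tracks (err_type) AS (
    VALUES ('timeout'), ('timeout'), ('timeout'),
           ('oom'), ('oom'),
           ('disk_full')
),
counts AS (
    SELECT err_type, COUNT(*) AS occurrences
    FROM err_tracks
    GROUP BY err_type
),
threshold AS (
    -- Assumed cutoff: the 90th percentile of the per-type counts.
    SELECT percentile_cont(0.9) WITHIN GROUP (ORDER BY occurrences) AS cutoff
    FROM counts
)
SELECT err_type
FROM err_tracks
GROUP BY err_type
HAVING COUNT(*) >= (SELECT cutoff FROM threshold)  -- dynamic threshold
ORDER BY err_type;
```

The static `> 1` becomes a value computed from the data itself, so the query adapts as the distribution of error counts changes.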

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/recurring_error_types)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.