# Services With Most Error Occurrences

> The noisiest services.

Canonical URL: <https://datadriven.io/problems/services_with_most_error_occurrences>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

For each service in 2026, surface its peak error count, sorted from most to least.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use this error tracking scenario to test year extraction from a date column and per-group aggregation against the `err_tracks` table. The focus is on restricting rows to 2026 via `first_at`, grouping on `svc_name`, and returning the largest `count` for each service.

---

### Break down the requirements

#### Step 1: Apply the WHERE filter

Restrict the rows to 2026 with a `WHERE` condition on `first_at` before any aggregation. Only qualifying rows then reach the `GROUP BY`, which keeps the result correct and limits how much data is aggregated.
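
A minimal sketch of the filter on its own, assuming `first_at` holds a date, a timestamp, or ISO-8601 text (the problem does not state the column type). It uses a half-open date range instead of a function call on the column; under that assumption it should keep the same rows as a year-extraction filter.

```sql
-- Keep only rows whose first_at falls in calendar year 2026.
-- Assumption: first_at is a DATE/TIMESTAMP or ISO-8601 text such as '2026-03-14 09:30:00'.
SELECT svc_name, count, first_at
FROM err_tracks
WHERE first_at >= '2026-01-01'   -- inclusive lower bound
  AND first_at <  '2027-01-01';  -- exclusive upper bound, so all of Dec 31 is included
```

The exclusive upper bound avoids having to spell out the last instant of the year.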

#### Step 2: Aggregate by `svc_name`

`GROUP BY svc_name` collapses the filtered rows to one row per service. Because the prompt asks for the peak error count, the aggregate is `MAX(count)`, not `SUM` or `COUNT`.
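
To illustrate the grouping step in isolation, here is a small sketch over made-up rows. The `sample_tracks` name and its values are hypothetical stand-ins for `err_tracks`, not data from the problem.

```sql
-- Hypothetical sample rows standing in for err_tracks.
WITH sample_tracks(svc_name, count) AS (
    VALUES ('auth', 3), ('auth', 7), ('api', 9), ('api', 2)
)
SELECT svc_name, MAX(count) AS max_error_count
FROM sample_tracks
GROUP BY svc_name;
-- One row per service: auth with 7 and api with 9.
-- Row order is unspecified here because there is no ORDER BY.
```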

#### Step 3: Sort the final output

`ORDER BY max_error_count DESC` puts the service with the highest peak first, matching "most to least" in the prompt. Interviewers check that the sort direction matches the prompt.
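
The prompt does not say how to order services that share the same peak. If a deterministic order is wanted, one option is a secondary sort key. The sketch below uses hypothetical sample data, and the tie-break on `svc_name` is an assumption, not a stated requirement.

```sql
-- Hypothetical per-service peaks; 'auth' and 'billing' tie at 7.
WITH sample_peaks(svc_name, max_error_count) AS (
    VALUES ('auth', 7), ('api', 9), ('billing', 7)
)
SELECT svc_name, max_error_count
FROM sample_peaks
ORDER BY max_error_count DESC,  -- highest peak first, as the prompt requires
         svc_name ASC;          -- assumed tie-breaker for a stable order
-- Returns api (9), then auth (7), then billing (7).
```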

---

### The solution

**Filter to 2026, group by service, and take the peak error count**

```sql
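-- Peak error count per service, for rows whose first_at falls in 2026.
SELECT svc_name, MAX(count) AS max_error_count
FROM err_tracks
-- strftime('%Y', ...) is SQLite syntax; it returns the year as text, hence the string '2026'.
WHERE strftime('%Y', first_at) = '2026'
GROUP BY svc_name
ORDER BY max_error_count DESC;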
```
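
If the query runs on PostgreSQL, where `strftime` is not a built-in function, one equivalent sketch uses `EXTRACT`. This assumes `first_at` is a `date` or `timestamp` column, which the problem does not state.

```sql
-- PostgreSQL-flavored variant of the same query (assumes first_at is a date/timestamp).
SELECT svc_name, MAX(count) AS max_error_count
FROM err_tracks
WHERE EXTRACT(YEAR FROM first_at) = 2026  -- EXTRACT returns a number, so compare to 2026, not '2026'
GROUP BY svc_name
ORDER BY max_error_count DESC;
```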

> **Cost Analysis**
>
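> With ~20M rows, the `WHERE` filter shrinks the working set before `GROUP BY` runs. Wrapping `first_at` in a function such as `strftime` generally prevents a plain index on that column from being used, so the filter falls back to a full scan. A range predicate on `first_at`, backed by an index on that column, would let the engine seek to the 2026 rows instead.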

> **Interviewers Watch For**
>
> Interviewers watch for how you extract the year from `first_at` and whether you account for edge cases at the year boundaries, such as rows timestamped late on December 31, 2025 or at midnight on January 1, 2027.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- What would happen to your result if `err_tracks.err_id` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `err_id` and uses DISTINCT or deduplication where needed.)_
- With 20,000,000 distinct values in `err_tracks.err_id`, how would a composite index on the GROUP BY columns change the execution plan? _(Probes understanding of how cardinality in `err_id` affects grouping and sort operations.)_
- If the date column in `err_tracks` spans multiple years, does your date extraction logic still produce correct time buckets? _(Tests whether the candidate accounts for year boundaries in date bucketing.)_
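
For the first follow-up, a quick way to check whether `err_id` actually repeats is a grouped count. This is a sketch that assumes `err_id` is meant to be unique per row, which the problem does not confirm.

```sql
-- List err_id values that appear more than once, most-duplicated first.
SELECT err_id, COUNT(*) AS copies
FROM err_tracks
GROUP BY err_id
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

Exact duplicate rows do not change a `MAX(count)` result, since repeating a value cannot raise the maximum. They would inflate a `SUM` or `COUNT`, so the check matters more if the metric changes.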

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/services_with_most_error_occurrences)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.