# Server With Most Errors

> One server stands out. Not in a good way.

Canonical URL: <https://datadriven.io/problems/server_with_most_errors>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

During an incident, the on-call SRE needs to find the single server with the most logged errors. Show the server name and its error count.

## Worked solution and explanation

### Why this problem exists in real interviews

Drawn from a reliability engineering domain, this question centers on grouped `COUNT` aggregation over the `server_logs` table. The tricky part is counting only the error rows for each `server_name` and then picking the single top group deterministically.

---

### Break down the requirements

#### Step 1: Apply the WHERE filter

Filter rows with `WHERE log_level = 'ERROR'` before any aggregation. Only error rows then enter the count, which keeps the result correct and gives the later steps fewer rows to process.

#### Step 2: Aggregate by `server_name`

`GROUP BY server_name` collapses the filtered rows to one row per server. `COUNT(*)` then gives the number of error rows for each server.

#### Step 3: Order and limit the output

`ORDER BY error_count DESC` with `LIMIT 1` returns only the top result. The sort must be deterministic: if two servers can tie on the count, add a tiebreaker column such as `server_name`.
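The three steps can be sketched end to end on a few made-up rows. This is a minimal illustration, not the reference solution: the inline `server_logs` CTE and its server names are hypothetical sample data, and the `server_name ASC` tiebreaker is an assumption about how a tie should be resolved.

```sql
-- Hypothetical sample rows standing in for the real server_logs table.
WITH server_logs (server_name, log_level) AS (
    VALUES
        ('web-01', 'ERROR'),
        ('web-01', 'ERROR'),
        ('web-01', 'INFO'),
        ('web-02', 'ERROR'),
        ('web-02', 'ERROR'),
        ('db-01',  'ERROR')
)
SELECT server_name, COUNT(*) AS error_count
FROM server_logs
WHERE log_level = 'ERROR'                   -- Step 1: keep only error rows
GROUP BY server_name                        -- Step 2: one row per server
ORDER BY error_count DESC, server_name ASC  -- Step 3: tiebreaker makes the order deterministic
LIMIT 1;
-- returns: web-01 | 2   (web-02 also has 2 errors; the tiebreaker picks web-01)
```

Without the second sort key, either `web-01` or `web-02` could come back, and the choice could differ between runs.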

---

### The solution

**Apply the WHERE filter to find the server with the most errors**

```sql
SELECT server_name, COUNT(*) AS error_count
FROM server_logs
WHERE log_level = 'ERROR'
GROUP BY server_name
ORDER BY error_count DESC
LIMIT 1
```

> **Cost Analysis**
>
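> With ~70M rows, the `WHERE` filter reduces the working set before `GROUP BY` runs. Without a suitable index the database still scans the whole table; an index that covers `log_level` (and ideally `server_name`) can let the planner read only the error rows, provided they are a small fraction of the table.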
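One way to explore the indexing idea is a partial index that contains only error rows. The sketch below assumes Postgres and uses a hypothetical scratch table, `server_logs_demo`, that models only the two columns the query touches; the index name is also made up. Whether the planner actually uses the index depends on table statistics, so the example ends with `EXPLAIN` rather than assuming a particular plan.

```sql
-- Hypothetical scratch table; the real server_logs table has more columns.
CREATE TEMP TABLE server_logs_demo (
    server_name text NOT NULL,
    log_level   text NOT NULL
);

-- Partial index: only ERROR rows are indexed, keyed by the grouping column.
CREATE INDEX server_logs_demo_error_idx
    ON server_logs_demo (server_name)
    WHERE log_level = 'ERROR';

-- Inspect the chosen plan instead of assuming the index is used.
EXPLAIN
SELECT server_name, COUNT(*) AS error_count
FROM server_logs_demo
WHERE log_level = 'ERROR'
GROUP BY server_name
ORDER BY error_count DESC
LIMIT 1;
```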

> **Interviewers Watch For**
>
> Whether the query returns exactly the columns and ordering the prompt specifies, and how quickly you identify the core operation and write clean, minimal code.

> **Common Pitfall**
>
> Using LIMIT without ORDER BY returns an arbitrary subset. Always pair LIMIT with a deterministic ORDER BY.
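A related edge case is a tie for first place: `LIMIT 1` returns one server and silently drops the other. If the caller would rather see every tied server, Postgres 13 and later support `FETCH FIRST ... WITH TIES`. The sketch below is one possible variant, not the expected answer to the prompt, and the inline `server_logs` CTE is hypothetical sample data.

```sql
-- Hypothetical sample rows in which two servers tie for the most errors.
WITH server_logs (server_name, log_level) AS (
    VALUES
        ('web-01', 'ERROR'),
        ('web-01', 'ERROR'),
        ('web-02', 'ERROR'),
        ('web-02', 'ERROR'),
        ('db-01',  'ERROR')
)
SELECT server_name, COUNT(*) AS error_count
FROM server_logs
WHERE log_level = 'ERROR'
GROUP BY server_name
ORDER BY error_count DESC           -- WITH TIES requires an ORDER BY
FETCH FIRST 1 ROW WITH TIES;        -- keeps every row tied with the first one
-- returns both web-01 and web-02 (2 errors each); their relative order is not guaranteed
```

Adding `server_name` to the `ORDER BY` here would break the tie and reduce the output to a single row again, so the sort key is deliberately left as the count alone.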

---

## Common follow-up questions

- What result would you get if every value in `server_logs.response_time_ms` were NULL? Would your query return an empty set or something unexpected? _(Tests extreme NULL scenarios and whether the candidate guards against edge cases in `response_time_ms`.)_
- `server_logs.log_timestamp` has roughly 31,536,000 distinct values. What index strategy would you use to avoid a full scan on `server_logs`? _(Tests indexing knowledge specific to the high-cardinality `log_timestamp` column in `server_logs`.)_
- If this query ran as a scheduled job, how would you add monitoring to detect when the result set is suspiciously empty? _(Tests operational awareness around scheduled query jobs.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/server_with_most_errors)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.