# Log Levels

> Severity breakdown with response times.

Canonical URL: <https://datadriven.io/problems/log_levels>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The SRE team is building a performance digest for the weekly incident review. For each log severity level, they want to know the number of entries and the average response time in milliseconds. Only include severity levels that have generated at least five log entries, and list them from the slowest average response time to the fastest.

## Worked solution and explanation

### Why this problem exists in real interviews

Working against server_logs, this problem tests query construction on the server_name and log_level columns. Interviewers use it as a fundamentals check because a subtle mis-grouping or filter placement changes the output without raising an error.

---

### Break down the requirements

#### Step 1: Read from `server_logs`

The query targets `server_logs` with 6 columns. Identify which columns are needed for the output.

#### Step 2: Order the final output

Apply `ORDER BY` as specified to produce the expected row sequence. When tied values exist, add a secondary sort column for determinism.

#### Step 3: Return the result set

Select the required columns with any necessary aliasing or formatting.

---

### The solution

**DISTINCT for unique values**

```sql
SELECT DISTINCT log_level
FROM server_logs
ORDER BY log_level
```

> **Cost Analysis**
>
> The query scans 500M rows from `server_logs`. The aggregation reduces the row count before any downstream processing, which is the key performance lever.

> **Interviewers Watch For**
>
> Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax.

> **Common Pitfall**
>
> Comparing dates stored as TEXT without casting can produce lexicographic instead of chronological ordering. Always confirm the column type.

---

## Common follow-up questions

- What happens to your result if server_logs.response_time_ms contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on response_time_ms.)_
- How would you verify that your aggregation on server_logs.log_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in server_logs.log_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like log_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/log_levels)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.