# Alert Severity

> When the alarms go off, who screams loudest?

Canonical URL: <https://datadriven.io/problems/alert_severity>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

The streaming infrastructure team is analyzing message ordering within each topic. For each message, show its offset, the previous message's offset within the same topic, its dense rank by offset within the topic, and its sequential row number within the topic. Only include messages that have a predecessor in the same topic.

## Worked solution and explanation

### Why this problem exists in real interviews

Stream platform engineers audit Kafka-style offset continuity to detect missing or out-of-order messages, and the canonical tool is `LAG` over a partition. Interviewers use this prompt to test three things at once: that you cast TEXT offsets to INTEGER for numeric ordering (string-sorting offsets gives '10' < '2'), that you can stack three different window functions (`LAG`, `DENSE_RANK`, `ROW_NUMBER`) over the same window spec, and that you know to drop the first row in each topic where `LAG` returns NULL.

---

### Break down the requirements

#### Step 1: Cast offset for numeric ordering

`offset` is TEXT, so an `ORDER BY offset` sorts lexically: '1', '10', '11', '2'. Use `CAST(offset AS INTEGER)` inside the OVER clause so the window orders rows by numeric offset. This applies identically to all three window functions.

#### Step 2: Stack three windows over the same partition

All three window functions share `PARTITION BY topic ORDER BY CAST(offset AS INTEGER)`. `LAG(offset)` reads the previous row's offset (still as TEXT, no cast needed in the projection), `DENSE_RANK()` gives a no-gap rank, and `ROW_NUMBER()` gives a sequential index. Defining them inline in the SELECT lets the engine compute all three in one window pass.

#### Step 3: Drop first rows per topic with NULL prev_offset

The first row in each `topic` partition has no predecessor, so `LAG` returns NULL. Wrap the SELECT in a subquery and filter `WHERE prev_offset IS NOT NULL` in the outer query, since you cannot reference a window function in a WHERE clause directly.

---

### The solution

**Three windows over (topic, numeric offset), then drop NULL predecessors**

```sql
SELECT topic, offset, prev_offset, msg_rank, msg_row_num FROM (
  SELECT topic, offset,
    LAG(offset) OVER (PARTITION BY topic ORDER BY CAST(offset AS INTEGER)) AS prev_offset,
    DENSE_RANK() OVER (PARTITION BY topic ORDER BY CAST(offset AS INTEGER)) AS msg_rank,
    ROW_NUMBER() OVER (PARTITION BY topic ORDER BY CAST(offset AS INTEGER)) AS msg_row_num
  FROM stream_msgs
)
WHERE prev_offset IS NOT NULL
```

> **Cost Analysis**
>
> `stream_msgs` is 800M rows (~102 GB) with 60 distinct topics, so each topic averages ~13M rows. The window pass requires sorting each topic partition by numeric offset, which is the dominant cost. Computing three window functions over the same window spec is essentially free (single sort, three derived columns). At this scale you would partition the source table by `topic` and bucket by offset range so each worker handles one topic's sort independently.

> **Interviewers Watch For**
>
> Interviewers test whether you notice the TEXT-to-INTEGER cast (a candidate who skips it gets the wrong order on multi-digit offsets), whether you can reuse one window spec for three functions, and whether you know that window functions cannot be filtered in WHERE without a wrapping subquery or CTE. Strong candidates also call out the difference between DENSE_RANK and ROW_NUMBER (DENSE_RANK gives ties the same number, ROW_NUMBER does not) when explaining why both are returned.

> **Common Pitfall**
>
> Forgetting the `CAST(offset AS INTEGER)` is the silent killer: SQLite happily sorts TEXT lexically, so '100' comes before '20' and your `prev_offset` references the wrong row. Another pitfall is trying `WHERE LAG(offset) OVER (...) IS NOT NULL` in the same SELECT, which fails because window functions are evaluated after WHERE. The fix is the wrapping subquery or a CTE.

---

## Common follow-up questions

- How would you change the query to flag gaps in the offset sequence (e.g., where `CAST(offset AS INTEGER) - CAST(prev_offset AS INTEGER) > 1`) so the team can hunt missing messages? _(Tests whether the candidate can extend LAG-based logic. Add a `CASE WHEN CAST(offset AS INTEGER) - CAST(prev_offset AS INTEGER) > 1 THEN 1 ELSE 0 END AS gap_flag` to the inner SELECT, or filter the outer WHERE to only those rows. This is the actual production use case for the LAG pattern.)_
- `offset` has roughly 1% NULLs. What happens to those rows under `CAST(offset AS INTEGER)` in the ORDER BY of the window, and would you exclude them upstream? _(Tests NULL-in-window behavior. NULL `offset` casts to NULL and sorts to one end (NULLs first or last depending on engine), which corrupts the LAG sequence for surrounding rows. Production answers usually pre-filter `WHERE offset IS NOT NULL` before the window pass.)_
- `topic` has 60 distinct values with zipf skew, meaning a few topics have most of the messages. How does that skew affect window-function parallelism, and what would you do about it? _(Tests data-skew awareness. A skewed PARTITION BY can leave one worker doing 80% of the sort while others sit idle. Mitigations include further bucketing the hot topics by offset range (sub-partitioning) or running the audit per-topic in separate jobs sized to the topic's row count.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/alert_severity)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.