# Streak Status Changes

> Detect value changes across consecutive rows

Canonical URL: <https://datadriven.io/problems/streak_status_changes>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

We monitor daily user status records for unauthorized state changes. Surface every row where the status changed from the previous day, with both the old and new status side by side so the team can audit the transitions.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests whether a candidate can demonstrate structuring multi-step logic with CTEs, using offset window functions to compare adjacent rows, and partitioning data correctly for per-group analysis. This is an advanced probe that separates senior candidates who can decompose complex logic from those who cannot.

> **Trick to Solving**
>
> Whenever the prompt asks you to compare a row to its predecessor or successor, that is a `LAG`/`LEAD` signal.
> 
> 1. Identify the comparison direction (previous vs. next)
> 2. Partition by the grouping key (e.g., user, service)
> 3. Order by the time or sequence column
> 4. Compute the difference in the outer query

---

### Break down the requirements

#### Step 1: Isolate the intermediate result in a CTE

The `with_prev` CTE computes the intermediate aggregation that the outer query builds on. This separation keeps each layer focused on a single task.

#### Step 2: Filter out null values

Exclude rows where `previous_status` is NULL. This prevents nulls from polluting aggregations or creating phantom groups.

---

### The solution

**Lag over svc_name partitions to surface day-over-day status flips**

```sql
WITH with_prev AS (
    SELECT svc_name, checked, status AS current_status, LAG(status) OVER (PARTITION BY svc_name ORDER BY checked) AS previous_status
    FROM svc_health
)
SELECT svc_name, checked, previous_status, current_status
FROM with_prev
WHERE previous_status IS NOT NULL AND previous_status != current_status
```

> **Cost Analysis**
>
> With ~40M rows, the window function runs on the reduced set after filtering and grouping; CTEs materialize intermediate results, which can be beneficial or costly depending on the engine. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you decompose the problem into named, testable stages rather than nesting everything; whether you reach for window functions or attempt a self-join for row-to-row comparison; how you handle NULL values and whether you account for them in filters and aggregations.

> **Common Pitfall**
>
> Forgetting to filter NULLs creates phantom groups or inflated counts. Always check `null_fraction` in the schema before assuming columns are clean.

---

## Common follow-up questions

- If svc_health has gaps in the checked date column for certain services, does the LAG approach still detect all transitions correctly? _(Tests understanding that LAG looks at row position, not calendar continuity; gaps can make non-adjacent days appear consecutive.)_
- How would you extend this to find streaks of three or more consecutive status changes for the same service? _(Tests ability to layer window functions or use a running-count technique on the flag column.)_
- What happens to the LAG value on the very first row per svc_name partition, and how does your query handle it? _(Tests awareness that LAG returns NULL for the first row and the need for COALESCE or a WHERE filter.)_
- If two health checks for the same service share the same checked timestamp, how would you define which one is 'previous'? _(Tests understanding of deterministic ordering; a tiebreaker column like check_id is needed in the ORDER BY.)_
- Could you rewrite this without window functions using a self-join, and what are the trade-offs? _(Tests flexibility in approach and understanding of self-join performance vs. window scans.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/streak_status_changes)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.