# Sequential Service Transitions

> Job to job. The transitions.

Canonical URL: <https://datadriven.io/problems/sequential_service_transitions>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

Among all users who deployed to 'auth-svc' at some point, how many had 'payments-svc' as their very next deployment (by timestamp) with no other services in between? Return a single count.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge asks you to apply LEAD for next-row lookahead to the `deploy_logs` table, simulating a real deployment workflow. Pay attention to columns like `svc_name`, `deploy_at`, and `author` as they drive the aggregation and output.

> **Trick to Solving**
>
> Whenever the prompt asks you to compare a row to its predecessor or successor, that is a `LAG`/`LEAD` signal.
> 
> 1. Identify the comparison direction (previous vs. next)
> 2. Partition by the grouping key (e.g., user, service)
> 3. Order by the time or sequence column
> 4. Compute the difference in the outer query

---

### Break down the requirements

#### Step 1: Isolate the intermediate result in a CTE

The `ordered_deploys` CTE computes the intermediate aggregation that the outer query builds on. This separation keeps each layer focused on a single task.

#### Step 2: Select the target columns

The SELECT clause picks exactly the columns the prompt asks for. Returning extra columns or missing a required alias would fail the grading check.

---

### The solution

**Lead-lookahead for sequential service transitions**

```sql
WITH ordered_deploys AS (
    SELECT author, svc_name, deploy_at, LEAD(svc_name) OVER (PARTITION BY author ORDER BY deploy_at) AS next_svc
    FROM deploy_logs
)
SELECT COUNT(DISTINCT author) AS transition_count
FROM ordered_deploys
WHERE svc_name = 'auth-svc' AND next_svc = 'payments-svc'
```

> **Cost Analysis**
>
> With ~1M rows, the window function runs on the reduced set after filtering and grouping; CTEs materialize intermediate results, which can be beneficial or costly depending on the engine. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you decompose the problem into named, testable stages rather than nesting everything; whether you reach for window functions or attempt a self-join for row-to-row comparison; whether you use a subquery or self-join, and can explain the tradeoffs.

> **Common Pitfall**
>
> Applying LAG/LEAD without the correct PARTITION BY groups unrelated rows together, producing meaningless comparisons across boundaries.

---

## Common follow-up questions

- The `dur_secs` column in `deploy_logs` has a 1% null rate. How does your query handle rows where `dur_secs` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `deploy_logs.dur_secs` and understands how aggregates skip NULL values.)_
- If `deploy_logs` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `log_id`? _(Tests ability to identify performance hotspots related to `deploy_logs.log_id` at scale.)_
- What is the default window frame for your window function, and would explicitly setting ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW change anything? _(Tests knowledge of implicit vs explicit window frame specifications.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/sequential_service_transitions)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.