# Biggest Deployment Decline

> One team's deploy count cratered. Which one?

Canonical URL: <https://datadriven.io/problems/biggest_deployment_decline>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

A downstream report flagged a drop in one service's deploy frequency. Find which service had the biggest month-over-month drop in deployments from March to April 2026. Only include services that also appear in svc_health. Return just the service with the largest decline, showing the service name and the size of the decline.

## Worked solution and explanation

### Why this problem exists in real interviews

Period-over-period deltas are a standard analyst output, and the interviewer is probing whether you instinctively reach for conditional aggregation over two CTEs. The other signal: do you understand that 'decline' is signed, so sorting by `ABS(delta) DESC` would surface growers as decliners? Mixing sign on a directional metric is a real production bug.

---

### Break down the requirements

#### Step 1: Narrow deploy_logs to the two-month window

`WHERE` on `deploy_at` covering March and April of the target year. This is the only filter that touches a 1M-row table; everything else operates on the post-filter set.

#### Step 2: Restrict to services that exist in svc_health

Join against `(SELECT DISTINCT svc_name FROM svc_health)`. The `DISTINCT` collapses 40M rows down to a few thousand service names before the join, so the planner can hash-probe instead of explode.

#### Step 3: Conditional aggregation per service

`GROUP BY svc_name` with two `SUM(CASE WHEN month=...)` expressions. March minus April gives signed `deploy_decline`. Missing months in a service contribute 0, no `COALESCE` needed.

#### Step 4: Sort signed, take the top

`ORDER BY deploy_decline DESC LIMIT 1`. Sort signed, not absolute, or a service that grew (negative decline) could win the wrong way.

---

### The solution

**CONDITIONAL AGGREGATION, ONE SCAN**

```sql
SELECT
  dl.svc_name,
  SUM(CASE WHEN strftime('%m', dl.deploy_at) = '03'
            AND strftime('%Y', dl.deploy_at) = '2026'
           THEN 1 ELSE 0 END)
  - SUM(CASE WHEN strftime('%m', dl.deploy_at) = '04'
              AND strftime('%Y', dl.deploy_at) = '2026'
             THEN 1 ELSE 0 END) AS deploy_decline
FROM deploy_logs dl
JOIN (SELECT DISTINCT svc_name FROM svc_health) sh
  ON dl.svc_name = sh.svc_name
WHERE strftime('%Y', dl.deploy_at) = '2026'
  AND strftime('%m', dl.deploy_at) IN ('03', '04')
GROUP BY dl.svc_name
ORDER BY deploy_decline DESC
LIMIT 1;
```

> **Cost Analysis**
>
> `deploy_logs` is 1M rows; the `WHERE` keeps roughly 1/6 (two months of a year), so the aggregation works on ~167k rows. The `svc_health` subquery collapses 40M rows to a few thousand distinct names, which fits in a hash and probes cheaply. The wart: `strftime('%m', deploy_at) IN ('03','04')` forces a full scan because the planner can't use any index on `deploy_at`. Rewrite as `deploy_at >= 'YYYY-03-01' AND deploy_at < 'YYYY-05-01'` to let a btree on `deploy_at` range-seek.

> **Interviewers Watch For**
>
> Do you ask whether services with zero April deploys (or zero March deploys) should appear? Conditional aggregation handles it silently (missing month contributes 0). The alternative, two CTEs joined on `svc_name`, needs `FULL OUTER JOIN` plus `COALESCE` or it silently drops services that didn't deploy in one of the months.

> **Common Pitfall**
>
> Writing `ORDER BY ABS(deploy_decline) DESC` to 'find the biggest change.' That surfaces the biggest grower as the biggest 'decline,' which is wrong. Decline is signed: March minus April, positive means dropped. Sort `DESC` on the raw value.

**Two CTEs (two scans)**

WITH mar AS (SELECT svc_name, COUNT(*) c FROM deploy_logs WHERE ... GROUP BY svc_name), apr AS (...) SELECT mar.svc_name, mar.c - apr.c ... FROM mar FULL OUTER JOIN apr USING(svc_name). Two passes over deploy_logs, needs COALESCE on both sides, and SQLite doesn't even support FULL OUTER JOIN pre-3.39.

**Conditional agg (one scan)**

One pass over the filtered set. Missing months contribute 0 to the SUM with no special handling. Subtract the two SUMs inline. Shorter, faster, and portable.

> **The Elegant Move**
>
> Treat the membership table as a `DISTINCT` subquery, not a raw join. Joining straight to `svc_health` multiplies every deploy row by however many health checks that service has (potentially thousands), inflating every `SUM` to garbage. `DISTINCT` first, then join.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- Return the top 5 decliners with their percentage drop, not just the count drop. _(Forces a second conditional `SUM` for the March baseline and a `NULLIF` to avoid divide-by-zero when March was empty.)_
- How would you rewrite this if `deploy_logs` were partitioned by `deploy_at`? _(Tests partition pruning awareness: the `strftime` form blocks pruning; half-open date ranges unlock it.)_
- What changes if a service must appear in svc_health in both March and April specifically? _(Pushes the membership predicate into a date-bounded `EXISTS` rather than a global `DISTINCT`.)_
- Compute month-over-month decline for every month in the year, not just March to April. _(Natural pivot to `LAG()` over a monthly aggregate CTE.)_
- How would the query change if `deploy_logs.status` had values like `failed` and `rolled_back` that shouldn't count? _(Probes whether you push the filter into the `CASE` or into the `WHERE`, and the cardinality tradeoff between them.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/biggest_deployment_decline)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.