# Month With Fewest Deploys

> One month, nobody deployed.

Canonical URL: <https://datadriven.io/problems/month_with_fewest_deploys>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The release team is choosing the safest month for a major migration and wants to pick the one with historically the fewest deployments across all years. Which month number had the lowest deploy count?

## Worked solution and explanation

### Why this problem exists in real interviews

The interviewer wants to see you apply grouping and top-N selection to deploy_logs.svc_name while accounting for the distribution of version. This surfaces in mid-level screens because small logic errors produce results that look correct at a glance.

> **Trick to Solving**
>
> Share/ratio calculations divide a group total by a grand total. Use a window function for the denominator.
> 
> 1. Aggregate to the group level
> 2. Use `SUM(metric) OVER ()` to get the grand total in each row
> 3. Divide group total by grand total and multiply by 100

---

### Break down the requirements

#### Step 1: Aggregate with COUNT

Group by the output grain and apply `COUNT()` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.

#### Step 2: Order and limit the output

Sort by the target metric and apply `LIMIT` to return the requested number of rows. Ensure the sort is deterministic to produce reproducible results.

---

### The solution

**Cross-year month aggregation for minimum**

```sql
SELECT CAST(STRFTIME('%m', deploy_at) AS INTEGER) AS month_num,
    COUNT(*) AS deploy_count
FROM deploy_logs
GROUP BY CAST(STRFTIME('%m', deploy_at) AS INTEGER)
ORDER BY deploy_count ASC
LIMIT 1
```

> **Cost Analysis**
>
> The query scans 800K rows from `deploy_logs`. CTEs in most engines are optimization fences. For production workloads, consider inlining or materializing the intermediate results.

> **Interviewers Watch For**
>
> Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax. Breaking complex logic into named CTEs shows the interviewer you prioritize readability and debuggability.

> **Common Pitfall**
>
> Comparing dates stored as TEXT without casting can produce lexicographic instead of chronological ordering. Always confirm the column type.

---

## Common follow-up questions

- What happens to your result if deploy_logs.dur_secs contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on dur_secs.)_
- How would you verify that your aggregation on deploy_logs.log_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in deploy_logs.log_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like log_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/month_with_fewest_deploys)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.