# Production Deploys From April Onward

> After the cutoff, how many times did prod get a push?

Canonical URL: <https://datadriven.io/problems/production_deploys_from_april_onward>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

How many deployments went to the production environment during April or later (month 4 onward), regardless of year? Include the maximum deployment duration among those deploys.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem targets date extraction for time bucketing across the `deploy_logs` table. You need to work with columns like `env_name`, `deploy_at`, and `dur_secs` to satisfy the requirements.

---

### Break down the requirements

#### Step 1: Filter to production and April onward

`WHERE env_name = 'production' AND CAST(strftime('%m', deploy_at) AS INTEGER) >= 4` applies both conditions.

#### Step 2: Compute count and max duration

`SELECT COUNT(*), MAX(dur_secs)` returns both requested metrics.

---

### The solution

**Filter to production and april onward to find production deploys fr...**

```sql
SELECT COUNT(*) AS deploy_count, MAX(dur_secs) AS max_duration
FROM deploy_logs
WHERE env_name = 'production'
  AND CAST(strftime('%m', deploy_at) AS INTEGER) >= 4
```

> **Cost Analysis**
>
> The query scans `deploy_logs` (600,000 rows). A covering index on the filter and group columns would reduce I/O. At this scale, the full scan is acceptable but becomes costly if the table grows 10x.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If `dur_secs` in `deploy_logs` is NULL for some rows, how would your aggregation or join logic be affected? _(Probes understanding of NULL propagation through joins and aggregate functions on `deploy_logs.dur_secs`.)_
- `deploy_logs.log_id` has roughly 600,000 distinct values. What index strategy would you use to avoid a full scan on `deploy_logs`? _(Tests indexing knowledge specific to the high-cardinality `log_id` column in `deploy_logs`.)_
- If the date column in `deploy_logs` spans multiple years, does your date extraction logic still produce correct time buckets? _(Tests whether the candidate accounts for year boundaries in date bucketing.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/production_deploys_from_april_onward)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.