# Shipped to Prod

> Staging is safe. Production is real. How many made the jump?

Canonical URL: <https://datadriven.io/problems/production_deployment_count>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The release manager needs a quick count of how many deployments have reached the production environment.

## Worked solution and explanation

### Why this problem exists in real interviews

The core skill being tested is filtering and projection, applied to the `deploy_logs` table in a deployment context. Getting the `env_name` column right is where most candidates slip.

---

### Break down the requirements

#### Step 1: Filter to production environment

`WHERE env_name = 'production'` restricts to production deploys.

#### Step 2: Count the rows

`SELECT COUNT(*)` returns the total production deployment count.

---

### The solution

**Filter to production environment to find production deployment count**

```sql
SELECT COUNT(*) AS deploy_count
FROM deploy_logs
WHERE env_name = 'production'
```

> **Cost Analysis**
>
> The query scans `deploy_logs` (400,000 rows). A covering index on the filter and group columns would reduce I/O. At this scale, the full scan is acceptable but becomes costly if the table grows 10x.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- The `dur_secs` column in `deploy_logs` has a 1% null rate. How does your query handle rows where `dur_secs` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `deploy_logs.dur_secs` and understands how aggregates skip NULL values.)_
- With 400,000 distinct values in `deploy_logs.log_id`, how would a composite index on the GROUP BY columns change the execution plan? _(Probes understanding of how cardinality in `log_id` affects grouping and sort operations.)_
- If this query ran as a scheduled job, how would you add monitoring to detect when the result set is suspiciously empty? _(Tests operational awareness around scheduled query jobs.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/production_deployment_count)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.