# Q by Q

> Thirteen weeks. This is how the team spent them.

Canonical URL: <https://datadriven.io/problems/quarterly_deployment_count>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The release engineering team is presenting deployment velocity trends at the all-hands. Show each quarter in YYYY-QN format alongside its deployment count, sorted from most deployments to fewest.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem targets date extraction for time bucketing across the `deploy_logs` table. You need to work with the `deploy_at` column to satisfy the requirements. Beyond that, the query demands string concatenation for formatted output.

---

### Break down the requirements

#### Step 1: Construct quarter label

Use `strftime('%Y', deploy_at) || '-Q' || ((CAST(strftime('%m', deploy_at) AS INTEGER) - 1) / 3 + 1)` to produce YYYY-QN format.

#### Step 2: Group and count

`GROUP BY quarter_label` with `COUNT(*)` tallies deployments per quarter.

#### Step 3: Order by count descending

`ORDER BY COUNT(*) DESC` surfaces the busiest quarters first.

---

### The solution

**Construct quarter label to find quarterly deployment count**

```sql
SELECT
    strftime('%Y', deploy_at) || '-Q' || ((CAST(strftime('%m', deploy_at) AS INTEGER) - 1) / 3 + 1) AS quarter,
    COUNT(*) AS deploy_count
FROM deploy_logs
GROUP BY quarter
ORDER BY deploy_count DESC
```

> **Cost Analysis**
>
> The query scans `deploy_logs` (500,000 rows). A covering index on the filter and group columns would reduce I/O. At this scale, the full scan is acceptable but becomes costly if the table grows 10x.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If `dur_secs` in `deploy_logs` is NULL for some rows, how would your aggregation or join logic be affected? _(Probes understanding of NULL propagation through joins and aggregate functions on `deploy_logs.dur_secs`.)_
- If `deploy_logs` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `log_id`? _(Tests ability to identify performance hotspots related to `deploy_logs.log_id` at scale.)_
- If the date column in `deploy_logs` spans multiple years, does your date extraction logic still produce correct time buckets? _(Tests whether the candidate accounts for year boundaries in date bucketing.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/quarterly_deployment_count)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.