# Builds per Author per Branch

> Who triggered what, and where?

Canonical URL: <https://datadriven.io/problems/builds_per_author_per_branch>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The release manager wants to see build volume broken down by who triggered the build and which branch it ran on. Show each trigger-branch combination with its build count, ordered alphabetically by trigger and branch, then by count descending.

## Worked solution and explanation

### Why this problem exists in real interviews

Release-health questions probe whether you name the output grain before writing. One row per (trigger, branch) is long format, one row per trigger with branch columns is wide. The interviewer wants you to ask which the consumer wants, and to flag that `GROUP BY` will surface NULL trigger or NULL branch as their own buckets unless you filter them.

---

### Break down the requirements

#### Step 1: Name the grain out loud

One row per (`trigger`, `branch`) pair. Say it before writing. That fixes both `GROUP BY` keys and rules out a `DISTINCT` shortcut.

#### Step 2: Pick the aggregate

Build volume is `COUNT(*)` over `ci_builds`. Not `COUNT(build_id)`, not `COUNT(DISTINCT build_id)`. `build_id` is the PK, so they tie, but `COUNT(*)` reads as the intent.

#### Step 3: Alias trigger as author

Prompt says author, schema says `trigger`. Alias `trigger AS author` in the SELECT so the output column matches what the release manager asked for.

#### Step 4: Order by three keys

`ORDER BY trigger ASC, branch ASC, build_count DESC`. The count tiebreaker only fires when the first two are equal, which under this grain never happens, but write it anyway: the prompt asked for it.

---

### The solution

**BUILDS PER AUTHOR PER BRANCH**

```sql
SELECT trigger AS author, branch, COUNT(*) AS build_count
FROM ci_builds
GROUP BY trigger, branch
ORDER BY trigger ASC, branch ASC, build_count DESC
```

> **Cost Analysis**
>
> Full scan of 4M rows, hash aggregate keyed on (trigger, branch). Cardinality is authors x branches, low thousands at most, so the hash table fits in memory and the sort at the end is cheap. A composite index on (trigger, branch) would let the planner do an index-only stream aggregate.

> **Interviewers Watch For**
>
> Before writing, ask: do they want long format (one row per pair) or wide format (one row per author with a column per branch)? The prompt here is long. If they say wide, switch to conditional aggregation with `SUM(CASE WHEN branch = 'main' THEN 1 ELSE 0 END)` per branch.

> **Common Pitfall**
>
> `GROUP BY` treats `NULL` as its own bucket, so a `NULL` trigger or `NULL` branch shows up as a row in the output. If those are bot builds or detached HEAD, the release manager won't recognize them. Confirm whether to filter via `WHERE trigger IS NOT NULL AND branch IS NOT NULL` before writing.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- Reshape the output so each author is one row with a column per branch. _(Tests conditional aggregation with `SUM(CASE WHEN branch = ... THEN 1 END)` and naming the pivot keys statically.)_
- Only show (author, branch) pairs with at least 50 builds. _(Tests `HAVING` versus `WHERE` and that aggregate filters belong after the `GROUP BY`.)_
- Add the failure rate per (author, branch). _(Tests conditional aggregation: `AVG(CASE WHEN status = 'failed' THEN 1.0 ELSE 0 END)` in the same grouped projection.)_
- Restrict to the last 30 days of builds. _(Tests pushing a `WHERE built_at >= ...` predicate before the aggregate so partition pruning fires on `built_at`.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/builds_per_author_per_branch)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.