# Fastest CI Build Date

> The fastest build ever. When did it happen?

Canonical URL: <https://datadriven.io/problems/fastest_ci_build_date>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The CI/CD team is setting a performance baseline and wants to know what the fastest build on record looks like. Surface the build date and duration of the shortest CI build on record.

## Worked solution and explanation

### Why this problem exists in real interviews

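The core skill being tested is returning a whole row for a global extreme: the build date that belongs to the smallest `dur_secs` in `ci_builds`. The table also carries `repo_name`, `branch`, and `status`, but the prompt asks for a single record across all builds, so no grouping or join is needed.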

---

### Break down the requirements

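#### Step 1: Order builds by duration

"Fastest" means the smallest `dur_secs`, so sort ascending with `ORDER BY dur_secs ASC`. Builds with a NULL duration have no measured time and should be excluded.

#### Step 2: Keep the first row

`LIMIT 1` returns the single shortest build together with its build date.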

---

### The solution

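**Top-1 by ascending duration**

```sql
-- build_date is an assumed name for the build's date column; substitute the real one
SELECT build_date, dur_secs
FROM ci_builds
WHERE dur_secs IS NOT NULL  -- some engines sort NULLs first in ascending order
ORDER BY dur_secs ASC
LIMIT 1;
```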

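> **Cost Analysis**
>
> Without a suitable index, `ORDER BY ... LIMIT 1` is one sequential scan that keeps only the current best row (`EXPLAIN ANALYZE` in Postgres reports `Sort Method: top-N heapsort`), so it runs in O(n) time with constant memory. A B-tree index on `dur_secs` lets the planner read the index in order and stop at the first qualifying entry.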
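
A minimal sketch of indexing the sort key, assuming Postgres and the `ci_builds` table from the prompt. The index name `ci_builds_dur_secs_idx` is hypothetical:

```sql
-- Hypothetical index name; the key is dur_secs, the column the query sorts on
CREATE INDEX ci_builds_dur_secs_idx ON ci_builds (dur_secs);

-- Inspect the plan. On a large table the planner can walk the index in order
-- and stop after the first non-NULL entry instead of sorting every row.
EXPLAIN
SELECT build_id, dur_secs
FROM ci_builds
WHERE dur_secs IS NOT NULL
ORDER BY dur_secs ASC
LIMIT 1;
```

The exact plan depends on table size and statistics; on a small table the planner may still choose a sequential scan.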

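> **Interviewers Watch For**
>
> The interviewer checks that you pick the right direction: "fastest" means the smallest `dur_secs`, so sort ascending (or use `MIN`), not `MAX`. They also check that you return the build date alongside the duration and that you have an answer for ties.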
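
If several builds tie for the shortest duration, `LIMIT 1` returns only one of them, chosen arbitrarily. A sketch of a tie-aware variant using `FETCH FIRST ... WITH TIES` (available since Postgres 13), run against a few hypothetical sample rows; the CTE stands in for `ci_builds`, and `build_date` is an assumed column name:

```sql
-- Hypothetical sample rows standing in for ci_builds; build_date is an assumed column
WITH ci_builds (build_id, build_date, dur_secs) AS (
  VALUES
    (101, DATE '2024-03-01', 412),
    (102, DATE '2024-03-02',  95),
    (103, DATE '2024-03-05',  95),
    (104, DATE '2024-03-07', 230)
)
SELECT build_id, build_date, dur_secs
FROM ci_builds
ORDER BY dur_secs ASC
FETCH FIRST 1 ROWS WITH TIES;  -- returns builds 102 and 103, both at 95 s
```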

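> **Common Pitfall**
>
> Writing `SELECT build_date, MIN(dur_secs) FROM ci_builds` fails because `build_date` is neither grouped nor aggregated: Postgres rejects it, and MySQL accepts it only when `ONLY_FULL_GROUP_BY` is disabled, returning an arbitrary date. Adding `GROUP BY build_date` makes it run but returns the minimum per date, not the single fastest build.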
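
One way around the pitfall when the query is written with `MIN`: compute the minimum in a scalar subquery and filter on it. This sketch uses hypothetical sample rows in a CTE named like the real table, and `build_date` is an assumed column name:

```sql
-- Hypothetical sample rows standing in for ci_builds; build_date is an assumed column
WITH ci_builds (build_id, build_date, dur_secs) AS (
  VALUES
    (101, DATE '2024-03-01', 412),
    (102, DATE '2024-03-02',  95),
    (104, DATE '2024-03-07', 230)
)
-- Rejected by Postgres (build_date is neither grouped nor aggregated):
--   SELECT build_date, MIN(dur_secs) FROM ci_builds;
-- Works: find the minimum in a subquery, then fetch the matching row(s)
SELECT build_date, dur_secs
FROM ci_builds
WHERE dur_secs = (SELECT MIN(dur_secs) FROM ci_builds);
```

Unlike `LIMIT 1`, this form returns every build tied at the minimum, and `MIN` ignores NULL durations on its own.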

---

## Common follow-up questions

- The `dur_secs` column in `ci_builds` has roughly 2% NULLs. How does your query handle those rows, and would the result change if NULLs were replaced with zeros? _(Tests whether the candidate knows where NULLs sort under `ORDER BY`, that aggregates such as `MIN` skip them, and whether a WHERE condition filters them out implicitly.)_
- If two builds share the shortest `dur_secs`, which one does `LIMIT 1` return, and is that deterministic? _(Tests awareness that ORDER BY on a non-unique value produces non-deterministic row order without a tiebreaker.)_
- `build_id` in `ci_builds` has ~3M distinct values. What index strategy keeps your query from doing a full table scan? _(Tests whether the candidate can choose an index that matches the query's sort key and understands selectivity.)_
- If the business definition of `status` changed mid-quarter (e.g., a status value was renamed), how would you handle historical consistency? _(Tests awareness of slowly changing dimensions and backward-compatible query design.)_
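
A sketch for the first two follow-ups, on hypothetical sample rows (`build_date` is an assumed column name). It contrasts excluding NULL durations with coalescing them to zero, and adds `build_id` as a tiebreaker so the top row is deterministic:

```sql
-- Hypothetical sample rows standing in for ci_builds; build_date is an assumed column
WITH ci_builds (build_id, build_date, dur_secs) AS (
  VALUES
    (201, DATE '2024-04-01', 180),
    (202, DATE '2024-04-03', 140),
    (203, DATE '2024-04-04', 140),
    (204, DATE '2024-04-06', NULL)
)
SELECT
  -- NULL durations excluded; the tie at 140 s goes to the lowest build_id
  (SELECT build_id FROM ci_builds
   WHERE dur_secs IS NOT NULL
   ORDER BY dur_secs ASC, build_id ASC
   LIMIT 1) AS fastest_ignoring_nulls,
  -- Treating NULL as 0 makes the unmeasured build look like the fastest
  (SELECT build_id FROM ci_builds
   ORDER BY COALESCE(dur_secs, 0) ASC, build_id ASC
   LIMIT 1) AS fastest_nulls_as_zero;
```

The first column returns build 202 (140 s, winning the tie with 203 on `build_id`); the second returns build 204, whose missing duration became 0.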

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/fastest_ci_build_date)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.