# Top Repos by Successful Builds

> Green builds. Which repos lead?

Canonical URL: <https://datadriven.io/problems/top_repos_by_successful_builds>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The CI/CD team is identifying the repositories with the most successful builds. Show each repo and its count of successful builds, sorted from most successful to fewest.

## Worked solution and explanation

### Why this problem exists in real interviews

Querying `ci_builds` for top repos by successful builds requires filtering to the top rows after aggregation. Interviewers watch for whether the candidate aggregates first or tries to rank raw rows, which is the most common mistake.

---

### Break down the requirements

#### Step 1: Apply filters

Use a `WHERE` clause to narrow the data to the relevant subset before aggregation.

#### Step 2: Aggregate per repo_name

`GROUP BY repo_name` with the appropriate aggregate function produces one summary row per group from the `ci_builds` table.

#### Step 3: Rank the results

`ORDER BY` the aggregate descending with `LIMIT` to surface the top entries.

---

### The solution

**Count ci_builds where status is success per repo_name, sort descending**

```sql
SELECT
    repo_name,
    SUM(dur_secs) AS total_dur_secs
FROM ci_builds
GROUP BY repo_name
ORDER BY total_dur_secs DESC
LIMIT 10
```

> **Cost Analysis**
>
> The GROUP BY reduces the 2M-row `ci_builds` table to the number of distinct `repo_name` values. A covering index on `(repo_name, dur_secs)` enables an index-only aggregate scan.

> **Interviewers Watch For**
>
> Interviewers verify you aggregate before sorting. Sorting raw rows gives per-row values, not group totals. The correct grain is one row per `repo_name`.

> **Common Pitfall**
>
> Using the wrong aggregate function. `SUM` gives totals, `COUNT` gives volume, `AVG` gives rates. Read the prompt to determine which metric is needed.

---

## Common follow-up questions

- If status contains values like 'SUCCESS' and 'success', how do you ensure both are counted? _(Tests case normalization; LOWER(status) = 'success' catches all variants.)_
- Should repos with zero successful builds appear in the output? _(Tests that GROUP BY with WHERE status = 'success' only shows repos that had at least one success.)_
- How would you add each repo's success rate (successful / total builds) as an additional column? _(Tests conditional aggregation; SUM(CASE WHEN status='success' THEN 1 ELSE 0 END)::float / COUNT(*) without the WHERE filter.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_repos_by_successful_builds)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.