# Data Repo Fix Commits

> How many commits start with 'fix'?

Canonical URL: <https://datadriven.io/problems/data_repo_fix_commits>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The data engineering manager is auditing how many bug-fix commits are going into data-related repos. Count commits that mention 'fix' in repos whose name contains 'data' (case-insensitive), excluding the analytics repo. Break down by repo and author, from most to fewest.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests string filtering with LIKE, case-insensitive matching, exclusion conditions, and multi-column GROUP BY. Combining these in a single query probes attention to detail.

---

### Break down the requirements

#### Step 1: Filter data repos excluding analytics

Use `LOWER(repo_name) LIKE '%data%'` for case-insensitive match, then exclude the analytics repo.

#### Step 2: Filter fix commits

`LOWER(message) LIKE '%fix%'` matches commit messages mentioning fix.

#### Step 3: Group and sort

`GROUP BY repo_name, author` with `COUNT(*)`, ordered descending.

---

### The solution

**Multi-filter aggregation with exclusion**

```sql
SELECT repo_name, author, COUNT(*) AS fix_count
FROM repo_commits
WHERE LOWER(repo_name) LIKE '%data%'
  AND LOWER(repo_name) NOT LIKE '%analytics%'
  AND LOWER(message) LIKE '%fix%'
GROUP BY repo_name, author
ORDER BY fix_count DESC
```

> **Cost Analysis**
>
> Scan of 4M rows with multiple string comparisons per row. LOWER() prevents index usage unless a functional index exists. The GROUP BY reduces to (data repos x authors).

> **Interviewers Watch For**
>
> Whether the candidate handles case-insensitivity with LOWER() or ILIKE, and whether the analytics exclusion is correctly expressed as a separate condition.

> **Common Pitfall**
>
> Using `repo_name != 'analytics'` without case normalization would miss 'Analytics' or 'ANALYTICS'. Always normalize case when the prompt says case-insensitive.

---

## Common follow-up questions

- What if the analytics repo was named 'data-analytics'? _(It matches both '%data%' and '%analytics%'. The exclusion correctly removes it.)_
- How would you also show the percentage of fix commits per repo? _(Divide fix_count by total commits per repo using a window function.)_
- What if 'fix' could appear in different forms like 'fixed', 'fixing', 'hotfix'? _(The LIKE '%fix%' pattern catches all of these. Tests substring matching awareness.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/data_repo_fix_commits)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.