# Commit Cadence

> Some repos go quiet for too long.

Canonical URL: <https://datadriven.io/problems/commit_cadence>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

The engineering manager wants to flag repos with infrequent commit activity. For each repository, compute the average gap in days between consecutive commits. Only include repos whose average gap exceeds 5 days, and rank them by how inactive they are, widest average gap first. Return the repo name, average gap, and inactivity rank.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests LAG for computing gaps between consecutive events, aggregation of those gaps, HAVING for threshold filtering, and RANK for ordering. It is a multi-step problem combining window functions with standard aggregation.

---

### Break down the requirements

#### Step 1: Compute gaps between consecutive commits

Use `LAG(commit_at) OVER (PARTITION BY repo_name ORDER BY commit_at)` to get the previous commit timestamp, then compute the difference in days.

#### Step 2: Average the gaps per repo

`GROUP BY repo_name` with `AVG(gap_days)`, excluding the first commit per repo (which has a NULL gap).

#### Step 3: Filter and rank

`HAVING AVG(gap_days) > 5` keeps inactive repos. `RANK()` by average gap descending assigns the inactivity rank.

---

### The solution

**LAG-based gap analysis with ranking**

```sql
WITH gaps AS (
    SELECT
        repo_name,
        commit_at - LAG(commit_at) OVER (PARTITION BY repo_name ORDER BY commit_at) AS gap_days
    FROM repo_commits
)
SELECT
    repo_name,
    AVG(gap_days) AS avg_gap,
    RANK() OVER (ORDER BY AVG(gap_days) DESC) AS inactivity_rank
FROM gaps
WHERE gap_days IS NOT NULL
GROUP BY repo_name
HAVING AVG(gap_days) > INTERVAL '5 days'
ORDER BY avg_gap DESC
```

> **Cost Analysis**
>
> Scan of 5M rows with a window function (sort per partition). The LAG step is O(n log n) for sorting. The subsequent GROUP BY reduces to the number of repos. The window sort dominates.

> **Interviewers Watch For**
>
> Whether you correctly exclude the first commit per repo (NULL gap) before averaging. Including NULLs in AVG silently drops them, but the WHERE clause makes intent explicit.

> **Common Pitfall**
>
> Computing the gap as `commit_at - LAG(commit_at)` returns an interval in PostgreSQL, not a number of days. You may need `EXTRACT(EPOCH FROM gap) / 86400` to get numeric days depending on the database.

---

## Common follow-up questions

- What if a repo has only one commit? _(All gaps are NULL, so it is excluded entirely. Tests whether the candidate handles this edge case.)_
- How would you find repos with accelerating commit frequency? _(Compare recent gaps to older gaps, testing trend analysis within partitions.)_
- What if commit_at is a date (not timestamp)? _(Date subtraction returns an integer in PostgreSQL, simplifying the gap calculation.)_
- How would you handle repos with commits on the same timestamp? _(Tests zero-gap handling and whether ORDER BY needs a tiebreaker.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/commit_cadence)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.