# Commit Royalty

> In a sea of commits, only a few wear the crown.

Canonical URL: <https://datadriven.io/problems/commit_royalty>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Rank commits by lines added each month in 2025, then count how many times each author placed in the top 10. Show the author with the highest count.

## Worked solution and explanation

### Why this problem exists in real interviews

Engineering productivity teams ask 'who consistently lands big changes' rather than 'who has the single biggest commit'. This prompt encodes that question as 'top 10 by lines added per month'. Interviewers want to see whether you partition windows by a derived expression (year-month), use DENSE_RANK to honor ties, and produce one final answer with a deterministic tie break.

---

### Break down the requirements

#### Step 1: Filter to last year only

WHERE strftime('%Y', commit_at) = '2026' (the 2025 placeholder resolves to the previous year). Filtering before windowing keeps the partition counts and rank computation small. strftime returns text, so compare against a quoted year.

#### Step 2: Rank within each year-month with DENSE_RANK

DENSE_RANK() OVER (PARTITION BY strftime('%Y-%m', commit_at) ORDER BY added DESC) gives a no gaps rank inside each month. DENSE_RANK is required because the prompt says 'rank 10 or less'; ties at the boundary should all qualify, and RANK with gaps would silently exclude them.

#### Step 3: Count and pick the winner

Filter rnk <= 10, GROUP BY author, COUNT(*). Order by COUNT DESC then author ASC for the tie break, LIMIT 1. The author column is selected directly from the CTE because it survives PARTITION BY.

---

### The solution

**Window rank inside each month, then count per author**

```sql
WITH monthly_ranked AS (
  SELECT author, added,
         DENSE_RANK() OVER (PARTITION BY strftime('%Y-%m', commit_at) ORDER BY added DESC) AS rnk
  FROM repo_commits
  WHERE strftime('%Y', commit_at) = '2026'
)
SELECT author, COUNT(*) AS top10_count
FROM monthly_ranked
WHERE rnk <= 10
GROUP BY author
ORDER BY top10_count DESC, author ASC
LIMIT 1
```

> **Cost Analysis**
>
> Filtering by year first reduces 4M rows to a single year of data. The window function then sorts each of 12 monthly partitions by added DESC, which is cheap because each partition is small. Final aggregation is over only the rows that pass rnk <= 10, at most 12 times 10 equals 120 rows before the COUNT.

> **Interviewers Watch For**
>
> Whether you partitioned the window on the derived year-month rather than on the full timestamp (which would partition by second and make every commit rank 1), and whether DENSE_RANK was used so that ties at the rank 10 boundary all qualify. Strong candidates also note that strftime returns TEXT, so partitioning on it works as expected for chronological grouping.

> **Common Pitfall**
>
> Using ROW_NUMBER instead of DENSE_RANK arbitrarily breaks ties on 'added', so commits with identical line counts get different ranks. With a heavy tie at rank 10, ROW_NUMBER would include 10 rows total and DENSE_RANK could include 12 or more, which is what the prompt wants.

---

## Common follow-up questions

- Why partition the window by year-month rather than just by month? _(Tests whether the candidate considered multi year data. Partitioning by month alone would lump January 2025 with January 2026; the prompt is single year so it does not bite, but it is the right instinct.)_
- How would you adapt this to find authors who appear in the top 10 in at least 6 different months? _(Tests whether the candidate can add a HAVING COUNT(DISTINCT month) >= 6 clause after the GROUP BY. This is a common follow up because it filters out the one month wonder authors.)_
- What does the answer become if 'added' includes auto generated lockfile changes? _(Tests data quality awareness. The candidate should suggest filtering by file path or message keywords before ranking, because lockfile commits routinely add tens of thousands of lines.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/commit_royalty)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.