# Super Reviewers

> The most prolific code reviewers.

Canonical URL: <https://datadriven.io/problems/super_reviewers>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The engineering manager is identifying super reviewers who carry a disproportionate review load. Find reviewers who have reviewed at least 7 pull requests, including cases where the author and reviewer are the same person.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests whether a candidate can demonstrate filtering groups after aggregation, and handling null values correctly. This pattern appears frequently in mid-level SQL rounds where interviewers want to see structured thinking.

---

### Break down the requirements

#### Step 1: Filter out null values

Exclude rows where `reviewer` is NULL. This prevents nulls from polluting aggregations or creating phantom groups.

#### Step 2: Aggregate by `reviewer`

`GROUP BY reviewer` collapses rows to one per group. The aggregate functions (`SUM`, `COUNT`, `AVG`, etc.) compute the metric for each group.

#### Step 3: Filter groups with HAVING

HAVING applies after GROUP BY, filtering out groups that do not meet the threshold. This cannot be done in WHERE because the aggregate has not been computed yet.

#### Step 4: Sort the final output

The `ORDER BY` clause ensures the result appears in the expected sequence. Interviewers check that the sort direction matches the prompt.

---

### The solution

**Group code_reviews by reviewer with HAVING count threshold**

```sql
SELECT reviewer, COUNT(*) AS review_count
FROM code_reviews
WHERE reviewer IS NOT NULL
GROUP BY reviewer
HAVING COUNT(*) >= 7
ORDER BY review_count DESC
```

> **Cost Analysis**
>
> With ~400K rows, the GROUP BY reduces the working set before any downstream operations. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you use HAVING (not WHERE) to filter after aggregation; how you handle NULL values and whether you account for them in filters and aggregations.

> **Common Pitfall**
>
> Putting the aggregate condition in WHERE instead of HAVING causes a syntax error. WHERE runs before GROUP BY; HAVING runs after.

---

## Common follow-up questions

- If the prompt later asked you to exclude self-reviews, how would the HAVING count change for borderline reviewers? _(Tests ability to add a WHERE predicate without confusing it with HAVING logic.)_
- How would you also show each super reviewer's average number of comments per review? _(Tests adding a second aggregate (AVG(comments)) to the same grouped query.)_
- If reviewer is NULL for some rows (automated reviews), how does GROUP BY handle that? _(Tests knowledge that GROUP BY treats NULLs as a single group, which may need explicit exclusion.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/super_reviewers)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.