# Reviews Per Reviewer

> The workload split across reviewers.

Canonical URL: <https://datadriven.io/problems/reviews_per_reviewer>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The engineering manager is balancing review workloads and needs to see how many code reviews each reviewer is currently carrying.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem tests grouped `COUNT` aggregation on the `code_reviews` table: group the rows by the `reviewer` column and count how many reviews fall into each group.

---

### Break down the requirements

#### Step 1: Aggregate by `reviewer`

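`GROUP BY reviewer` collapses the table to one output row per distinct `reviewer` value. Rows with a NULL `reviewer` are grouped together into a single row. An aggregate function (`SUM`, `COUNT`, `AVG`, etc.) then computes one value per group. Here `COUNT(*)` counts the rows, that is, the reviews, in each group.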
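Conceptually, `GROUP BY reviewer` with `COUNT(*)` is a single pass over the rows that increments a counter for each distinct key, which is roughly what a hash aggregate does. The minimal plain-Python sketch below assumes each review is a dict with a `reviewer` key (the rows are hypothetical sample data). `None` plays the role of SQL NULL, which also lands in a single group.

```python
from collections import Counter

# Hypothetical sample rows standing in for code_reviews; only `reviewer` matters here.
rows = [
    {"review_id": 1, "reviewer": "alice"},
    {"review_id": 2, "reviewer": "bob"},
    {"review_id": 3, "reviewer": "alice"},
    {"review_id": 4, "reviewer": None},  # unassigned: SQL puts all NULL reviewers in one group
    {"review_id": 5, "reviewer": None},
]


def reviews_per_reviewer(rows):
    """Mimic SELECT reviewer, COUNT(*) FROM code_reviews GROUP BY reviewer."""
    # One pass: each row adds 1 to the counter for its reviewer key.
    counts = Counter(row["reviewer"] for row in rows)
    return dict(counts)


print(reviews_per_reviewer(rows))  # {'alice': 2, 'bob': 1, None: 2}
```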

#### Step 2: Select the target columns

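Select the grouping column `reviewer` plus the count, aliased as `review_count`. In Postgres, any non-aggregated column in the SELECT list must also appear in `GROUP BY`, so adding another `code_reviews` column such as `review_id` raises an error. Returning extra columns or a different alias would also fail the grading check.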
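One way to confirm the output shape before submitting is to inspect the result's column names. This is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for Postgres, with a hypothetical two-column `code_reviews` table and made-up rows.

```python
import sqlite3

# In-memory SQLite database (an assumption: the real grader runs Postgres 16).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE code_reviews (review_id INTEGER PRIMARY KEY, reviewer TEXT)")
conn.executemany(
    "INSERT INTO code_reviews (reviewer) VALUES (?)",
    [("alice",), ("bob",), ("alice",)],
)

cur = conn.execute(
    "SELECT reviewer, COUNT(*) AS review_count FROM code_reviews GROUP BY reviewer"
)
# cursor.description holds one 7-tuple per output column; element 0 is the column name.
columns = [d[0] for d in cur.description]
print(columns)  # ['reviewer', 'review_count']
```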

---

### The solution

**Aggregate by `reviewer` to find reviews per reviewer**

```sql
-- One output row per reviewer; COUNT(*) counts every review row in that group
SELECT reviewer, COUNT(*) AS review_count
FROM code_reviews
GROUP BY reviewer;
```
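To sanity-check the query on a few rows, the sketch below runs it unchanged against an in-memory SQLite database. The schema is a hypothetical subset of `code_reviews` (including the `merged` flag from the follow-up questions), and SQLite stands in for the Postgres sandbox.

```python
import sqlite3

# Hypothetical schema and rows; SQLite stands in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE code_reviews (review_id INTEGER PRIMARY KEY, reviewer TEXT, merged INTEGER)"
)
conn.executemany(
    "INSERT INTO code_reviews (reviewer, merged) VALUES (?, ?)",
    [("alice", 1), ("bob", 0), ("alice", None), ("carol", 1), ("alice", 0)],
)

# The solution query as written. Without ORDER BY, row order is not guaranteed,
# so the rows are sorted in Python before printing.
SOLUTION = """
SELECT reviewer, COUNT(*) AS review_count
FROM code_reviews
GROUP BY reviewer
"""
result = sorted(conn.execute(SOLUTION).fetchall())
print(result)  # [('alice', 3), ('bob', 1), ('carol', 1)]
```

Note that alice's review with a NULL `merged` value is still counted, because `COUNT(*)` counts rows, not values.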

> **Cost Analysis**
>
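> With ~300K rows, the query has to read every row of `code_reviews` once; the aggregation (hash or sort based) then shrinks the output to one row per reviewer. There is no filter or join, so no index can turn this into a seek. At most, a B-tree index on `reviewer` may let Postgres answer the query with an index-only scan that returns rows already ordered by `reviewer`, avoiding heap reads and a separate sort.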
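The effect of an index on the grouping column can be observed locally. This sketch uses SQLite's `EXPLAIN QUERY PLAN` rather than Postgres `EXPLAIN`, so the plan text and planner choices are SQLite's, not what the grader's Postgres 16 instance will report. The table contents and the index name `idx_code_reviews_reviewer` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE code_reviews (review_id INTEGER PRIMARY KEY, reviewer TEXT)")
# Hypothetical data: 1,000 reviews spread over 50 reviewers.
conn.executemany(
    "INSERT INTO code_reviews (reviewer) VALUES (?)",
    [(f"user{i % 50}",) for i in range(1000)],
)

QUERY = "SELECT reviewer, COUNT(*) AS review_count FROM code_reviews GROUP BY reviewer"


def plan(sql):
    # EXPLAIN QUERY PLAN yields 4-column rows; the last column is the human-readable detail.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_without_index = plan(QUERY)
# Hypothetical index on the grouping column only.
conn.execute("CREATE INDEX idx_code_reviews_reviewer ON code_reviews (reviewer)")
plan_with_index = plan(QUERY)

print(plan_without_index)
print(plan_with_index)
```

Without the index, SQLite scans the table and builds a temporary B-tree to group the rows. With it, the planner can walk the index in `reviewer` order instead. Either way every row is still visited, which is why no index turns this query into a seek.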

> **Interviewers Watch For**
>
> Interviewers check whether the query returns exactly the columns (and ordering, if any) that the prompt specifies, and how quickly you identify the core operation and write clean, minimal code.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- The `merged` column in `code_reviews` has a 10% null rate. How does your query handle rows where `merged` is NULL, and could that silently change the result count? _(Tests whether the candidate knows that `COUNT(*)` counts every row regardless of NULLs, while `COUNT(merged)` would skip NULL values and silently undercount.)_
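- `code_reviews.review_id` has roughly 300,000 distinct values. Would an index on `review_id` help this query avoid a full scan of `code_reviews`, and if not, what would? _(Tests whether the candidate recognizes that an unfiltered aggregate must read every row, so an index on the high-cardinality `review_id` does not help, while an index on the grouping column `reviewer` can at best enable an index-only scan.)_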
- If this query ran as a scheduled job, how would you add monitoring to detect when the result set is suspiciously empty? _(Tests operational awareness around scheduled query jobs.)_
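For the NULL question, the difference between `COUNT(*)` and `COUNT(column)` is easy to demonstrate. This minimal sketch again uses SQLite in place of Postgres (both follow the standard here: `COUNT(*)` counts rows, `COUNT(expr)` counts non-NULL values), with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE code_reviews (review_id INTEGER PRIMARY KEY, reviewer TEXT, merged INTEGER)"
)
# Hypothetical rows: bob has one review whose `merged` value is NULL.
conn.executemany(
    "INSERT INTO code_reviews (reviewer, merged) VALUES (?, ?)",
    [("alice", 1), ("alice", 0), ("bob", None), ("bob", 1)],
)

rows = conn.execute(
    """
    SELECT reviewer,
           COUNT(*)      AS review_count,   -- counts every row, NULLs included
           COUNT(merged) AS non_null_merged -- skips rows where merged IS NULL
    FROM code_reviews
    GROUP BY reviewer
    ORDER BY reviewer
    """
).fetchall()
print(rows)  # [('alice', 2, 2), ('bob', 2, 1)]
```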

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/reviews_per_reviewer)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.