# Reviewer Performance Metrics

> Some reviewers are thorough. Others are fast.

Canonical URL: <https://datadriven.io/problems/reviewer_performance_metrics>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

For each reviewer who has reviewed at least one merged pull request (a PR with a non-null merged date), show the total number of unique repos reviewed, total repos where the PR was merged, and the latest review date.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem targets conditional aggregation via CASE across the `code_reviews` table. You need to work with columns like `repo_name`, `reviewer`, and `opened_at` to satisfy the requirements. Beyond that, the query demands HAVING for post-aggregation filtering.

> **Trick to Solving**
>
> When the prompt asks for multiple metrics split by a condition (e.g., resolved vs. unresolved), conditional aggregation avoids multiple passes.
> 
> 1. Spot the split: two or more categories in one output row
> 2. Use `SUM(CASE WHEN condition THEN 1 ELSE 0 END)` for each bucket
> 3. Group by the common dimension

---

### Break down the requirements

#### Step 1: Use conditional aggregation with CASE

A `CASE` expression inside the aggregate function splits rows into buckets without multiple passes over the data. Each condition maps to one output column.

#### Step 2: Deduplicate the result with DISTINCT

`SELECT DISTINCT` removes duplicate rows from the output. This is necessary when joins or subqueries can produce repeated combinations.

---

### The solution

**Conditional filter-after-group for reviewer performance metrics**

```sql
SELECT reviewer, COUNT(DISTINCT repo_name) AS reviewed_count,
    COUNT(DISTINCT CASE WHEN merged IS NOT NULL THEN repo_name END) AS merged_count,
    MAX(opened_at) AS latest_review_date
FROM code_reviews
WHERE reviewer IS NOT NULL
GROUP BY reviewer
HAVING SUM(CASE WHEN merged IS NOT NULL THEN 1 ELSE 0 END) >= 1
ORDER BY reviewer
```

> **Cost Analysis**
>
> With ~400K rows, the query performs a single sequential scan. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you use HAVING (not WHERE) to filter after aggregation; how you handle NULL values and whether you account for them in filters and aggregations; whether you can pivot data with conditional aggregation in a single pass instead of multiple queries.

> **Common Pitfall**
>
> Putting the aggregate condition in WHERE instead of HAVING causes a syntax error. WHERE runs before GROUP BY; HAVING runs after.

---

## Common follow-up questions

- The `merged` column in `code_reviews` has a 10% null rate. How does your query handle rows where `merged` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `code_reviews.merged` and understands how aggregates skip NULL values.)_
- If `code_reviews` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `review_id`? _(Tests ability to identify performance hotspots related to `code_reviews.review_id` at scale.)_
- Your conditional CASE logic assumes the categories are exhaustive. What happens if a row in `code_reviews` falls into none of the branches? _(Tests awareness of the implicit ELSE NULL in CASE expressions.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/reviewer_performance_metrics)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.