# Clean Averages

> Merchandising only cares about the categories customers actually rate.

Canonical URL: <https://datadriven.io/problems/clean_averages>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The merchandising team is benchmarking product quality across the catalog. For each category, they need the average customer rating rounded to one decimal place, but only for products that have actually received a rating. Leave out any category with fewer than three rated products. List the results from highest average rating to lowest.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests NULL filtering within aggregation and HAVING. Interviewers check whether you can exclude NULLs before averaging and apply a count threshold to ensure statistical significance.

---

### Break down the requirements

#### Step 1: Filter out unrated products

`WHERE rating IS NOT NULL` removes products without a rating before aggregation.

#### Step 2: Aggregate per category

`ROUND(AVG(rating), 1)` computes the average rounded to one decimal. `COUNT(*)` tracks rated products.

#### Step 3: Filter and sort

`HAVING COUNT(*) >= 3` ensures a minimum sample size. `ORDER BY avg_rating DESC` ranks best-rated first.

---

### The solution

**Null-filtered average with minimum sample size**

```sql
SELECT category, ROUND(AVG(rating)::NUMERIC, 1) AS avg_rating
FROM products
WHERE rating IS NOT NULL
GROUP BY category
HAVING COUNT(*) >= 3
ORDER BY avg_rating DESC
```

> **Cost Analysis**
>
> Single scan of 6K rows. Trivially fast. The WHERE filter removes NULL rows before the GROUP BY, reducing the aggregation input slightly.

> **Common Pitfall**
>
> AVG() already ignores NULLs, so the WHERE clause is technically redundant for the average. However, it affects COUNT(*) vs COUNT(rating). Without the WHERE, you would need `HAVING COUNT(rating) >= 3` instead of `HAVING COUNT(*) >= 3`.

---

## Common follow-up questions

- Does AVG() skip NULLs automatically? _(Yes. Tests awareness that aggregate functions (except COUNT(*)) ignore NULLs.)_
- What is the difference between COUNT(*) and COUNT(rating)? _(COUNT(*) counts all rows; COUNT(column) counts non-NULL values.)_
- What if you needed the median rating instead of the average? _(Tests PERCENTILE_CONT or nth_value approaches for median computation.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/clean_averages)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.