# The Above Average

> What counts as good depends on the company you keep.

Canonical URL: <https://datadriven.io/problems/above_category_avg>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The pricing team is flagging products that look overpriced relative to their own category. For every product whose listed price exceeds the average price within its category, return the product name, category, price, and that category's average price.

## Worked solution and explanation

### Why this problem exists in real interviews

Comparing each row to an aggregate over its group (each product to its category's average) is the classic correlated-subquery-or-window question. Every pricing team runs this exact query weekly: flag items priced above their peer group. Interviewers grade two things: do you recognize the pattern, and do you pick a sensible implementation (CTE, correlated subquery, or window function). The trap is writing WHERE price > AVG(price) at the wrong scope, which is a syntax error or semantically nonsense depending on the engine.

---

### Break down the requirements

#### Step 1: Compute the per-category average once

A CTE that groups products by category and outputs (category, avg_price) is the cleanest expression. Doing the aggregation in a separate step keeps the main query linear and makes the average inspectable. A window function (AVG(price) OVER (PARTITION BY category)) is equally valid and sometimes faster because it skips the self-join.

#### Step 2: Join each product to its category's average

INNER JOIN products to the category_avgs CTE on category. Every product has a category, so INNER and LEFT produce the same rows; INNER reads cleaner. The join yields one row per product, now augmented with avg_price.

#### Step 3: Filter to rows where price > avg_price

The WHERE clause applies the comparison. Because avg_price is per category (from the join), the comparison is per category automatically. No GROUP BY on the outer query, no HAVING.

#### Step 4: Project the four required columns

product_name, category, price, avg_price. The spec does not ask for ordering, so omit ORDER BY to avoid imposing one the grader does not check.

---

### The solution

**CTE for per-category average, then join and filter**

```sql
WITH category_avgs AS (
    SELECT category, AVG(price) AS avg_price
    FROM products
    GROUP BY category
)
SELECT
    p.product_name,
    p.category,
    p.price,
    ca.avg_price
FROM products p
JOIN category_avgs ca ON p.category = ca.category
WHERE p.price > ca.avg_price
```

> **Cost Analysis**
>
> The CTE scans products once and aggregates, O(n). The outer join is O(n + c) where c is the number of distinct categories, typically tiny. Total is O(n). A window-function version does a single pass with O(n) sort per category partition and often wins on modern optimizers because it avoids materializing the intermediate CTE.

> **Interviewers Watch For**
>
> Whether you recognize this as a group-aggregate-compare pattern, whether you can name two implementations (CTE+join vs window), and whether you pick one and defend it. Strong candidates note that correlated subqueries (WHERE price > (SELECT AVG(price) FROM products p2 WHERE p2.category = p.category)) work but can be O(n^2) on engines that do not optimize them.

> **Common Pitfall**
>
> Writing SELECT product_name, category, price, AVG(price) FROM products GROUP BY product_name, category, price. That averages at the wrong grain (per product, which is just the price) and returns every row unchanged. The aggregate must be scoped to category, not to the outer SELECT list. Another miss is using HAVING instead of WHERE after a GROUP BY; HAVING filters aggregates, but you are filtering individual rows, so WHERE is correct.

---

## Common follow-up questions

- Rewrite this with a window function instead of a CTE. _(SELECT product_name, category, price, avg_price FROM (SELECT *, AVG(price) OVER (PARTITION BY category) AS avg_price FROM products) x WHERE price > avg_price. Discuss why the subquery is needed (you cannot WHERE on a window result).)_
- What changes if you also need the percent above average (e.g., 'priced 23% above category mean')? _((price - avg_price) / avg_price * 100 AS pct_above; cast to DOUBLE if the engine does integer division.)_
- How would you flag products priced above the global average instead of the category average? _(a scalar subquery SELECT AVG(price) FROM products, or remove the PARTITION BY clause; discuss the business meaning of each.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/above_category_avg)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.