# Above Average

> Products beating the catalog average.

Canonical URL: <https://datadriven.io/problems/above_average>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The pricing team is identifying premium products that stand out from the rest of the catalog. Show the name, category, and price of every product whose price exceeds the overall catalog average, and include a column with that catalog-wide average for reference. Present results from most expensive to least.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes **scalar subquery filtering**, combined with **aggregation**. Getting it right quickly signals comfort with the building blocks that complex queries depend on.

> **Trick to Solving**
>
> The phrase "above average" or "exceeds the overall" signals a **scalar subquery**. The average must be computed across all qualifying rows, then each row compared against that single value.
> 
> 1. Write the subquery to compute the benchmark first
> 2. Use it in the WHERE clause of the outer query
> 3. Verify the subquery is uncorrelated (runs once, not per row)

---

### Break down the requirements

#### Step 1: Compute the benchmark average of `price`

A scalar subquery calculates `AVG(price)` across all rows. This single value becomes the comparison threshold.

#### Step 2: Filter rows exceeding the benchmark

`WHERE price > (subquery)` keeps only rows above the global average. The subquery executes once.

#### Step 3: Order by the comparison column

Sort descending by `price` to show the most extreme outliers first.

---

### The solution

**Scalar subquery with average reference column**

```sql
SELECT category, price,
    (SELECT AVG(price) FROM products) AS overall_avg
FROM products
WHERE price > (
    SELECT AVG(price)
    FROM products
)
ORDER BY price DESC
```

> **Cost Analysis**
>
> The main table has 6K rows. The GROUP BY reduces the row count early, keeping downstream operations cheap.

> **Interviewers Watch For**
>
> Strong candidates state the correct `GROUP BY` grain before writing any SQL, showing they think about the output shape first. Candidates who explain correlated vs uncorrelated subqueries earn extra credibility.

> **Common Pitfall**
>
> Selecting a non-aggregated column without including it in `GROUP BY` is the most common error. Some engines reject it; others silently return arbitrary values.

---

## Common follow-up questions

- What happens if the table is empty? _(Tests awareness of edge cases: COUNT returns 0, but AVG/MIN/MAX return NULL on empty input.)_
- How would you verify the output is correct with a quick spot check? _(Tests whether the candidate can validate results against known data or row counts.)_
- What index would speed this query up the most? _(Tests basic indexing intuition: filter and join columns are the top candidates.)_
- How would the results change if you used COUNT(DISTINCT col) instead of COUNT(*)? _(Tests understanding of distinct vs total counting and when deduplication matters.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/above_average)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.