# Above Average Product Prices

> Some products cost more than they should.

Canonical URL: <https://datadriven.io/problems/above_average_product_prices>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The finance team defines a product's base price as the lowest transaction amount ever recorded for it. They want to flag products whose base price runs above the average base price across all products. Return the product ID and its base price.

## Worked solution and explanation

### Why this problem exists in real interviews

A derived-threshold filter: compute a per-group aggregate (MIN total_amount per product_id), then keep groups whose aggregate beats the average of that same aggregate across all products. The interviewer is watching whether you name 'average of what' before you write.

---

### Break down the requirements

#### Step 1: Define base price

Per product_id, base_price = MIN(total_amount) over the transactions table. One row per product after the GROUP BY.

#### Step 2: Define the threshold

Average of base prices across all products, not the average of every transaction. The scalar subquery is MIN per product first, then AVG over those mins.

#### Step 3: Filter with HAVING, not WHERE

MIN(total_amount) is an aggregate; the comparison to the scalar subquery belongs in HAVING. WHERE evaluates before aggregation and can't see MIN.

#### Step 4: Return product_id and base_price

SELECT product_id, MIN(total_amount) AS base_price. No ORDER BY required by the prompt; add one if asked.

---

### The solution

**MIN PER PRODUCT VS AVG OF MINS**

```sql
SELECT
  product_id,
  MIN(total_amount) AS base_price
FROM transactions
GROUP BY product_id
HAVING MIN(total_amount) > (
  SELECT AVG(min_price)
  FROM (
    SELECT MIN(total_amount) AS min_price
    FROM transactions
    GROUP BY product_id
  )
)
```

> **Cost Analysis**
>
> transactions is 50M rows. The outer query and the inner scalar each do one GROUP BY product_id scan; the optimizer cannot share them across the subquery boundary. Most engines run the inner once and cache the scalar. A CTE rewrite is cleaner but does not change cost.

> **Interviewers Watch For**
>
> Say out loud: 'average of per-product mins, not average of every transaction.' Two different numbers. Picking the wrong one is the silent failure here. State the choice before writing.

> **Common Pitfall**
>
> Using (SELECT AVG(total_amount) FROM transactions) as the threshold. That averages every row, weighting heavy-volume products. You want one number per product, then average those. Refunds (negative total_amount) also drag MIN below zero; ask whether to filter total_amount > 0 first.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- Rewrite this using a CTE instead of a nested subquery. _(Tests CTE fluency. WITH product_mins AS (...) then SELECT FROM product_mins WHERE min_price > (SELECT AVG(min_price) FROM product_mins). Same cost, more readable.)_
- How does the answer change if refunds are stored as negative total_amount? _(Probes whether you noticed MIN will pick up the refund. Filter WHERE total_amount > 0 inside both the outer and inner aggregation, or define base_price on positive rows only.)_
- What if base price should be the lowest unit price, total_amount / quantity? _(Forces a per-row expression inside MIN. Watch for quantity = 0 division and NULL propagation; wrap with NULLIF(quantity, 0).)_
- How would you scale this if transactions had a billion rows per month? _(Tests partition awareness. Push transaction_date predicates into both aggregations, and consider materializing product_mins as a daily table so the average is read from a small precomputed set.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/above_average_product_prices)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.