# Top Average By Region

> Region by region, who pulls the best average?

Canonical URL: <https://datadriven.io/problems/top_average_by_region>

Domain: SQL · Difficulty: easy · Seniority: L5

## Problem

Pull the three product categories with the highest average transaction amount, using data from both the transactions and products tables. Return each category and its average, from highest to lowest.

## Worked solution and explanation

### Why this problem exists in real interviews

Finding the top categories by average transaction amount requires joining `transactions` to `products`, aggregating per category, and only then keeping the top rows. Interviewers watch whether the candidate aggregates first or tries to rank raw rows, which is the most common mistake.

---

### Break down the requirements

#### Step 1: Join the tables

Join `transactions` to `products` on `product_id`, the shared key, so that each transaction row carries its product's category.
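
As a minimal sketch of the join, the query below uses inline sample rows in place of the real tables. The column names `transaction_id` and `category` and all of the sample values are assumptions made for illustration; only `product_id` and `total_amount` are named in this problem.

```sql
-- Inline sample data stands in for the real tables (values are invented).
WITH transactions (transaction_id, product_id, total_amount) AS (
    VALUES (1, 10, 20.00),
           (2, 10, 40.00),
           (3, 11, 100.00)
),
products (product_id, category) AS (
    VALUES (10, 'Books'),
           (11, 'Electronics')
)
-- Each transaction picks up the category of its product.
SELECT
    t.transaction_id,
    p.category,
    t.total_amount
FROM transactions AS t
JOIN products AS p
    ON p.product_id = t.product_id
ORDER BY t.transaction_id;
-- Three rows: two 'Books' transactions and one 'Electronics' transaction.
```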

#### Step 2: Aggregate per category

`GROUP BY` the category column with `AVG(total_amount)` to produce one summary row per category from the joined rows.
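
The aggregation step can be sketched at the category grain the problem asks for. The sample rows and the `category` and `transaction_id` column names are assumptions for illustration, not the real schema.

```sql
-- Two products share the 'Books' category, so grouping by category
-- (not product_id) is what collapses them into one row.
WITH transactions (transaction_id, product_id, total_amount) AS (
    VALUES (1, 10, 20.00),
           (2, 11, 40.00),
           (3, 12, 100.00),
           (4, 12, 50.00)
),
products (product_id, category) AS (
    VALUES (10, 'Books'),
           (11, 'Books'),
           (12, 'Electronics')
)
SELECT
    p.category,
    AVG(t.total_amount) AS avg_total_amount
FROM transactions AS t
JOIN products AS p
    ON p.product_id = t.product_id
GROUP BY p.category;
-- One row per category: Books averages 30, Electronics averages 75.
```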

#### Step 3: Rank the results

`ORDER BY` the average descending with `LIMIT 3` to keep the three highest categories.

---

### The solution

**Join transactions to products then average total_amount per category**

```sql
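-- Assumes products has a `category` column and both tables share `product_id`.
SELECT
    p.category,
    AVG(t.total_amount) AS avg_total_amount
FROM transactions AS t
JOIN products AS p
    ON p.product_id = t.product_id
GROUP BY p.category
ORDER BY avg_total_amount DESC
LIMIT 3;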
```

> **Cost Analysis**
>
> The join matches each row of the 80M-row `transactions` table to its product, and the `GROUP BY` then reduces the result to one row per distinct category. A covering index on `(product_id, total_amount)` can let the engine read the `transactions` side from the index alone.
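
The covering index described above could be created roughly as follows. The table definition, the column types, and the index name `idx_transactions_product_amount` are hypothetical. Whether Postgres actually chooses an index-only scan depends on planner statistics and on how much of the table is marked all-visible, so treat the `EXPLAIN` output as something to inspect rather than a guaranteed plan.

```sql
-- Hypothetical schema: column types are assumptions, not the real table.
CREATE TABLE IF NOT EXISTS transactions (
    transaction_id BIGINT PRIMARY KEY,
    product_id     BIGINT NOT NULL,
    total_amount   NUMERIC(12, 2) NOT NULL
);

-- Both columns the aggregate needs are stored in the index itself.
CREATE INDEX IF NOT EXISTS idx_transactions_product_amount
    ON transactions (product_id, total_amount);

-- Inspect the plan for a query that touches only the indexed columns.
EXPLAIN
SELECT
    product_id,
    AVG(total_amount) AS avg_total_amount
FROM transactions
GROUP BY product_id;
```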

> **Interviewers Watch For**
>
> Interviewers verify that you aggregate before sorting. Sorting raw rows ranks individual transactions, not group averages. The correct grain is one row per category.

> **Common Pitfall**
>
> Using the wrong aggregate function. `SUM` gives totals, `COUNT` gives volume, `AVG` gives the mean per group. The prompt asks for the highest average, so `AVG` is the metric here.
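
To see why the choice of aggregate matters, the sketch below computes all three metrics side by side on invented sample rows. The `category` and `transaction_id` column names are assumptions. Here the category with the largest total and the most transactions has the lowest average.

```sql
WITH transactions (transaction_id, product_id, total_amount) AS (
    VALUES (1, 10, 10.00),
           (2, 10, 10.00),
           (3, 10, 10.00),
           (4, 10, 10.00),
           (5, 11, 30.00),
           (6, 12, 18.00),
           (7, 12, 18.00)
),
products (product_id, category) AS (
    VALUES (10, 'Books'),
           (11, 'Electronics'),
           (12, 'Toys')
)
SELECT
    p.category,
    SUM(t.total_amount) AS sum_amount,  -- Books 40, Toys 36, Electronics 30
    COUNT(*)            AS txn_count,   -- Books 4,  Toys 2,  Electronics 1
    AVG(t.total_amount) AS avg_amount   -- Electronics 30, Toys 18, Books 10
FROM transactions AS t
JOIN products AS p
    ON p.product_id = t.product_id
GROUP BY p.category
ORDER BY avg_amount DESC;
-- Ordered by average: Electronics, Toys, Books.
-- Ordering by sum_amount or txn_count would put Books first instead.
```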

---

## Common follow-up questions

- If a category has only one transaction, is the average meaningful, and should you apply a minimum count threshold? _(Tests statistical reasoning; a single-transaction average may be misleading but the prompt does not exclude it.)_
- Does using ROUND on the average change ordering when two categories are very close? _(Tests precision awareness; rounding before ordering can create false ties.)_
- If product_id in transactions has values not present in products, what happens to those rows in an inner join? _(Tests referential integrity awareness; orphan rows are silently dropped by an inner join.)_
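
The three follow-ups above can be explored with a small sketch. The temporary tables, the column types, the sample rows, and the threshold of 2 are all assumptions chosen for illustration. Transaction 4 deliberately points at a `product_id` that does not exist in `products`.

```sql
-- Hypothetical temp tables so the sketch does not touch real data.
CREATE TEMP TABLE products (
    product_id INT PRIMARY KEY,
    category   TEXT NOT NULL
);
CREATE TEMP TABLE transactions (
    transaction_id INT PRIMARY KEY,
    product_id     INT NOT NULL,
    total_amount   NUMERIC(12, 2) NOT NULL
);

INSERT INTO products VALUES (10, 'Books'), (11, 'Electronics');
INSERT INTO transactions VALUES
    (1, 10, 20.00), (2, 10, 40.00), (3, 11, 100.00), (4, 99, 5.00);

-- 1. Minimum count threshold: HAVING filters groups after aggregation.
SELECT p.category, AVG(t.total_amount) AS avg_amount, COUNT(*) AS txn_count
FROM transactions AS t
JOIN products AS p ON p.product_id = t.product_id
GROUP BY p.category
HAVING COUNT(*) >= 2          -- illustrative threshold
ORDER BY avg_amount DESC
LIMIT 3;
-- Only Books qualifies; Electronics has a single transaction.

-- 2. Round for display only, and order by the unrounded average.
SELECT p.category, ROUND(AVG(t.total_amount), 2) AS avg_amount
FROM transactions AS t
JOIN products AS p ON p.product_id = t.product_id
GROUP BY p.category
ORDER BY AVG(t.total_amount) DESC
LIMIT 3;

-- 3. Orphan check: transactions whose product_id has no match in products.
SELECT t.transaction_id, t.product_id
FROM transactions AS t
LEFT JOIN products AS p ON p.product_id = t.product_id
WHERE p.product_id IS NULL;
-- Returns transaction 4 (product_id 99), which the inner joins above drop.
```

Ordering by the `AVG` expression rather than the rounded alias keeps two nearly equal categories in their true order while still showing a rounded value.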

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_average_by_region)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.