# Category Revenue

> Which categories pull their weight?

Canonical URL: <https://datadriven.io/problems/category_revenue>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The finance team is building a category performance dashboard for the quarterly business review. For each product category, they need the total revenue generated, the number of transactions that drove it, and the average transaction size. Only include categories that have crossed a meaningful revenue threshold ,  anything below 500 in total revenue isn't worth a line on the slide. Rank from highest-revenue category to lowest.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests multi-aggregate GROUP BY with HAVING on a derived metric. Interviewers probe whether you can compute total, count, and average in a single pass and then apply a threshold filter using HAVING.

---

### Break down the requirements

#### Step 1: Join products to transactions

Join on `product_id` to associate `category` with each transaction.

#### Step 2: Aggregate three metrics

`SUM(t.total_amount)` for revenue, `COUNT(*)` for transaction count, and `AVG(t.total_amount)` for average transaction size, all grouped by `p.category`.

#### Step 3: Filter and sort

`HAVING SUM(t.total_amount) >= 500` removes low-revenue categories. `ORDER BY SUM(t.total_amount) DESC` ranks highest first.

---

### The solution

**Multi-metric aggregation with HAVING threshold**

```sql
SELECT
    p.category,
    SUM(t.total_amount) AS total_revenue,
    COUNT(*) AS transaction_count,
    AVG(t.total_amount) AS avg_transaction_size
FROM products p
JOIN transactions t ON p.product_id = t.product_id
GROUP BY p.category
HAVING SUM(t.total_amount) >= 500
ORDER BY total_revenue DESC
```

> **Cost Analysis**
>
> Hash join of 8K products to 50M transactions. The three aggregates add minimal overhead since they operate in the same GROUP BY pass. HAVING filters after aggregation, so all 50M rows are still scanned.

> **Common Pitfall**
>
> Computing `AVG` by dividing `SUM/COUNT` manually can introduce rounding differences compared to the built-in `AVG()` function. Use the native aggregate for consistency.

---

## Common follow-up questions

- What if the revenue threshold was the median category revenue? _(Tests using a CTE or subquery to compute the median and reference it in HAVING.)_
- How would you handle NULLs in total_amount? _(SUM and AVG skip NULLs, but COUNT(*) does not; tests awareness of NULL behavior in aggregates.)_
- What if you needed to add a percentage-of-total column? _(Tests window functions: SUM(total_amount) OVER () for the grand total.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/category_revenue)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.