# Revenue by Product

> Which products carry the revenue line?

Canonical URL: <https://datadriven.io/problems/revenue_by_product>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The general manager is preparing the annual product performance review. For each product, show the product name, total revenue generated, and the number of units sold. Only include products that have generated at least one hundred dollars in total revenue, ordered from the highest revenue product to the lowest.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use this revenue analysis scenario to test self-join against the `products` and `transactions` tables. The focus is on how you handle the `product_name` column when building the result. It also requires HAVING for post-aggregation filtering.

---

### Break down the requirements

#### Step 1: Join `products` to `transactions`

The join connects the two tables on their shared key. This brings the columns needed for filtering and aggregation into a single row set.

#### Step 2: Aggregate by `p.product_name`

`GROUP BY p.product_name` collapses rows to one per group. The aggregate functions (`SUM`, `COUNT`, `AVG`, etc.) compute the metric for each group.

#### Step 3: Filter groups with HAVING

HAVING applies after GROUP BY, filtering out groups that do not meet the threshold. This cannot be done in WHERE because the aggregate has not been computed yet.

#### Step 4: Sort the final output

The `ORDER BY` clause ensures the result appears in the expected sequence. Interviewers check that the sort direction matches the prompt.

---

### The solution

**Having filter for revenue by product**

```sql
SELECT p.product_name, SUM(t.total_amount) AS total_revenue, SUM(t.quantity) AS units_sold
FROM products p
JOIN transactions t ON p.product_id = t.product_id
GROUP BY p.product_name
HAVING SUM(t.total_amount) >= 100
ORDER BY total_revenue DESC
```

> **Cost Analysis**
>
> With ~5M rows, the GROUP BY reduces the working set before any downstream operations; the join cost depends on the smaller table's cardinality. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you use HAVING (not WHERE) to filter after aggregation.

> **Common Pitfall**
>
> Putting the aggregate condition in WHERE instead of HAVING causes a syntax error. WHERE runs before GROUP BY; HAVING runs after.

---

## Common follow-up questions

- The `price` column in `products` has a 8% null rate. How does your query handle rows where `price` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `products.price` and understands how aggregates skip NULL values.)_
- With 5,000,000 distinct values in `transactions.transaction_id`, how would a composite index on the GROUP BY columns change the execution plan? _(Probes understanding of how cardinality in `transaction_id` affects grouping and sort operations.)_
- If `products` contained late-arriving rows that were inserted after your query ran, how would you design an incremental update instead of re-aggregating? _(Tests understanding of incremental aggregation patterns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/revenue_by_product)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.