# Revenue Per Product With Zeros

> Total revenue per product. Even the zeros.

Canonical URL: <https://datadriven.io/problems/revenue_per_product_with_zeros>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Build a product revenue report where every product appears in the results, including products that have never been sold. Products with no sales should show zero revenue.

## Worked solution and explanation

### Why this problem exists in real interviews

The core skill being tested is the outer join: a `LEFT JOIN` from the `products` table to the `transactions` table in a revenue analysis context. Getting the `product_id` and `product_name` columns right, by selecting and grouping on them from the `products` side, is where most candidates slip. The problem also layers in `COALESCE` as a NULL fallback.

---

### Break down the requirements

#### Step 1: Left join to preserve all base rows

A `LEFT JOIN` from `products` keeps every product row in the output, even when it has no match in `transactions`. For unmatched products, the columns that come from `transactions` are NULL.
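
As a minimal sketch of what the join produces before any aggregation, the snippet below runs it against an in-memory SQLite database. The sample rows, the `transaction_id` column, and the product names are assumptions made up for illustration; only `product_id`, `product_name`, `price`, and `quantity` come from the solution query.

```python
import sqlite3

# Hypothetical sample data: the rows and the transaction_id column are
# invented for illustration; the other column names follow the solution query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, price REAL);
    CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER);
    INSERT INTO products VALUES (1, 'Keyboard', 50.0), (2, 'Mouse', 20.0), (3, 'Webcam', 80.0);
    INSERT INTO transactions VALUES (10, 1, 2), (11, 1, 1), (12, 2, 3);
""")

# Un-aggregated LEFT JOIN: one row per matching transaction, plus one row
# for each product that has no transaction at all.
joined_rows = conn.execute("""
    SELECT p.product_id, p.product_name, t.quantity
    FROM products p
    LEFT JOIN transactions t ON p.product_id = t.product_id
    ORDER BY p.product_id, t.transaction_id
""").fetchall()

for row in joined_rows:
    print(row)
```

The unsold product (`Webcam` in this made-up data) still appears once, with `None` (SQL NULL) in the `quantity` position.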

#### Step 2: Aggregate by `p.product_id`

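`GROUP BY p.product_id, p.product_name` collapses the joined rows to one per product. `SUM(p.price * t.quantity)` then computes the revenue for each group. For a product with no transactions, `t.quantity` is NULL, so the sum is NULL rather than zero; `COALESCE(..., 0)` converts it to zero.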
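
To see why the NULL fallback matters, here is a small sketch that puts the raw `SUM` next to the `COALESCE`-wrapped version. It uses SQLite in memory rather than the Postgres sandbox, and the sample rows and the `transaction_id` column are assumptions for illustration only.

```python
import sqlite3

# Hypothetical sample data: one product with sales, one without.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, price REAL);
    CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER);
    INSERT INTO products VALUES (1, 'Keyboard', 50.0), (3, 'Webcam', 80.0);
    INSERT INTO transactions VALUES (10, 1, 2), (11, 1, 1);
""")

# raw_sum shows what SUM returns on its own; total_revenue adds the fallback.
rows = conn.execute("""
    SELECT p.product_id,
           SUM(p.price * t.quantity)              AS raw_sum,
           COALESCE(SUM(p.price * t.quantity), 0) AS total_revenue
    FROM products p
    LEFT JOIN transactions t ON p.product_id = t.product_id
    GROUP BY p.product_id, p.product_name
    ORDER BY p.product_id
""").fetchall()

for product_id, raw_sum, total_revenue in rows:
    print(product_id, raw_sum, total_revenue)
```

For the unsold product, `raw_sum` comes back as `None` because every input to `SUM` is NULL, while `total_revenue` is `0`.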

---

### The solution

**Left join to preserve all base rows to find revenue per product wit...**

```sql
SELECT
    p.product_id,
    p.product_name,
    COALESCE(SUM(p.price * t.quantity), 0) AS total_revenue
FROM products p
LEFT JOIN transactions t ON p.product_id = t.product_id
GROUP BY p.product_id, p.product_name;
```

> **Cost Analysis**
>
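> With ~60M rows, the GROUP BY collapses the joined rows to one per product before any downstream operations; the join cost depends largely on the smaller table's cardinality. The query has no filter, so the relevant index is on the join column, `transactions.product_id`; whether it turns the scan into a seek depends on the planner, since every transaction row still contributes to the sum.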

> **Interviewers Watch For**
>
> Whether you choose the correct join type to avoid silently dropping rows, and how you handle NULL values in filters and aggregations.

> **Common Pitfall**
>
> Using INNER JOIN instead of LEFT JOIN drops rows with no match, producing an incomplete result. The prompt usually hints at this with 'all' or 'even if no'.
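
The pitfall can be reproduced with a rough sketch like the one below, which runs the same report twice and swaps only the join type. The helper `revenue_report`, the sample rows, and the `transaction_id` column are hypothetical names introduced for illustration, and the database is in-memory SQLite rather than Postgres.

```python
import sqlite3

# Hypothetical sample data: product 3 has never been sold.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, price REAL);
    CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER);
    INSERT INTO products VALUES (1, 'Keyboard', 50.0), (2, 'Mouse', 20.0), (3, 'Webcam', 80.0);
    INSERT INTO transactions VALUES (10, 1, 2), (11, 1, 1), (12, 2, 3);
""")

def revenue_report(join_type):
    """Run the revenue query with the given join keyword ('LEFT' or 'INNER')."""
    if join_type not in ("LEFT", "INNER"):
        raise ValueError("join_type must be 'LEFT' or 'INNER'")
    query = f"""
        SELECT p.product_id, COALESCE(SUM(p.price * t.quantity), 0) AS total_revenue
        FROM products p
        {join_type} JOIN transactions t ON p.product_id = t.product_id
        GROUP BY p.product_id, p.product_name
        ORDER BY p.product_id
    """
    return conn.execute(query).fetchall()

left_rows = revenue_report("LEFT")
inner_rows = revenue_report("INNER")

# Products present in the LEFT JOIN report but silently missing from the INNER JOIN one.
missing = {pid for pid, _ in left_rows} - {pid for pid, _ in inner_rows}
print("LEFT JOIN rows: ", left_rows)
print("INNER JOIN rows:", inner_rows)
print("Dropped by INNER JOIN:", missing)
```

Note that `COALESCE` cannot rescue the inner-join version: the unsold product's row is never produced, so there is nothing left to default to zero.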

---

## Common follow-up questions

- What result would you get if every value in `products.rating` were NULL? Would your query return an empty set or something unexpected? _(Tests extreme NULL scenarios and whether the candidate guards against edge cases in `rating`.)_
- With 3,000,000 distinct values in `transactions.user_id`, how would a composite index on the GROUP BY columns change the execution plan? _(Probes understanding of how cardinality in `user_id` affects grouping and sort operations.)_
- Your COALESCE provides a fallback value. If `product_id` in `transactions` is never NULL in practice, does the COALESCE add overhead or get optimized away? _(Tests understanding of COALESCE cost and optimizer behavior.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/revenue_per_product_with_zeros)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.