# Product Revenue Ranking

> Rank them by revenue. See who leads.

Canonical URL: <https://datadriven.io/problems/product_revenue_ranking>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The merchandising team is selecting the top 5 revenue drivers for a homepage feature. Show the product name, category, and total revenue for each, sorted from highest to lowest.

## Worked solution and explanation

### Why this problem exists in real interviews

Built around the `products` and `transactions` tables, this challenge probes your ability to apply self-join in a revenue analysis setting. Correctly referencing columns like `product_id`, `product_name`, and `category` is essential to a working solution.

---

### Break down the requirements

#### Step 1: Join products to transactions

`JOIN transactions ON products.product_id = transactions.product_id` links each product to its sales.

#### Step 2: Group by product and sum revenue

`GROUP BY product_id, product_name, category` with `SUM(total_amount)` computes total revenue.

#### Step 3: Order and limit to top 5

`ORDER BY total_revenue DESC LIMIT 5` returns the five highest-revenue products.

---

### The solution

**Join products to transactions to find product revenue ranking**

```sql
SELECT p.product_name, p.category, SUM(t.total_amount) AS total_revenue
FROM products p
JOIN transactions t ON p.product_id = t.product_id
GROUP BY p.product_id, p.product_name, p.category
ORDER BY total_revenue DESC
LIMIT 5
```

> **Cost Analysis**
>
> With `products` (10,000 rows), `transactions` (50,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- The `quantity` column in `transactions` has a 3% null rate. How does your query handle rows where `quantity` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `transactions.quantity` and understands how aggregates skip NULL values.)_
- With 50,000,000 distinct values in `transactions.transaction_id`, how would a composite index on the GROUP BY columns change the execution plan? _(Probes understanding of how cardinality in `transaction_id` affects grouping and sort operations.)_
- If `products` contained late-arriving rows that were inserted after your query ran, how would you design an incremental update instead of re-aggregating? _(Tests understanding of incremental aggregation patterns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/product_revenue_ranking)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.