# Category Deep Dive

> Revenue, units, rank. The full category report card.

Canonical URL: <https://datadriven.io/problems/category_deep_dive>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

Strategy wants a full financial picture per product category. For each category, compute total revenue (sum of transaction amounts) and total units sold (sum of quantities). Assign each category a position by revenue, highest first, with ties sharing a position. Return the category, revenue, units, and position.

## Worked solution and explanation

### Why this problem exists in real interviews

This combines JOIN aggregation with window functions. The "assign each category a position by revenue with ties sharing a position" phrase tests whether you know the difference between `RANK`, `DENSE_RANK`, and `ROW_NUMBER` and can apply one inside an aggregate query.

> **Trick to Solving**
>
> "Ties sharing a position" is the signal for `RANK()` or `DENSE_RANK()`. Since the prompt says "position" (implying gaps are acceptable), `RANK()` is the natural choice. Spot this by looking for language about ties or shared positions.
> 
> 1. Join and aggregate revenue and units per category
> 2. Apply `RANK() OVER (ORDER BY revenue DESC)` to assign positions
> 3. No subquery needed since the window function runs after GROUP BY

---

### Break down the requirements

#### Step 1: Join products to transactions

Join on `product_id` to associate `category` with transaction amounts and quantities.

#### Step 2: Aggregate per category

`SUM(t.total_amount)` for revenue and `SUM(t.quantity)` for total units sold, grouped by `p.category`.

#### Step 3: Rank by revenue

`RANK() OVER (ORDER BY SUM(t.total_amount) DESC)` assigns positions with ties sharing the same rank.

---

### The solution

**Aggregate with inline window ranking**

```sql
SELECT
    p.category,
    SUM(t.total_amount) AS total_revenue,
    SUM(t.quantity) AS total_units,
    RANK() OVER (ORDER BY SUM(t.total_amount) DESC) AS position
FROM products p
JOIN transactions t ON p.product_id = t.product_id
GROUP BY p.category
```

> **Cost Analysis**
>
> Hash join of 50K products to 250M transactions, then GROUP BY reduces to ~30 categories. The RANK window function sorts ~30 rows, which is negligible. The bottleneck is scanning 250M transaction rows.

> **Interviewers Watch For**
>
> Whether you place the window function directly in the SELECT of the GROUP BY query (correct) vs wrapping in an unnecessary subquery. SQL evaluates window functions after GROUP BY, so this works in a single level.

> **Common Pitfall**
>
> Using `ROW_NUMBER()` instead of `RANK()` would assign unique positions even for tied revenue categories, violating the requirement that ties share a position.

---

## Common follow-up questions

- What is the difference between RANK and DENSE_RANK here? _(RANK skips numbers after ties (1,1,3); DENSE_RANK does not (1,1,2). Tests precise understanding.)_
- What if categories with no transactions should appear with zero revenue? _(Tests LEFT JOIN from products and COALESCE for NULL sums.)_
- How would you show only the top 3 positions including all ties? _(Wrap in a subquery and filter WHERE position <= 3.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/category_deep_dive)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.