# Monthly Category Totals

> Sum amounts by category and month.

Canonical URL: <https://datadriven.io/problems/monthly_category_totals>

Domain: SQL · Difficulty: easy · Seniority: L5

## Problem

Ahead of the holiday planning cycle, the merchandising team needs to see how each product category's revenue shifts month to month. Show the total transaction amount for each category-month combination, ordered by category then month.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge forces you to reconcile transactions.total_amount with products.category through grouping and date extraction. It surfaces as a fundamentals check because candidates who skip the grain check produce plausible but incorrect numbers.

---

### Break down the requirements

#### Step 1: Join tables with INNER JOIN

Connect `transactions` and `products` on `product_id` to keep only matching rows.

#### Step 2: Aggregate with SUM

Group by the output grain and apply `SUM()` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.

#### Step 3: Order the final output

Apply `ORDER BY` as specified to produce the expected row sequence. When tied values exist, add a secondary sort column for determinism.

---

### The solution

**Join with two-level GROUP BY for category-month grid**

```sql
SELECT p.category,
    STRFTIME('%Y-%m', t.transaction_date) AS month,
    SUM(t.total_amount) AS total_revenue
FROM transactions t
JOIN products p ON t.product_id = p.product_id
GROUP BY p.category, STRFTIME('%Y-%m', t.transaction_date)
ORDER BY p.category, month
```

> **Cost Analysis**
>
> The join touches `transactions` (80M rows) and `products` (25K rows). `transactions` is partitioned by `transaction_date`, which the optimizer can exploit with a partition filter. The aggregation reduces the row count before any downstream processing, which is the key performance lever. CTEs in most engines are optimization fences. For production workloads, consider inlining or materializing the intermediate results.

> **Interviewers Watch For**
>
> Interviewers expect you to articulate why you chose a specific join type and what happens to unmatched rows. Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax.

> **Common Pitfall**
>
> Forgetting that a JOIN can multiply rows when the relationship is one-to-many. Always check whether the join key is unique on at least one side.

---

## Common follow-up questions

- If transactions.transaction_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in transaction_id.)_
- If the join between transactions and products produces a fan-out, how does that affect your aggregate? _(Tests awareness of join cardinality and its impact on SUM, COUNT, and AVG results.)_
- With millions of distinct values in transactions.transaction_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like transaction_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/monthly_category_totals)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.