# Category Sales Summary

> Category by category. How did they do?

Canonical URL: <https://datadriven.io/problems/category_sales_summary>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The merchandising team needs a 2026 sales overview by product category. For each category with at least one sale, show the number of unique transactions and total revenue, sorted from highest revenue to lowest.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests a straightforward JOIN, GROUP BY, and aggregate query with date filtering. It verifies that candidates can filter by year, count distinct values, and sort results correctly.

---

### Break down the requirements

#### Step 1: Join and filter by year

Join `transactions` to `products` on `product_id`. Filter `transaction_date` to the target year.

#### Step 2: Aggregate per category

`COUNT(DISTINCT t.transaction_id)` for unique transactions and `SUM(t.total_amount)` for total revenue, grouped by `p.category`.

#### Step 3: Sort by revenue descending

`ORDER BY total_revenue DESC` surfaces the highest-revenue categories first. The `HAVING` is implicit: categories with at least one sale naturally appear.

---

### The solution

**Year-filtered category aggregation**

```sql
SELECT
    p.category,
    COUNT(DISTINCT t.transaction_id) AS unique_transactions,
    SUM(t.total_amount) AS total_revenue
FROM transactions t
JOIN products p ON t.product_id = p.product_id
WHERE t.transaction_date >= '2026-01-01'
  AND t.transaction_date < '2027-01-01'
GROUP BY p.category
ORDER BY total_revenue DESC
```

> **Cost Analysis**
>
> The date filter reduces 90M rows to roughly one year's worth (~7.5M). The join to 35K products is cheap via hash join. Output is a handful of categories.

> **Common Pitfall**
>
> Using `YEAR(transaction_date) = 2026` instead of range bounds prevents index usage on the date column. Always use range predicates for date filtering.

---

## Common follow-up questions

- Why use COUNT(DISTINCT transaction_id) instead of COUNT(*)? _(If the join produces duplicates (multiple products per transaction), COUNT(*) would overcount.)_
- How would you include categories with zero sales in the target year? _(Tests LEFT JOIN from products grouped by category.)_
- What if the date column is a timestamp with time zone? _(Tests awareness of timezone-aware date comparisons and AT TIME ZONE.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/category_sales_summary)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.