# Unsold Product Categories

> Dead inventory inflating storage costs.

Canonical URL: <https://datadriven.io/problems/unsold_product_categories>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Dead inventory inflates storage costs. For each product category, calculate the percentage of products in that category that have never appeared in any transaction.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests LEFT JOIN anti-patterns combined with percentage calculation per group. Interviewers check whether you can compute a ratio of unmatched records within each category.

---

### Break down the requirements

#### Step 1: Left join products to transactions

`LEFT JOIN transactions ON products.product_id = transactions.product_id` preserves all products, including those with no sales.

#### Step 2: Identify unsold products

After the left join, products with `transaction_id IS NULL` have never been sold.

#### Step 3: Calculate percentage per category

Group by category and compute `COUNT(unsold) * 100.0 / COUNT(all)` for the unsold percentage.

---

### The solution

**Left join with conditional aggregation for unsold ratio**

```sql
SELECT
    p.category,
    SUM(CASE WHEN t.transaction_id IS NULL THEN 1 ELSE 0 END) * 100.0 / COUNT(DISTINCT p.product_id) AS unsold_pct
FROM products p
LEFT JOIN (
    SELECT DISTINCT product_id FROM transactions
) t ON p.product_id = t.product_id
GROUP BY p.category
```

> **Cost Analysis**
>
> The subquery deduplicates 60M transactions to 18K distinct product IDs. The left join is then 20K products against 18K distinct IDs, which is trivial. The key optimization is deduplicating transactions first.

> **Interviewers Watch For**
>
> Candidates who join products directly to the 60M-row transactions table without deduplication. This creates massive fan-out and incorrect percentages because sold products would be counted multiple times.

> **Common Pitfall**
>
> Not deduplicating transactions before the join. A product with 1,000 transactions would contribute 1,000 rows to the join, skewing the COUNT and the percentage calculation.

---

## Common follow-up questions

- How would you also show the total number of products per category? _(Add COUNT(DISTINCT p.product_id) as a separate column in the same query.)_
- What if you needed to consider only transactions from the last year? _(Add a date filter inside the subquery before deduplication.)_
- How would you find the category with the highest unsold percentage? _(Wrap in an outer query with ORDER BY unsold_pct DESC LIMIT 1.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/unsold_product_categories)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.