# The Cannibalization Report

> The new product launched. The old one suffered.

Canonical URL: <https://datadriven.io/problems/the_cannibalization_report>

Domain: SQL · Difficulty: hard · Seniority: L6

## Problem

For each product whose first transaction occurred in the last 6 months, compare the category's total sales in the 30 days before that product's first sale versus 30 days after. Show the product name, category, pre-entry sales, post-entry sales, and the percentage change.

## Worked solution and explanation

### Why this problem exists in real interviews

This is a complex analytical query that tests date windowing, self-referential comparisons, and percentage calculations. Interviewers use it to see if you can compute time-relative aggregates (30-day windows around an event), join across different grains (product vs. category), and handle the before/after comparison pattern that appears in A/B testing and launch analysis.

> **Trick to Solving**
>
> When the prompt asks for "before vs. after" around an event date, the pattern is always the same: identify the event date, build two date ranges relative to it, and aggregate each independently.
> 
> 1. Find each product's first transaction date (launch proxy)
> 2. Filter to products launched in the last 6 months
> 3. Compute category sales for `[launch - 30, launch)` and `[launch, launch + 30]`
> 4. Calculate percentage change: `(post - pre) / pre * 100`

---

### Break down the requirements

#### Step 1: Find each product's launch date

`MIN(transaction_date)` per `product_id` from the `transactions` table gives the first sale date, which the prompt uses as the launch proxy.

#### Step 2: Filter to recent launches

Keep only products whose first transaction is within the last 6 months. This narrows the output to the relevant subset.

#### Step 3: Compute pre-launch category sales

For each launched product, sum `total_amount` across all transactions in the same `category` where `transaction_date` falls in the 30 days before launch. This requires joining `products` to `transactions` on category.

#### Step 4: Compute post-launch category sales

Same aggregation but for the 30 days starting from launch date. Both windows exclude the launched product itself if you want pure cannibalization; the prompt implies including all category sales.

#### Step 5: Calculate percentage change

`(post_sales - pre_sales) / pre_sales * 100` gives the cannibalization metric. Handle division by zero with `NULLIF`.

---

### The solution

**Date-windowed category comparison with CTEs**

```sql
WITH launches AS (
    SELECT
        t.product_id,
        p.product_name,
        p.category,
        MIN(t.transaction_date) AS launch_date
    FROM transactions t
    JOIN products p ON t.product_id = p.product_id
    GROUP BY t.product_id, p.product_name, p.category
    HAVING MIN(t.transaction_date) >= DATE('now', '-6 months')
),
pre_sales AS (
    SELECT
        l.product_id,
        SUM(t.total_amount) AS pre_amount
    FROM launches l
    JOIN transactions t
        ON t.product_id IN (SELECT product_id FROM products WHERE category = l.category)
    WHERE t.transaction_date >= DATE(l.launch_date, '-30 days')
      AND t.transaction_date < l.launch_date
    GROUP BY l.product_id
),
post_sales AS (
    SELECT
        l.product_id,
        SUM(t.total_amount) AS post_amount
    FROM launches l
    JOIN transactions t
        ON t.product_id IN (SELECT product_id FROM products WHERE category = l.category)
    WHERE t.transaction_date >= l.launch_date
      AND t.transaction_date < DATE(l.launch_date, '+30 days')
    GROUP BY l.product_id
)
SELECT
    l.product_name,
    l.category,
    COALESCE(pre.pre_amount, 0) AS pre_launch_sales,
    COALESCE(post.post_amount, 0) AS post_launch_sales,
    ROUND(
        (COALESCE(post.post_amount, 0) - COALESCE(pre.pre_amount, 0))
        * 100.0 / NULLIF(COALESCE(pre.pre_amount, 0), 0),
        2
    ) AS pct_change
FROM launches l
LEFT JOIN pre_sales pre ON l.product_id = pre.product_id
LEFT JOIN post_sales post ON l.product_id = post.product_id
ORDER BY pct_change DESC
```

> **Cost Analysis**
>
> The `launches` CTE scans 120M transactions once. The pre/post CTEs each do a correlated join against `transactions` filtered by date windows. On a 120M-row table, indexing `transactions(category, transaction_date, total_amount)` is critical. Without it, each CTE triggers a near-full scan.

> **Interviewers Watch For**
>
> Interviewers look for clean separation of the launch identification, pre-window, and post-window logic. Candidates who try to do everything in one pass usually produce incorrect date boundary logic. Using CTEs to isolate each concern shows structured thinking.

> **Common Pitfall**
>
> Forgetting to use `NULLIF` on the denominator when computing percentage change. If a product's category had zero pre-launch sales, you get a division-by-zero error. Another common error is using inclusive boundaries on both sides of the date range, double-counting the launch date.

---

## Common follow-up questions

- How would you exclude the launched product's own sales from the category totals? _(Tests adding a WHERE clause to filter out the product_id being analyzed.)_
- What if you needed a rolling 7-day granularity instead of a single 30-day window? _(Probes date series generation and windowed aggregation at finer granularity.)_
- How would you handle products that launched on the same day in the same category? _(Tests overlapping window logic and whether pre/post windows conflict.)_
- What if the transactions table had 10B rows and this query needs to run daily? _(Probes materialized view, incremental computation, or partition-level pre-aggregation strategies.)_
- How would you measure statistical significance of the percentage change? _(Tests awareness of sample size, variance, and when a percentage change is meaningful vs. noise.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_cannibalization_report)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.