# Funnel Leakage Report

> Users enter the funnel. Most never reach the bottom.

Canonical URL: <https://datadriven.io/problems/funnel_leakage_report>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

The conversion optimization team tracks a four-step purchase funnel: page_view, add_to_cart, checkout_start, and purchase. For each step, show how many unique users reached it and the drop-off percentage from the previous step. Break the analysis down by tag.

## Worked solution and explanation

### Why this problem exists in real interviews

Funnel leakage analysis identifies where users drop off between conversion steps. This tests per-step user counting and step-over-step comparison using `LAG` or conditional aggregation on event types.

> **Trick to Solving**
>
> Count distinct users at each funnel stage using conditional aggregation, then compute leakage as the difference between consecutive stages.
> 
> 1. Count users per stage with `COUNT(DISTINCT CASE WHEN event_type = '...' THEN user_id END)`
> 2. Compare adjacent stages to find the drop-off
> 3. Express as absolute and percentage leakage

---

### Break down the requirements

#### Step 1: Count users at each stage

Use conditional `COUNT(DISTINCT)` for each of the four stages: page_view, add_to_cart, checkout_start, purchase.

#### Step 2: Compute step-over-step leakage

Subtract each stage count from the previous to get the drop-off at each step.

---

### The solution

**Conditional COUNT DISTINCT for funnel stages**

```sql
SELECT
    COUNT(DISTINCT CASE WHEN event_type = 'page_view' THEN user_id END) AS stage_1_view,
    COUNT(DISTINCT CASE WHEN event_type = 'add_to_cart' THEN user_id END) AS stage_2_cart,
    COUNT(DISTINCT CASE WHEN event_type = 'checkout_start' THEN user_id END) AS stage_3_checkout,
    COUNT(DISTINCT CASE WHEN event_type = 'purchase' THEN user_id END) AS stage_4_purchase
FROM event_data
WHERE event_type IN ('page_view', 'add_to_cart', 'checkout_start', 'purchase')
```

> **Cost Analysis**
>
> Single scan with four conditional distinct counts. The WHERE filter reduces the working set to relevant event types.

> **Interviewers Watch For**
>
> The interviewer checks whether you present leakage as both absolute numbers and percentages.

> **Common Pitfall**
>
> Counting events (COUNT) instead of unique users (COUNT DISTINCT) inflates the numbers. A user with 10 page views should count once at the view stage.

---

## Common follow-up questions

- How would you break leakage down by user segment? _(Tests adding a segment join.)_
- What if stages must happen in order? _(Tests timestamp-based sequential filtering.)_
- How would you track leakage trends weekly? _(Tests date truncation in GROUP BY.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/funnel_leakage_report)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.