# Full Funnel

> Search. Browse. Buy. Only a few do all three.

Canonical URL: <https://datadriven.io/problems/full_funnel>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

The funnel team is building a "full lifecycle" segment of users who engaged at every stage: searching, browsing, and purchasing. Return a deduplicated list of usernames for users who have activity in all three areas.

## Worked solution and explanation

### Why this problem exists in real interviews

Full funnel analysis (signup to search to view to purchase) tests multi-table joins and the ability to identify users who engaged at every stage. This is a signature data engineering interview problem.

> **Trick to Solving**
>
> The prompt asks for users who engaged at every stage. Use INNER JOINs (not LEFT JOINs) across all four tables to find users present in all stages.
> 
> 1. Start from users
> 2. INNER JOIN each stage table on user_id
> 3. Users surviving all joins are the full-funnel segment

---

### Break down the requirements

#### Step 1: Join all four stage tables

INNER JOIN `users` to `search_queries`, `page_views`, and `transactions` on `user_id`. Only users present in all tables survive.

#### Step 2: Deduplicate to unique users

Use `DISTINCT` or `GROUP BY user_id` since each join may fan out.

#### Step 3: Return the full-funnel users

Select user details for the users who engaged at every stage.

---

### The solution

**Multi-table INNER JOIN for full-funnel users**

```sql
SELECT DISTINCT u.user_id, u.username, u.email, u.signup_date
FROM users u
JOIN search_queries sq ON u.user_id = sq.user_id
JOIN page_views pv ON u.user_id = pv.user_id
JOIN transactions t ON u.user_id = t.user_id
ORDER BY u.user_id
```

> **Cost Analysis**
>
> Four INNER JOINs can produce a large intermediate set. Using EXISTS subqueries instead of JOINs avoids the Cartesian fan-out and is often faster.

> **Interviewers Watch For**
>
> The interviewer checks two things: (1) using INNER JOIN (not LEFT JOIN) since the prompt wants users at every stage, and (2) using DISTINCT to handle the fan-out from multiple joins.

> **Common Pitfall**
>
> Without DISTINCT or GROUP BY, the join produces a row for every combination of search, view, and transaction per user, massively inflating the result.

---

## Common follow-up questions

- How would you compute the conversion rate at each stage? _(Tests switching to LEFT JOINs and counting NULLs.)_
- What if stages must happen in chronological order? _(Tests timestamp ordering constraints in the join conditions.)_
- How would you find the biggest drop-off point? _(Tests per-stage count comparison.)_
- What if the joins cause performance issues? _(Tests EXISTS-based approach or CTE per stage.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/full_funnel)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.