# Viewer-to-Purchaser Activity

> Started as viewers. Became creators.

Canonical URL: <https://datadriven.io/problems/viewer_to_purchaser_activity>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

For every user whose very first event was a 'page_view', count how many 'purchase' events they went on to produce. Include users with zero purchases. Return the user ID and purchase count, from highest purchase count down, then user_id ascending.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests window functions for per-user first-event identification combined with conditional aggregation. Interviewers check whether you can define a cohort by first event type and then count subsequent events of a different type.

> **Trick to Solving**
>
> The trick is identifying each user's first event. Use `ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_timestamp)` to find the first event, then filter to users whose first event was 'page_view'.
> 
> 1. Rank events per user by timestamp
> 2. Filter to users whose rank-1 event is 'page_view'
> 3. Count purchase events for those users
> 4. Include users with zero purchases using LEFT JOIN or conditional count

---

### Break down the requirements

#### Step 1: Find each user's first event type

Use a CTE with `ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_timestamp)` and filter to `rn = 1` where `event_type = 'page_view'`.

#### Step 2: Count purchase events for the cohort

Left join the cohort users back to `event_data` filtered to `event_type = 'purchase'` and count.

#### Step 3: Include users with zero purchases

The LEFT JOIN ensures users who never purchased still appear with a count of 0 via COALESCE.

---

### The solution

**Cohort definition by first event then conditional count**

```sql
WITH first_events AS (
    SELECT user_id, event_type,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_timestamp) AS rn
    FROM event_data
),
viewer_cohort AS (
    SELECT user_id
    FROM first_events
    WHERE rn = 1 AND event_type = 'page_view'
)
SELECT
    vc.user_id,
    COUNT(e.event_id) AS purchase_count
FROM viewer_cohort vc
LEFT JOIN event_data e
    ON vc.user_id = e.user_id AND e.event_type = 'purchase'
GROUP BY vc.user_id
ORDER BY purchase_count DESC, vc.user_id ASC
```

> **Cost Analysis**
>
> The first CTE sorts 250M rows across 12M user partitions. This is the dominant cost. The cohort filter reduces to a subset of users. The LEFT JOIN to count purchases scans the event table again but only for matching user_ids.

> **Interviewers Watch For**
>
> Handling the zero-purchase case. The prompt says "include users with zero purchases," requiring a LEFT JOIN or COALESCE pattern.

> **Common Pitfall**
>
> Using `MIN(event_timestamp)` to find the first event but forgetting to also retrieve the event_type at that timestamp. A subquery join on (user_id, min_timestamp) is needed, or use ROW_NUMBER as shown.

---

## Common follow-up questions

- What if two events have the same timestamp for a user? _(ROW_NUMBER is non-deterministic; add event_id as a tiebreaker.)_
- How would you compute the conversion rate (purchasers / total cohort)? _(Count users with purchase_count > 0 divided by total cohort size.)_
- What if you needed to analyze the time between first view and first purchase? _(Join first_view and first_purchase timestamps per user and compute the difference.)_
- How would you generalize this to any first-event-type cohort? _(Parameterize the event_type filter in the cohort CTE.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/viewer_to_purchaser_activity)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.