# The Phantom Readers

> They read everything. They bought nothing.

Canonical URL: <https://datadriven.io/problems/the_phantom_readers>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The growth team wants to target window shoppers with a promotion. Find users who viewed at least 5 pages in the past 30 days but have zero transactions ever. Return their user_id, username, email, and total view count, sorted from most views to least.

## Worked solution and explanation

### What this is really asking

`p.transaction_id IS NULL` after a LEFT JOIN is the anti-join. Users with any transaction ever, even from 2019, get dropped. The 30-day window applies only to page_views, never to transactions.

---

### Break down the requirements

#### Step 1: Anti-join transactions

LEFT JOIN transactions on user_id and keep only rows where transaction_id IS NULL. That filters to users with zero transactions across the entire 80M-row table, lifetime.

#### Step 2: Restrict views to the last 30 days

Push `av.viewed_at >= date('now', '-30 days')` into the WHERE so COUNT(*) counts only recent activity. The lifetime check stays on transactions.

#### Step 3: Group, HAVING, ORDER BY

Group by user identity columns, keep groups with COUNT(*) >= 5, sort by total_views DESC. HAVING runs after WHERE, so the count reflects only the 30-day window.

---

### The solution

**WINDOW SHOPPERS WITH NO PURCHASES**

```sql
SELECT
    u.user_id,
    u.username,
    u.email,
    COUNT(*) AS total_views
FROM users u
INNER JOIN page_views av ON u.user_id = av.user_id
LEFT JOIN transactions p ON u.user_id = p.user_id
WHERE av.viewed_at >= date('now', '-30 days')
  AND p.transaction_id IS NULL
GROUP BY u.user_id, u.username, u.email
HAVING COUNT(*) >= 5
ORDER BY total_views DESC;
```

> **Cost Analysis**
>
> page_views is 500M rows partitioned on viewed_at, so the 30-day predicate prunes to ~40M scanned. The transactions LEFT JOIN can be done as NOT EXISTS to short-circuit on first hit per user, often cheaper than building the full outer-joined relation.

> **Interviewers Watch For**
>
> Whether you read 'zero transactions ever' as lifetime, not 30-day. Also whether you understand that putting `p.transaction_id IS NULL` in WHERE rather than ON is what makes the anti-join work; moving it to ON would return everyone.

> **Common Pitfall**
>
> Putting the date filter on transactions instead of page_views: `p.transaction_date >= date('now', '-30 days')`. That flips the meaning to users with no transactions in the last 30 days, which catches every dormant buyer too.

> **The False Start**
>
> First instinct is `WHERE user_id NOT IN (SELECT user_id FROM transactions)`. Looks clean, but NOT IN returns zero rows the moment one transactions.user_id is NULL because of three-valued logic. Pivot to LEFT JOIN with IS NULL, or NOT EXISTS, both of which are NULL-safe.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you rewrite the anti-join using NOT EXISTS, and when is it faster? _(NOT EXISTS short-circuits on the first matching transaction per user; LEFT JOIN materializes all matches before filtering.)_
- What if growth wants users who viewed 5 distinct pages, not 5 total views? _(Swap COUNT(*) for COUNT(DISTINCT av.page_url) in both the SELECT and HAVING.)_
- How would the query change if 'zero transactions' meant zero in the last 90 days instead of ever? _(Move the transactions filter into the ON clause as `p.transaction_date >= date('now', '-90 days')`, keeping IS NULL in WHERE.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_phantom_readers)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.