# Peak Non-Converting Month

> Everyone showed up. Nobody bought anything.

Canonical URL: <https://datadriven.io/problems/peak_non_converting_month>

Domain: SQL · Difficulty: easy · Seniority: L4

## Problem

Which month in 2026 had the most users who had sessions but never made a purchase? Show the month and the count of non-converting users.

## Worked solution and explanation

### Why this problem exists in real interviews

Growth teams chase the month with the most browsers who never bought, because that is where retargeting dollars have the highest ROI. Interviewers use this prompt to see whether you can express 'never appears in transactions' as a NOT IN anti-join, group by month with strftime, and pick the single peak month with a deterministic tie-break.

---

### Break down the requirements

#### Step 1: Define non-converting users with a NOT IN anti-join

user_id NOT IN (SELECT DISTINCT user_id FROM transactions WHERE user_id IS NOT NULL) keeps only users who never bought. The WHERE user_id IS NOT NULL inside the subquery is critical: NOT IN returns NULL for every outer row if the subquery yields a single NULL, which silently empties the result.

#### Step 2: Bucket sessions by year-month and count distinct users

strftime('%Y-%m', session_start) AS session_month groups sessions into calendar months. COUNT(DISTINCT user_id) per month avoids counting a user multiple times within the same month.

#### Step 3: Pick the peak month with a chronological tie-break

ORDER BY non_converting_users DESC, session_month ASC LIMIT 1 returns the highest-count month, and ties resolve by earliest month. Without the secondary sort, ties produce nondeterministic output.

---

### The solution

**Anti-join, monthly count distinct, peak with tie-break**

```sql
SELECT strftime('%Y-%m', us.session_start) AS session_month, COUNT(DISTINCT us.user_id) AS non_converting_users FROM user_sessions us WHERE strftime('%Y', us.session_start) = '2026' AND us.user_id NOT IN (SELECT DISTINCT user_id FROM transactions WHERE user_id IS NOT NULL) GROUP BY session_month ORDER BY non_converting_users DESC, session_month ASC LIMIT 1
```

> **Cost Analysis**
>
> user_sessions has 50,000,000 rows and transactions has 80,000,000. The subquery materializes the distinct converting user_ids (likely tens of millions of distinct values). NOT IN against that set can be slow; on a real production query you would rewrite as LEFT JOIN ... WHERE t.user_id IS NULL or NOT EXISTS, both of which let the planner stream rather than build a hash set. At interview scale, NOT IN is acceptable as long as the NULL filter is in place.

> **Interviewers Watch For**
>
> They want WHERE user_id IS NOT NULL inside the NOT IN subquery (the NULL trap is the single most-tested SQL gotcha), they want COUNT(DISTINCT user_id) (not COUNT(*)), and they want both the year scope strftime('%Y', session_start) = '2026' and the explicit chronological tie-break. Strong candidates also propose NOT EXISTS or LEFT JOIN ... IS NULL as more performant alternatives.

> **Common Pitfall**
>
> Dropping the user_id IS NOT NULL guard inside the NOT IN subquery returns zero rows whenever transactions has any NULL user_id, because NOT IN propagates NULL through three-valued logic. Counting with COUNT(*) instead of COUNT(DISTINCT user_id) inflates months where heavy non-converters had many sessions.

---

## Common follow-up questions

- Why is NOT EXISTS often faster than NOT IN here? _(Tests three-valued logic intuition. NOT EXISTS short-circuits per outer row and is immune to NULLs in the subquery, while NOT IN must materialize the entire subquery and is poisoned by a single NULL.)_
- How would you adapt this to per-quarter instead of per-month? _(strftime cannot return quarters directly in SQLite. The candidate should compute (CAST(strftime('%m', session_start) AS INTEGER) - 1) / 3 + 1 and concatenate with the year.)_
- What if a user converted in 2027 but only browsed in 2026? _(The current query treats them as converting (NOT IN drops them entirely). The candidate should propose adding a date filter to the transactions subquery so conversion is scoped to the same year.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/peak_non_converting_month)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.