# 7-Day Onboarding Conversion

> Signed up Monday. Still here by Sunday?

Canonical URL: <https://datadriven.io/problems/7_day_onboarding_conversion>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

The growth team is evaluating early onboarding for a board review. For users who entered an experiment during the first 7 days of January 2026, determine what percentage completed at least one meaningful session (duration greater than zero) within 7 days of entering the experiment. Break the results down by platform and experiment entry date, showing total users, converted users, and the conversion percentage.

## Worked solution and explanation

### The mental model

Cohort questions like this probe one thing: do you reflexively separate the **population you're measuring** from the **behavior you're scoring**? Growth teams reject candidates who write a single CTE that double-counts users with many sessions, or who silently drop users with zero sessions from the denominator. The interviewer wants to hear 'cohort first, behavior second' before you touch the keyboard.

---

### The three traps

#### Step 1: Trap 1: skipping the user-level GROUP BY

If you aggregate platform+date directly off the join (`GROUP BY platform, signup_date` over the joined rows) you count session rows, not users. A user with 5 sessions counts 5 times; a user with 0 sessions counts 0 times. Your conversion rate becomes 'session conversion rate', not 'user conversion rate', and you cannot recover the right number without a redo. **Fix: GROUP BY user first to collapse to one row per cohort member, then GROUP BY the cohort dimensions.**

#### Step 2: Trap 2: WHERE kills the LEFT JOIN

If the 7-day window lives in WHERE (`WHERE us.session_start BETWEEN s.signup_date AND s.signup_date + 7`), the LEFT JOIN silently becomes an INNER JOIN. Users with zero sessions vanish, the denominator shrinks, and the conversion rate jumps. The same trap shows up in churn, retention, and post-event funnels. **Fix: window predicates always go in ON when LEFT JOIN preserves the left side.**

#### Step 3: Trap 3: counting zero-duration sessions

A session row with `session_duration_sec = 0` is a connection event, not engagement. If you count any session as conversion you inflate the rate. The prompt is explicit ('duration greater than zero') but the trap is that NULL durations should also not count. `COUNT(CASE WHEN dur > 0 THEN 1 END)` handles both because the CASE returns NULL on no-match, and COUNT skips NULLs. **Avoid SUM(CASE WHEN dur > 0 THEN 1 ELSE 0 END) here; you want the 0/1 collapse outside, not inside.**

---

### The solution

**Three-act cohort conversion**

```sql
WITH signups AS (
    SELECT user_id, platform, DATE(created) AS signup_date
    FROM experiments
    WHERE DATE(created) BETWEEN '2026-01-01' AND '2026-01-07'
),
engagement AS (
    SELECT
        s.platform,
        s.signup_date,
        s.user_id,
        CASE
            WHEN COUNT(CASE WHEN us.session_duration_sec > 0 THEN 1 END) > 0
            THEN 1 ELSE 0
        END AS converted
    FROM signups s
    LEFT JOIN user_sessions us
        ON s.user_id = us.user_id
       AND julianday(us.session_start) - julianday(s.signup_date) BETWEEN 0 AND 7
    GROUP BY s.platform, s.signup_date, s.user_id
)
SELECT
    platform,
    signup_date,
    COUNT(*) AS total_users,
    SUM(converted) AS converted_users,
    CAST(SUM(converted) AS REAL) / CAST(COUNT(*) AS REAL) * 100 AS conversion_rate
FROM engagement
GROUP BY platform, signup_date
ORDER BY platform, signup_date
```

> **The 0/1 collapse generalizes**
>
> The 0/1 collapse pattern is the elegant move. Once `converted` is a per-user 0 or 1, `SUM(converted)` counts converters and `COUNT(*)` counts cohort members. The conversion rate is then `SUM/COUNT`, which is the same shape you write for click-through rate, opt-in rate, retention rate, and a dozen other funnel metrics. Learn this shape and you stop reinventing it.

> **Cost shape and the partition-pruning fix**
>
> Two layers of GROUP BY: `experiments` is filtered to 7 days of signups, so the cohort CTE is small (thousands of users). The big work is the LEFT JOIN against `user_sessions` (50M rows). The window predicate `julianday(us.session_start) - julianday(s.signup_date) BETWEEN 0 AND 7` applies a function on session_start which defeats partition pruning. In production rewrite as `us.session_start >= s.signup_date AND us.session_start < DATE(s.signup_date, '+8 days')`. That form lets the planner range-seek the session_start partition and changes a full scan into a 14-day slice (one week of signups plus one week of windows).

> **What interviewers actually score**
>
> Three things separate strong candidates here. (1) They name the cohort out loud before writing SQL: 'users who entered Jan 1 to Jan 7, fixed at that population.' (2) They draw the GROUP BY layers on paper before they type, often saying 'first by user, then by platform+date.' (3) They handle the edge case proactively: 'what about users who signed up but never sessioned? They are in the denominator with converted = 0.' The candidates who skip those steps usually write a one-CTE query that double-counts sessions and produce a number that looks plausible but is wrong.

> **The seductive wrong answer**
>
> The most common wrong query: a single CTE that joins experiments to sessions, filters the window in WHERE, then groups by platform+date and counts distinct converters over distinct users. It produces a number, but the denominator is wrong (it excludes users with zero sessions because WHERE collapsed the LEFT JOIN) and the numerator is risky (COUNT DISTINCT user_id WHERE dur > 0 is fragile when you later add other filters). The three-act CTE structure is verbose, but it makes every assumption auditable.

---

## Common follow-up questions

- A user signed up Jan 3 but did not session until Feb 1 (way outside the 7-day window). Are they in your denominator? Should they be? _(Tests whether the candidate sees the cohort as fixed independent of behavior. Correct answer: no, the user counts against the cohort with converted = 0; we measured what we measured, the denominator does not retroactively shrink. This is the difference between cohort analysis and population analysis.)_
- If I move the 7-day window predicate from ON into WHERE, which rows disappear and how does the reported conversion rate change? _(Tests the WHERE-vs-ON LEFT JOIN trap. Walking through it: the LEFT JOIN produces NULL session_start for unmatched users; the WHERE clause then filters those NULLs out; the user vanishes; the denominator shrinks; the rate goes up. This is the single most common bug in retention SQL.)_
- Why does the query need two layers of GROUP BY? What goes wrong if I aggregate platform+date directly off the join? _(Tests the GROUP-BY-grain question. Without the per-user GROUP BY, a user with 10 sessions counts 10 times in the cohort total, but their conversion flag is still 1, so the rate plummets. The candidate should explain that GROUP BY user collapses to one-row-per-user before the outer aggregation.)_
- Is a session on day 7 (exactly 7 days after signup) inside or outside the window? How would you check? _(Tests the boundary inclusivity question. BETWEEN 0 AND 7 is 8 days (day 0 through day 7 inclusive); strict `>= 0 AND < 7` is 7 days. The prompt says 'within 7 days of entering' which most readers interpret as 'up to and including day 7'. Flagging the ambiguity out loud beats picking silently.)_
- iOS sessions log in the device timezone, Android in UTC. How would that affect the conversion rate, and how would you fix it? _(Tests the timezone scenario. signup_date is derived from `created`, session_start is a separate timestamp; if these are stored in different timezones the windows are misaligned by hours. Real production answer: normalize both to UTC at write time; do the window math against UTC; surface local time only in the presentation layer.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/7_day_onboarding_conversion)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.