# Feature Flag Engagement Impact

> Flags on versus flags off. The engagement gap.

Canonical URL: <https://datadriven.io/problems/feature_flag_engagement_impact>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

The growth team wants to understand session activity around feature flag launches. For each feature flag, return the flag name, a display name (underscores replaced with spaces), whether the flag is currently enabled, and how many user sessions started in the 30 days following the flag's created date. Flags with zero qualifying sessions should still appear with a session count of zero. Sort by flag name.

## Worked solution and explanation

### The mental model

Experimentation teams use this to check one habit: do you treat a date-bounded inequality as a JOIN condition or a WHERE filter? Both compute different answers, and only one preserves flags that had zero qualifying sessions. The interviewer is watching whether you reach for `LEFT JOIN ... ON DATE` before they have to ask, and whether you say `COUNT(us.session_id)` instead of `COUNT(*)` without prompting.

---

### The two traps that will burn you

#### Step 1: Trap 1: WHERE kills your LEFT JOIN

If you write the window filter in WHERE (`WHERE us.session_start >= ff.created AND us.session_start < ...`), the LEFT JOIN silently becomes an INNER JOIN. SQL evaluates ON first, producing NULL rows for unmatched flags, then WHERE filters those NULLs out (because NULL fails any comparison). Result: flags with zero sessions vanish from your output and you fail the 'still appear with zero' requirement. **Fix: put the window predicate inside ON.**

#### Step 2: Trap 2: COUNT(*) lies on a LEFT JOIN

On a LEFT JOIN, unmatched flags produce one synthetic row with NULL in every right-side column. `COUNT(*)` counts that synthetic row as 1, so a flag with no sessions reports `launch_window_sessions = 1`. `COUNT(us.session_id)` skips the NULL and correctly reports 0. **Always count a specific right-side column when LEFT JOIN is involved.**

---

### The solution

**LEFT JOIN with the window in ON, count the right-side key**

```sql
SELECT
    ff.flag_name,
    REPLACE(ff.flag_name, '_', ' ') AS display_name,
    ff.enabled,
    COUNT(us.session_id) AS launch_window_sessions
FROM feat_flags ff
LEFT JOIN user_sessions us
    ON DATE(us.session_start) >= DATE(ff.created)
   AND DATE(us.session_start) <  DATE(ff.created, '+30 days')
GROUP BY ff.flag_id, ff.flag_name, ff.enabled
ORDER BY ff.flag_name
```

> **Why group by flag_id when we don't select it**
>
> Notice `GROUP BY ff.flag_id` even though we don't select it. Two flags can share the same flag_name in some schemas; grouping by the id keeps them separate. Selecting only flag_name in the output is fine as long as flag_name is functionally determined by flag_id, which it is.

> **Cost shape on real data**
>
> The inequality join cannot use a B-tree index on `session_start` in the usual range-seek way, because the range bounds vary per flag. With 500 flags and 70M sessions the planner will broadcast the small flag table and stream sessions through. Real cost: O(flags x sessions in the 30-day window union). If you needed to scale this to millions of flags, you would materialize a `flag_active_window` calendar table and use it as the join driver.

> **What interviewers actually score**
>
> Two questions a senior candidate raises before writing code: (1) 'Does the 30-day window include the created day itself?' (answer: yes, hence `>=`). (2) 'What if a flag was created 5 days ago, has its window only halfway closed, should I exclude it or report partial?' The prompt is silent; flag the assumption out loud and pick one. Interviewers care more about the question than the answer.

> **BETWEEN trap on time windows**
>
> Writing `BETWEEN ff.created AND DATE(ff.created, '+30 days')` is tempting and wrong. BETWEEN is inclusive on both ends, so a session at exactly day +30 gets double-counted if you also process the next flag's window. Strict less-than on the upper bound is the safe pattern for time windows.

---

## Common follow-up questions

- Walk me through what happens if I move the date predicate from ON into WHERE. Will the output change? _(The interviewer is testing whether the candidate knows that ON-clause filters preserve LEFT JOIN semantics while WHERE filters collapse them. A common follow-up after the candidate submits a wrong query.)_
- If I swap COUNT(us.session_id) for COUNT(*), what value does a flag with zero matching sessions now report, and why? _(Tests COUNT semantics on outer joins. The candidate should explain that COUNT(*) counts synthetic NULL rows generated by LEFT JOIN.)_
- The product team now wants 'rolling 7-day' instead of '30-day fixed'. What changes? _(Tests time-bucketing literacy. Real answer: yes, recompute the windows from creation, do not assume midnight-aligned days.)_
- The query takes 4 minutes in production. The sessions table is partitioned by day. What is the first thing you would change? _(Tests partition awareness. user_sessions is partitioned on session_start; the planner can prune partitions if you push down a global date range, but per-flag windows defeat that. The fix is generating the date range as min(created) to max(created)+30 globally.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/feature_flag_engagement_impact)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.