# The Churner Who Came Back

> They cancelled. They came back. The report has to tell both stories correctly.

Canonical URL: <https://datadriven.io/problems/the_churner_who_came_back>

Domain: Data Modeling · Difficulty: hard · Seniority: L5

## Problem

We have a global subscription business with hundreds of millions of subscribers across multiple plan tiers and regions. Subscribers can upgrade, downgrade, pause, cancel, and re-subscribe. Finance and product analytics need a data model that supports churn analysis, revenue reporting, and plan mix reporting. Design the data model.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes whether you can model a recurring-revenue lifecycle where the same real-world entity (a subscriber) can have multiple contiguous spans of activity. The signal here is grain clarity on `fact_subscriptions` and the decision to track price changes on `dim_plans` as Type 2 SCD instead of mutating the dimension in place.

> **Trick to Solving**
>
> Before drawing any tables, a strong candidate asks: "is a re-subscription the same subscription or a new one, and how does finance want churn attributed across the gap?" That single question forces the grain of `fact_subscriptions` to land on one row per contiguous subscription period, which makes churn, reactivation, and MRR roll-forward all expressible without window-function gymnastics.
> 
> 1. Clarify whether resubscription resets tenure
> 2. Declare `fact_subscriptions` grain = one row per subscription period
> 3. Decide SCD strategy on `dim_plans` (Type 2 for price history)
> 4. Split billing events onto their own fact for cash vs accrual reconciliation

---

### Break down the requirements

#### Step 1: Declare the grain on fact_subscriptions

One defensible grain is one row per contiguous subscription period per subscriber. A cancel-then-resubscribe produces a new row, not an update. This keeps tenure, churn, and win-back analysis additive.

#### Step 2: Model plan price history on dim_plans as Type 2

`monthly_price` moves. A strong candidate anchors historical MRR to the `plan_sk` in effect at the period start, using `effective_from` and `is_current` to reconstruct any past price.

#### Step 3: Separate billing events from subscription periods

`fact_billing_events` captures charges, refunds, dunning, and proration at the event grain. Keeping this separate from `fact_subscriptions` avoids conflating accrual revenue (period) with cash movement (event).

#### Step 4: Give dim_subscribers a GDPR-safe surface

PII sits only on `dim_subscribers`, not on the fact. Erasure requests can null the current row without touching financial facts that must be retained for audit.

#### Step 5: Use dim_date as a conformed dimension

`dim_date` lets finance join either fact to fiscal calendars, month-ends, and business-day logic without reimplementing calendar rules in SQL.

---

### The solution

Below is one conceptually sound model. The grain on `fact_subscriptions` is the anchor: one row per subscription period. Everything else, including SCD choice and the split between accrual and cash facts, follows from that.

> **Why this design holds up**
>
> The period-grain fact keeps MRR roll-forward additive and makes reactivation a first-class concept. The Type 2 `dim_plans` trades a little join complexity for the ability to reprice history when finance restates. That trade-off is the one interviewers expect you to name out loud.

> **What strong candidates do**
>
> They state grain before drawing a single box. They explicitly choose Type 2 on `dim_plans` and justify it with a finance restatement scenario. They separate accrual from cash by putting charges on `fact_billing_events`, and they flag GDPR erasure as a dimension-only operation.

> **Red flags to avoid**
>
> Embedding price on the fact table freezes history in a way that blocks restatement. Mutating `fact_subscriptions` on resubscription erases the gap that powers win-back analysis. Putting billing events on the same fact forces nulls on accrual rows and inflates revenue whenever a payment retries.

---

### The analysis pattern

**Monthly active subs and MRR by tier**

```sql
SELECT
    d.fiscal_period,
    p.tier,
    COUNT(DISTINCT s.subscriber_sk) AS active_subs,
    SUM(s.mrr_usd) AS mrr_usd,
    SUM(CASE WHEN s.end_reason = 'voluntary_churn' THEN 1 ELSE 0 END) AS churned
FROM fact_subscriptions s
JOIN dim_plans p ON p.plan_sk = s.plan_sk
JOIN dim_date d ON d.date_key = CAST(TO_CHAR(s.period_start_ts, 'YYYYMMDD') AS INT)
WHERE s.period_start_ts <= d.calendar_date
  AND (s.period_end_ts IS NULL OR s.period_end_ts > d.calendar_date)
GROUP BY d.fiscal_period, p.tier
```

---

### Trade-offs and alternatives

**Period-grain fact with Type 2 plans**

Reactivations are first-class rows. Historical MRR reprices cleanly via `plan_sk`. Joins are wider and require as-of logic when tying events to the right period. Storage grows with every price change.

**Event-grain fact with Type 1 plans**

Every lifecycle change (start, pause, plan change, cancel) is its own row. Simpler writes and idempotent ingestion. Reporting has to reconstruct periods with window functions on every query, and price history is lost unless captured elsewhere.

---

## Common follow-up questions

- How would you extend `fact_subscriptions` to report MRR in USD while honoring the billed currency on `fact_billing_events`? _(Tests whether the candidate separates accrual currency policy from cash reporting and introduces a rates dimension.)_
- A subscriber requests GDPR erasure mid-period. Which tables change and which stay? _(Tests the PII-only-on-dimension pattern and the legal retention of financial facts.)_
- Finance restates last quarter's tier prices. How does the schema reprice history without rewriting `fact_subscriptions`? _(Tests Type 2 SCD on `dim_plans` and the stability of `plan_sk` on historical fact rows.)_
- At 500M subscribers, how would you partition `fact_subscriptions` and `fact_billing_events` to keep churn queries bounded? _(Tests partitioning choices (period_start_ts vs event_ts) and their impact on retention analysis.)_
- Product wants to see pause-resume cycles distinctly from cancel-resubscribe. Does the grain need to change? _(Tests schema evolution and whether the `end_reason` column is sufficient.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_churner_who_came_back)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.