# Who Is Churning and Why

> Subscribers churn. The pipeline cannot.

Canonical URL: <https://datadriven.io/problems/who_is_churning_and_why>

Domain: Pipeline Design · Difficulty: medium · Seniority: L6

## Problem

We run a SaaS subscription platform with monthly and annual plans across multiple product tiers. The finance and growth teams need a unified analytics layer that can answer: which cohorts churn fastest, what is monthly recurring revenue by tier, and which accounts are at risk of non-renewal. Right now subscription data lives in the application database and no one can query it for analytics without hitting production. Design the pipeline and data model.

## Worked solution and explanation

### Why this problem exists in real interviews

Subscription analytics where the data model has to model lifecycle events correctly: MRR has to match the billing provider, plan changes show up intraday, immediate vs end-of-period cancellations are different, and reactivations keep the original cohort. The trap is treating the subscription table as latest-state; cohort and MRR analyses need the lifecycle history.

The default reach is a nightly snapshot of the subscription table into the warehouse. Finance reads MRR off the snapshot and reconciles to the billing provider , the numbers diverge because the snapshot misses mid-day plan changes and the cancellation semantics aren't modelled. Reactivated accounts get a fresh cohort date because the snapshot only knows the latest customer-since field.

> **Trick to Solving**
>
> CDC captures lifecycle events as they happen, the MRR fact models cancellation kinds, and the cohort table preserves the original signup.
> 
> 1. CDC from the application database emits every lifecycle event (signup, plan change, cancellation, reactivation) into the warehouse as a timestamped row.
> 2. The MRR fact derives from the lifecycle events with an explicit cancellation_kind column (immediate vs end-of-period); MRR drops on the right day for each.
> 3. A cohort table preserves the original signup date per account; reactivations write a new lifecycle event but the cohort date doesn't change.
> 4. A reconciliation step compares warehouse MRR to the billing provider's reported MRR daily; drift past tolerance gates the publish.

---

### Walk the requirements

#### Step 1: Finance MRR matches the billing provider

CDC emits subscription events into the warehouse; the MRR fact is computed from the events and reconciles to the billing provider's reported MRR daily. If the diff exceeds tolerance, the publish gates and the team investigates. Without CDC the events have to come from polling the live database, which the operations team won't sign off on; without a warehouse tier the MRR model has nowhere to live.

#### Step 2: Plan changes show up intraday

The CDC stream lands plan-change events the moment they happen; the MRR fact's hourly refresh picks them up. A mid-cycle upgrade changes MRR on the day it happens, not at the next month's batch. Snapshot-only models are the version where intraday changes wait for the next-month rebuild; CDC-driven events are what makes the dashboard reflect the change in the next refresh.

#### Step 3: Cancellation kind on the row determines when MRR drops

Each cancellation event carries a kind: immediate or end-of-period. The MRR fact applies the drop accordingly: immediate cancellations drop MRR on the cancellation day; end-of-period cancellations continue MRR through the end of the paid period and drop on the period-end day. A 'one cancellation rule' approach is the version where MRR is wrong for one kind or the other; modelling the kind on the event is the contract that fits both.

#### Step 4: Reactivations keep the original cohort

When a churned account reactivates, a new lifecycle event lands in the warehouse, but the cohort table preserves the original signup date for that account. Cohort retention curves continue to treat the account as the original cohort, not as a new acquisition. Without the cohort preservation, reactivations inflate new-acquisition metrics and undercount retention; the cohort key is the contract that distinguishes 'came back' from 'new'.

---

### The shape that fits

> **What this design gives up**
>
> CDC adds connector infrastructure and a replication slot; the MRR fact's cancellation-kind handling is logic that has to be tested against both kinds; reconciliation halts the publish on partner mismatches; the cohort preservation requires a separate cohort table that updates only on first signup. Implementation cost is the price; the win is finance MRR matching the billing provider, intraday plan-change visibility, accurate cancellations, and cohort curves that don't double-count reactivations.

> **What reviewers check**
>
> A reviewer looks at the canvas for these properties:
> - CDC captures lifecycle events from the application database without polling production.
> - The MRR fact distinguishes immediate from end-of-period cancellations and applies the MRR drop on the right day for each.
> - The cohort table preserves the original signup date on reactivation.
> - A reconciliation step compares warehouse MRR to the billing provider's MRR and surfaces drift.

> **The mistake that ships**
>
> What gets shipped does a nightly snapshot of the subscription table into the warehouse. Finance reads MRR and reconciles to the billing provider; the numbers diverge because the snapshot misses intraday plan changes and the cancellation kinds aren't modelled. Reactivated accounts show up as new acquisitions in the cohort table. The eventual rebuild adds CDC, the lifecycle-event-driven MRR fact with cancellation-kind handling, the cohort preservation, and the reconciliation gate.

---

## Common follow-up questions

- A customer cancels end-of-period and then reactivates the day after the period ends. What does the design do to MRR and the cohort, and what should each show? _(Tests whether the candidate sees the lifecycle as: end-of-period cancellation drops MRR on the period-end day; the reactivation a day later writes a new lifecycle event with the original cohort preserved on the cohort table. MRR resumes on the reactivation day; cohort retention shows the account as the original cohort, returned.)_
- Reconciliation flags a divergence with the billing provider for one tier only. What does the design surface, and where would you investigate? _(Tests whether the candidate sees per-tier diff in the reconciliation output: the divergence by tier points the investigation at that tier's plan-change events, cancellation events, or pricing. The publish stays held for that tier (or globally, depending on policy) until the diff is resolved or accepted.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/who_is_churning_and_why)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.