# Subscription and Payment Data Model

> Two user types. Multiple payment methods. One messy billing table.

Canonical URL: <https://datadriven.io/problems/subscription_and_payment_data_model>

Domain: Data Modeling · Difficulty: medium · Seniority: L5

## Problem

We run an online education marketplace. Two types of users (students and instructors) can subscribe to different plan tiers. Students pay for access; instructors pay for listing and promotion features. Multiple payment methods are supported. Finance needs month-over-month growth reports and outstanding balance tracking. Design the data model, then write the SQL for MoM subscription growth.

## Worked solution and explanation

### Why this problem exists in real interviews

Subscription billing forces two normalization decisions at once: should payment methods be a separate entity from users, and should plan details be denormalized onto subscriptions or joined through a plans dimension? The signal is whether the candidate reaches third normal form for the transactional shape while keeping each payment tied to the exact subscription period it funded.

> **Trick to Solving**
>
> The tell is "users have multiple payment methods". That alone forces `payment_methods` into its own table. Before drawing any tables, a strong candidate asks: can a single payment fund multiple subscriptions, and can a single subscription have multiple payments over time? Both are yes, so `payments` sits between them at its own grain.
> 
> 1. Separate users, plans, subscriptions, methods, and payments as entities
> 2. Keep plans in their own table so pricing changes are a plan version, not a user update
> 3. Anchor payments to subscriptions so renewal histories are reconstructible
> 4. Use status columns for active versus cancelled subscriptions

---

### Break down the requirements

#### Step 1: Model plans as a reference table

`subscription_plans` is a small dimension of offered plans. When a price changes, a new plan row is created and new subscriptions point at it; existing subscriptions keep their original plan FK.

#### Step 2: Capture subscription lifecycle

`user_subscriptions` stores `started_at`, `ended_at`, and `status`. Upgrades and downgrades end the old row and start a new one, so the history is reconstructible.

#### Step 3: Keep payment methods separate

Users can store multiple cards and switch defaults. `payment_methods` with an `is_default` flag captures this without widening users.

#### Step 4: Payments are their own grain

`payments` is one row per charge attempt, with its own status and amount. This is what lets failed retries, dunning, and partial refunds live in the same model.

#### Step 5: Enforce constraints at the schema level

`user_subscriptions.status` constrained to a known set; `payments.amount > 0`; `payment_methods.is_default` constrained so a user has exactly one default at a time.

---

### The solution

Below is one conceptually sound third-normal-form design. The split between subscriptions and payments is the anchor; it keeps subscription lifecycle and cash movement as distinct grains.

> **Why This Design Works**
>
> Third normal form here costs a few joins at query time and buys clean update semantics for price changes, plan upgrades, payment retries, and refunds. Every state change is a targeted insert or update on exactly one table. The trade-off is that reporting queries need to join three or four tables, which is cheap on any modern engine.

> **Interviewers Watch For**
>
> Strong candidates distinguish plan identity from subscription instance from payment event. They also call out that `payments.subscription_id` is what lets MRR and churn analyses tie cash to the subscription that funded it. Weaker candidates flatten payments onto subscriptions and lose the ability to model retries.

> **Common Pitfall**
>
> Storing `plan_name` and `price` as columns on `user_subscriptions`. This works until a price changes, then every historical subscription needs the frozen original price and the pipeline becomes a maintenance burden.

---

### The analysis pattern

**Active MRR by plan and country**

```sql
SELECT
    u.country,
    sp.plan_name,
    COUNT(*) AS active_subs,
    SUM(sp.price) AS mrr
FROM user_subscriptions us
JOIN users u ON u.user_id = us.user_id
JOIN subscription_plans sp ON sp.plan_id = us.plan_id
WHERE us.status = 'active'
  AND us.started_at <= NOW()
  AND (us.ended_at IS NULL OR us.ended_at > NOW())
GROUP BY u.country, sp.plan_name
ORDER BY mrr DESC
```

---

### Trade-offs and alternatives

**Third normal form transactional model**

Clean update semantics, multi-method support, retries and refunds as first-class rows. Cost: joins for every reporting query and separate ETL into an analytics star for BI.

**Flattened subscription-plus-payment table**

Single-table reads, simple for a first version. Cost: price changes require historical backfills, retries duplicate rows, and the schema cannot express multiple payment methods.

---

## Common follow-up questions

- A user upgrades mid-cycle with a prorated charge. How does the model represent that? _(Tests whether the candidate ends the current subscription and opens a new one with a linking proration payment.)_
- A card fails and retries happen over three days. Where does that state live? _(Tests whether payments.status carries the retry lifecycle and whether a separate payment_attempts table is warranted.)_
- Finance wants churn defined as subscriptions that ended without a successor within 7 days. How? _(Tests self-join on user_subscriptions and the churn window definition.)_
- A plan price changes from $10 to $12. What rows move and what stays still? _(Tests that new subscriptions point at a new plan row while existing subscriptions keep their original plan FK.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/subscription_and_payment_data_model)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.