# Online Marketplace  -  Seller Payouts

> The buyer paid one number. The seller got a different one.

Canonical URL: <https://datadriven.io/problems/online_marketplace_seller_payouts>

Domain: Data Modeling · Difficulty: hard · Seniority: L6

## Problem

We run an online marketplace where third-party sellers list products and buyers purchase them. The platform collects the full payment from the buyer, deducts a commission, and pays the seller. Refunds are common and complicate the accounting. Finance needs a model that reconciles every dollar from collection to payout. Design the data model.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge probes whether a candidate separates **what was sold** from **what money moved**. Interviewers want to see two distinct grains coexist cleanly: an order-line fact for product economics and an append-only ledger for cash movements. Conflating them is the single most common failure mode in marketplace finance models because refunds, partial payouts, and tax all force the single-table design into ugly UPDATE paths.

> **Trick to Solving**
>
> The signal is the phrase "reconciles every dollar." Reconciliation demands immutability: every money movement must be a new row, never an update. Before drawing any tables, a strong candidate asks: do I need to preserve the original order amount after a refund, and who is the owning party on each cash movement?
> 
> 1. Spot the reconciliation language in the prompt
> 2. Commit to an append-only ledger grain alongside the order fact
> 3. Split tax into its own ledger entry type
> 4. Use `payout_order_lines` as the bridge so batch payouts can attribute cash back to source orders

---

### Break down the requirements

#### Step 1: Declare the grain twice

`fact_orders` is at the order line grain: one row per product per seller per order. `fact_ledger_entries` is at the money-movement grain: one row per charge, commission, fee, tax, refund, or payout. Two facts, two grains, linked by `order_line_id`.

#### Step 2: Model refunds as new ledger rows

A refund is not a mutation of the original charge. It is a new row with a negative amount and an entry type like `REFUND_BUYER`. The original `fact_orders` row stays untouched so historical revenue snapshots remain reproducible.

#### Step 3: Separate tax from revenue

Tax collected is a pass-through liability, not platform revenue. Storing it in its own ledger entry type (or as a distinct column on the order line) keeps revenue reporting clean and makes remittance audits straightforward.

#### Step 4: Bridge payouts with a junction table

One payout covers many order lines. `fact_payouts` lives at the batch grain; `payout_order_lines` is the many-to-many bridge that lets finance answer "which orders funded this payout?" without scanning the entire ledger.

#### Step 5: Anchor every fact to conformed dimensions

`dim_buyers`, `dim_sellers`, `dim_products`, and `dim_date` are shared across all three facts. Conformed dimensions are what make cross-fact reporting (e.g. GMV versus payouts by seller) tractable.

---

### The solution

Below is one defensible model. The dual-fact split is the anchor: order economics and cash movements live in separate tables with different grains, joined through an immutable ledger.

> **Why This Design Works**
>
> The ledger is the system of record for cash. Because it is append-only, any historical finance report can be reproduced by filtering on `posted_at <= as_of_time`. `fact_orders` becomes a stable analytics fact for GMV, while the ledger handles the accounting identity. The trade-off is duplication: total seller proceeds can be derived from the ledger, but the order-line fact keeps it one hop away for BI speed.

> **Interviewers Watch For**
>
> Strong candidates articulate the accounting identity (charges + fees + refunds + payouts should sum to zero per seller). Weak candidates build a single wide fact and propose UPDATEs on refund. The signal here is whether the candidate reaches for immutability before being prompted.

> **Common Pitfall**
>
> Collapsing tax into a `revenue` column. Tax is a liability, not revenue. Rolling it in makes every GMV and commission calculation silently wrong and creates a tax remittance bug that only surfaces at audit.

---

### The analysis pattern

**Seller reconciliation by payout batch**

```sql
SELECT
    p.payout_id,
    s.legal_name,
    p.net_amount,
    SUM(l.amount) FILTER (WHERE l.entry_type = 'CHARGE') AS gross_charges,
    SUM(l.amount) FILTER (WHERE l.entry_type = 'COMMISSION') AS commission,
    SUM(l.amount) FILTER (WHERE l.entry_type IN ('REFUND_BUYER','REFUND_SELLER_DEBIT')) AS refunds
FROM fact_payouts p
JOIN dim_sellers s ON s.seller_id = p.seller_id
JOIN payout_order_lines pol ON pol.payout_id = p.payout_id
JOIN fact_ledger_entries l ON l.order_line_id = pol.order_line_id
WHERE p.payout_date = '2025-03-15'
GROUP BY p.payout_id, s.legal_name, p.net_amount
```

---

### Trade-offs and alternatives

**Dual fact (orders + ledger)**

Clean grains, append-only cash history, reproducible finance reports. Cost: writes go to two places and cross-fact reconciliation requires a join through `payout_order_lines`.

**Single wide fact_orders with refund columns**

Faster reads for simple GMV queries, fewer tables. Cost: refunds and payout batches force destructive UPDATEs, historical snapshots drift, and multi-item payouts become impossible to express cleanly.

---

## Common follow-up questions

- How would you prove the accounting identity (charges + fees + refunds + payouts = 0 per seller) holds in production? _(Tests whether the candidate treats the ledger as a reconciliation target, not a log.)_
- A tax jurisdiction changes rates mid-quarter and 0.5% of orders need retroactive tax adjustment. How do the ledger and fact_orders respond? _(Tests append-only discipline: adjustment is a new `TAX_CORRECTION` entry type, not an UPDATE.)_
- Finance wants a daily payout forecast by seller. How would you model accrued-but-unpaid balances? _(Tests whether the candidate can derive a balance view from the ledger without adding state columns.)_
- How would you partition `fact_ledger_entries` at 50M rows per day with a 7-year retention requirement? _(Tests partitioning strategy on posted_at with tiered storage for cold partitions.)_
- Sellers dispute payouts and demand an as-of view of their balance on any historical date. How does the model support that? _(Tests temporal reconstruction using the immutable ledger and a single predicate on posted_at.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/online_marketplace_seller_payouts)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.