# Ship Before Fraud Finishes Checking

> The claim looks clean. The fraud model disagrees.

Canonical URL: <https://datadriven.io/problems/ship_before_fraud_finishes_checking>

Domain: Pipeline Design · Difficulty: hard · Seniority: L6

## Problem

We insure over 300 million devices and process tens of thousands of insurance claims per day. When a customer files a claim for a lost or broken phone, we ship a replacement within hours - but that means our fraud detection must complete before the device ships, not after. Our current warehouse is a nightly batch job and the fraud team is working from yesterday's data. Design a pipeline that supports real-time fraud scoring at claim submission time.

## Worked solution and explanation

### Why this problem exists in real interviews

Fraud scoring has to happen during the claim, not the night after, against a feature set that includes device history going back years, without touching the operational database, and with a sane fallback for the day fraud's own infrastructure is broken. The trap is one of two: 'just stream it' (which leaves the historical features as a query against the operational DB) or 'pre-compute everything' (which silently drops claims when scoring is down).

The whiteboard answer is a streaming pipeline that reads the claim, queries the operational database for the device's history during scoring, and returns a decision. Two requests per claim hit production; tens of thousands of claims a day means tens of thousands of extra reads against the live DB, and the operational team has the same conversation about query load they always have. Meanwhile the day fraud scoring fails, claims either auto-approve (now ineligible claims are shipping) or queue forever (now legitimate customers wait days for a replacement).

> **Trick to Solving**
>
> Stream the claim, score against a pre-computed online lookup tier, fail open to manual review, never query the live database during scoring.
> 
> 1. A online lookup tier, refreshed off CDC from the operational DB, holds device history. The scoring path reads the online lookup tier, not the live DB.
> 2. Device history is a slowly-changing dimension keyed on (device_id, valid_from, valid_to) with every owner, plan, and repair event preserved. An investigator can reconstruct the chain at any past date.
> 3. Manual-review fallback is part of the design, not a runbook. When the scorer is unavailable or the online lookup tier is stale beyond a threshold, claims route to a queue that humans work, with the SLA the business agreed to.

---

### Walk the requirements

#### Step 1: Score during the claim, on a streaming path

The claim arrives, a stream processor enriches it with features from the online lookup tier, the scorer returns a decision, and shipping is gated on that decision. End-to-end is on the order of seconds, comfortably inside 'before the replacement ships.' Without a streaming / sub-minute path the scoring is too late by definition; the named problem is 'fraud is reading yesterday's warehouse.'

#### Step 2: Keep every owner, plan, and repair event with effective dates

Fraud needs the chain. Model device history as a slowly-changing dimension in the warehouse keyed on (device_id, valid_from, valid_to), with one row per state change: ownership transfer, plan switch, repair event. An investigator's query is 'who owned this device on this date,' answered by an as-of join, not by reading the latest row. The streaming online lookup tier mirrors the recent slice; the warehouse holds the full chain for years. Without a warehouse tier the chain has nowhere to live.

#### Step 3: Read features from the store, not from the live database

Hundreds of millions of devices can't be queried live during scoring. The operational database emits CDC into the same stream the claims ride; a feature builder maintains the device-keyed feature rows in a low-latency store. Scoring reads the online lookup tier; the operational DB sees zero traffic from the scoring path. A 'we'll just hit the live DB during scoring' design is the version operations rejects on day one.

#### Step 4: Manual review when scoring is down, by design

If the scorer is down, the online lookup tier is stale, or the streaming path is broken, claims route to a manual review queue that humans work to a defined SLA. That's the business's agreed fallback. The version that auto-approves on outage ships ineligible claims; the version that queues indefinitely strands customers. Both are visible in production and both are worse than a manual-review path the business has already accepted.

---

### The shape that fits

> **What this design gives up**
>
> A online lookup tier keyed on device, refreshed off CDC, is more pieces than 'query the live DB.' A manual-review fallback means standing up a queue and a triage console with their own SLA. The full slowly-changing dimension chain in the warehouse costs more than 'latest state per device.' Complexity is the cost; what arrives is scoring that returns in time, fraud features that don't pressure operations, and a degraded mode the business has already signed off on.

> **What reviewers check**
>
> A reviewer looks at the canvas for these properties:
> - A streaming path scores claims in seconds against features pre-computed in a low-latency store.
> - The operational database is never queried during scoring; features come from a CDC-fed online lookup tier.
> - A manual-review queue catches claims when scoring or feature freshness is degraded.
> - Device history is retained per-device with effective dates so an investigator can reconstruct any past state.

> **The mistake that ships**
>
> The shape that ships does streaming fraud scoring that queries the operational database for device history during scoring. Operations notices the load within a week and asks the data team to back off. The team adds a cache, which goes stale and starts returning wrong scores. A scorer outage causes the streaming job to time out; the on-call engineer adds an auto-approve fallback to keep claims moving, and ineligible claims start shipping. The eventual rebuild is an online lookup tier fed by CDC and an explicit manual-review fallback. The retrofit goes faster than the original misstep cost in load-back-off conversations and on-call hours.

---

## Common follow-up questions

- An investigator asks 'who owned this device when claim X was filed.' Which store do they query, and what makes the answer correct? _(Tests whether the candidate sees the warehouse's slowly-changing dimension as the source for historical reconstruction, with the as-of join giving the right owner for any past date. The online lookup tier has the recent slice but isn't authoritative for historical queries.)_
- The CDC stream from the operational DB falls behind by an hour. What does that change about the online lookup tier, scoring, and the manual-review fallback? _(Tests whether the candidate has a freshness threshold for the online lookup tier: when CDC lag crosses a budget, scoring routes claims to manual review rather than scoring against stale features. The manual-review SLA is the safety net for both scoring outage and feature staleness.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/ship_before_fraud_finishes_checking)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.