# Score It Before It Clears

> The fraudsters move fast. Your pipeline has to move faster.

Canonical URL: <https://datadriven.io/problems/score_it_before_it_clears>

Domain: Pipeline Design · Difficulty: hard · Seniority: L5

## Problem

Our payments platform processes card transactions globally. We need to score every transaction for fraud before it clears. Design a scalable data pipeline for real-time fraud detection.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem tests real-time fraud scoring with three properties that have to fit together: scoring before clearance, a sane fallback when scoring is down, and a feedback loop that learns from confirmed chargebacks. The trap is wiring scoring synchronously into the payment gateway and treating chargebacks as 'we'll add that later'; both shortcuts compound until scoring outages take payments down or the model's accuracy plateaus.

The default reach is a synchronous call from the gateway into the scoring service for every transaction. The first time the scoring service is slow, the gateway hangs and payment latency budgets break; the on-call engineer adds an approve-on-timeout, then an approve-on-error, and the policy decision happens implicitly inside ad-hoc code. Confirmed chargebacks land weeks later in a separate system the model never sees; retraining uses prior predictions rather than real outcomes, and the model stops improving.

> **Trick to Solving**
>
> Decouple scoring through a queue, fall back to approve-and-review on outage by policy, and route chargebacks back to training as real outcomes.
> 
> 1. A queue between the gateway and the scoring service decouples them. The gateway publishes the transaction with a deadline; the scorer reads, scores, and writes a decision back; if no decision arrives within the deadline, the gateway proceeds on the fallback policy.
> 2. Approve-and-review is the documented fallback: when scoring is unavailable or stale, transactions approve and route to a review queue. The policy is in the design, not in a hotfix.
> 3. Chargebacks (confirmed weeks later) feed a labels store the next training run reads alongside predictions, so the model learns from real fraud outcomes rather than only from its prior decisions.

---

### Walk the requirements

#### Step 1: Score before clearance, on a path decoupled from the gateway

The gateway publishes each transaction with its scoring deadline onto a queue; the scoring service reads, scores, and writes the decision back. The gateway waits for the decision only up to the deadline and then proceeds either way, so end-to-end latency stays inside the authorization budget. A synchronous call into the scoring service is the version where a slow score blocks payments; the queue is what gives the system deadline-bounded decoupling.
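
A minimal sketch of this deadline-bounded handoff, using Python's in-process `queue.Queue` and a thread as stand-ins for a real message broker and scoring service. The names `ScoreRequest`, `scorer_loop`, and `authorize`, the per-request reply queue, and the amount-threshold "model" are all assumptions made for illustration; none of them come from the problem statement.

```python
import queue
import threading
import time
from dataclasses import dataclass, field


@dataclass
class ScoreRequest:
    txn_id: str
    amount: float
    deadline: float  # absolute time.monotonic() value set by the gateway
    reply: queue.Queue = field(default_factory=queue.Queue)


def scorer_loop(requests: queue.Queue, latency_s: float) -> None:
    """Hypothetical scorer: read a request, score it, write the decision back."""
    while True:
        req = requests.get()
        if req is None:  # sentinel that stops the worker
            return
        time.sleep(latency_s)  # stand-in for feature lookup and model inference
        # Placeholder rule instead of a real model (assumption).
        action = "decline" if req.amount > 10_000 else "approve"
        # A late reply is harmless: the gateway has stopped waiting by then.
        req.reply.put(action)


def authorize(requests: queue.Queue, txn_id: str, amount: float,
              budget_s: float) -> tuple[str, str]:
    """Gateway side: publish with a deadline, wait at most budget_s, then fall back."""
    req = ScoreRequest(txn_id, amount, deadline=time.monotonic() + budget_s)
    requests.put(req)
    try:
        return req.reply.get(timeout=budget_s), "scorer"
    except queue.Empty:
        return "approve_and_review", "fallback"
```

With a scorer that answers inside the budget, `authorize` returns the scorer's action; with a scorer slower than the budget, it returns the fallback action without waiting for the score.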

#### Step 2: Approve-and-review when scoring is down, by policy not hotfix

When scoring is unavailable or the deadline expires, the transaction approves and routes to a review queue. The business has chosen this policy: blocking all payments during a scoring outage is worse than approving with manual review. The fallback path is part of the design, with the review queue and human SLA defined. Letting the gateway implement the fallback ad-hoc through timeouts is the version where the policy emerges from on-call decisions rather than from a documented choice.
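
One way to keep the fallback a documented choice rather than a scattered timeout handler is to express it as explicit, reviewable configuration. The sketch below assumes hypothetical names (`FallbackPolicy`, `Reason`, `resolve`) and illustrative values for the review SLA and approver; the actual values would come from the business, not from this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Reason(Enum):
    SCORED = "scored"
    SCORER_UNAVAILABLE = "scorer_unavailable"
    DEADLINE_EXPIRED = "deadline_expired"


@dataclass(frozen=True)
class FallbackPolicy:
    action: str            # what the gateway does when it has no score
    review_sla_hours: int  # how quickly a human must review (illustrative value)
    approved_by: str       # who signed off on the policy (illustrative value)


POLICY = FallbackPolicy(action="approve", review_sla_hours=24,
                        approved_by="risk-committee")


def resolve(txn_id: str, scorer_action: Optional[str], reason: Reason,
            review_queue: list, policy: FallbackPolicy = POLICY) -> dict:
    """Return the gateway's action and record why it was taken."""
    if reason is Reason.SCORED and scorer_action is not None:
        return {"txn_id": txn_id, "action": scorer_action,
                "reason": reason.value, "review": False}
    # No usable score: apply the documented policy and enqueue for human review.
    review_queue.append({"txn_id": txn_id, "reason": reason.value,
                         "sla_hours": policy.review_sla_hours})
    return {"txn_id": txn_id, "action": policy.action,
            "reason": reason.value, "review": True}
```

Because every fallback decision carries its reason, a later question such as "why did we approve those during the incident" can be answered from the recorded decisions rather than reconstructed from hotfix history.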

#### Step 3: Chargebacks loop into training as real outcomes

Weeks after the score, chargebacks confirm whether transactions were actually fraud. A labels store records each transaction's score, the actual outcome (chargeback or not), and the time gap. Retraining reads predictions joined to labels and learns from real outcomes, not from prior predictions. Without the loop, the model retrains on its own decisions and accuracy plateaus; with the loop, the model gets better as chargebacks accumulate.
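
The join between predictions and labels can be sketched in plain Python. The record shapes, field names, and sample values below are made up for illustration; a real labels store would more likely be a warehouse table, and a real pipeline would probably wait some time before treating "no chargeback yet" as a negative label, which this sketch omits.

```python
from datetime import date

# Hypothetical scoring records written at decision time.
predictions = [
    {"txn_id": "t1", "score": 0.91, "scored_on": date(2024, 1, 2)},
    {"txn_id": "t2", "score": 0.12, "scored_on": date(2024, 1, 2)},
    {"txn_id": "t3", "score": 0.08, "scored_on": date(2024, 1, 3)},
]

# Hypothetical confirmed chargebacks, keyed by transaction, arriving weeks later.
chargebacks = {"t1": date(2024, 2, 10), "t3": date(2024, 2, 20)}


def build_training_rows(predictions, chargebacks):
    """Join each prediction to its real outcome and the score-to-outcome gap."""
    rows = []
    for p in predictions:
        cb_date = chargebacks.get(p["txn_id"])
        rows.append({
            "txn_id": p["txn_id"],
            "score": p["score"],
            "label": 1 if cb_date is not None else 0,
            "gap_days": (cb_date - p["scored_on"]).days if cb_date is not None else None,
        })
    return rows


rows = build_training_rows(predictions, chargebacks)
```

Here `t3` scored low but was later charged back; that row is exactly what the model can only learn from if labels, not prior predictions, drive retraining.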

---

### The shape that fits

> **What this design gives up**
>
> A queue between gateway and scoring is more infrastructure than a synchronous call; an explicit fallback path adds a review queue and a human workflow; the chargeback labels store grows for years and joins back to historical predictions. Implementation cost is the price; the win is scoring that doesn't block payments under load, an outage policy the business signed for rather than the on-call engineer's instinct, and a model that learns from real fraud rather than its own past.

> **What reviewers check**
>
> A reviewer looks at the canvas for these properties:
> - A queue decouples the payment gateway from the scoring path; the gateway proceeds on the fallback policy if the score doesn't return in time.
> - An approve-and-review path catches transactions when scoring is unavailable, routing them to a review queue rather than blocking payments.
> - Confirmed chargeback labels feed back into the model's training data alongside the scoring decisions, so retraining sees real outcomes.

> **The mistake that ships**
>
> What gets shipped wires the gateway synchronously into the scoring service. The first time scoring is slow, payments hang and the on-call engineer adds an approve-on-timeout. The policy emerges from hotfixes; nobody documents what happens during an outage, and a year later 'why did we approve those during the incident' has no clean answer. Chargebacks land in a separate system the model never reads; the model retrains on prior predictions and stops improving. The eventual rebuild adds the queue, the documented fallback path, and the chargeback feedback loop.

---

## Common follow-up questions

- A scoring deadline expires for a transaction, and the score arrives at the gateway seconds later. What does this design do, and what does the model see for that transaction? _(Tests whether the candidate sees the deadline as the gateway's contract: once the deadline expires, the gateway has acted on the fallback policy and the late score doesn't reverse that. The scoring decision still writes to the labels store with the timing recorded, so retraining sees both the decision and the gateway's actual action.)_
- Chargebacks for a window are reversed (the customer accepts the charge after dispute). How does the labels store reflect that, and how does retraining use it? _(Tests whether the candidate sees labels as a stream of outcomes per transaction with the most recent state authoritative: the reversed chargeback updates the label and retraining reads the resolved label. Without label-update semantics, the model trains on stale fraud labels and learns the wrong lesson.)_
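
For the reversed-chargeback question, the "most recent state is authoritative" semantics can be sketched as a fold over an append-only event stream. The event tuple layout and the state names `chargeback` and `chargeback_reversed` are assumptions for illustration.

```python
from datetime import datetime

# Hypothetical append-only outcome events: (txn_id, event_time, state).
events = [
    ("t1", datetime(2024, 2, 10), "chargeback"),
    ("t2", datetime(2024, 2, 12), "chargeback"),
    ("t1", datetime(2024, 3, 1), "chargeback_reversed"),
]


def resolve_labels(events):
    """Collapse the event stream to one label per transaction, latest state wins."""
    latest = {}
    for txn_id, _event_time, state in sorted(events, key=lambda e: e[1]):
        latest[txn_id] = state  # later events overwrite earlier ones
    return {txn_id: 1 if state == "chargeback" else 0
            for txn_id, state in latest.items()}


labels = resolve_labels(events)
```

Keeping the raw events and resolving at read time means a reversal never has to rewrite history; retraining simply reads the resolved label as of the training run.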

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.