# Every Deal Is a Financial Transaction

> Real money on the table. Reconstruct every hand.

Canonical URL: <https://datadriven.io/problems/every_deal_is_a_financial_transaction>

Domain: Pipeline Design · Difficulty: hard · Seniority: L7

## Problem

Our platform hosts millions of real-money rummy and poker games daily, and every card deal, bet, and fold is a regulated financial transaction under India's gaming laws. Our fraud team needs to analyze complete game sessions, but right now we only have raw event streams with no session context. Design a pipeline that reconstructs full game sessions and feeds our fraud and analytics systems.

## Worked solution and explanation

### Why this problem exists in real interviews

An L7 stateful-streaming question with regulatory weight: chips clear within seconds, sessions are assembled from many micro-events and may end without a final event, data has to be partitioned by each player's state of residence because gaming law differs by state, and regulators expect any game's complete record on demand. The trap is treating session reconstruction as a batch problem: chips have already cleared by the time the batch finishes.

The default reach is to log every event and reconstruct sessions in a nightly batch for fraud and analytics. Chips clear in seconds; the batch's flag arrives long after the money moved. Disconnected players' sessions never close because the reconstructor is waiting for a final event that didn't come. Player-state data pools globally because the team didn't draw the partition boundary at ingestion. A regulator asks for a specific game's record and the team finds it in a flat archive that takes hours to search.

> **Trick to Solving**
>
> Stream sessions on a per-game state machine, close on inactivity timer, partition by player state at ingestion, retain immutable per-game records.
> 
> 1. A streaming processor keyed by game_id maintains the event sequence per game and emits a closed session when the final event arrives or an inactivity timer fires.
> 2. Player-state partitioning starts at ingestion: events route into per-state topics (or a single topic with state in the key) and per-state archives, so retention rules apply per state.
> 3. The complete game record (every event in order) writes to an immutable per-game record store retrievable by game_id.
> 4. Fraud reads closed sessions from the streaming output within tens of seconds; chips clear after fraud has had its window.

---

### Walk the requirements

#### Step 1: Fraud flags within tens of seconds, before chips clear

A streaming processor keyed by game_id maintains state per game: it consumes the game's micro-events in order, holds the sequence, and emits the closed session when the final event arrives. Fraud subscribes to the closed-session output and scores within tens of seconds. Chip clearing is gated on the fraud signal arriving in time. A 'reconstruct sessions in batch' design is the version where chips have already moved by the time fraud has anything to look at; the tens-of-seconds budget is what makes the regulatory contract feasible.
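A minimal in-memory sketch of the per-game state described in this step, not a production stream processor. The `Event` fields, the `game_end` event name, and the `SessionReconstructor` class are hypothetical names chosen for illustration. A real deployment would keep this state in a fault-tolerant keyed store.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    game_id: str
    seq: int    # assumed per-game sequence number assigned by the game server
    kind: str   # e.g. "deal", "bet", "fold", "game_end" (hypothetical names)


@dataclass(frozen=True)
class ClosedSession:
    game_id: str
    events: List[Event]
    close_reason: str


class SessionReconstructor:
    """Holds one event buffer per game_id and emits a session on the final event."""

    FINAL_KIND = "game_end"  # assumption: the stream marks the last event this way

    def __init__(self) -> None:
        self._open: Dict[str, List[Event]] = {}

    def on_event(self, event: Event) -> Optional[ClosedSession]:
        self._open.setdefault(event.game_id, []).append(event)
        if event.kind != self.FINAL_KIND:
            return None
        # Final event seen: release the per-game state and emit events in order.
        events = sorted(self._open.pop(event.game_id), key=lambda e: e.seq)
        return ClosedSession(event.game_id, events, "final_event")

    def open_game_count(self) -> int:
        return len(self._open)
```

Downstream, the fraud scorer would consume each `ClosedSession` as it is emitted, and chip clearing would wait on that score.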

#### Step 2: Per-state partitioning at ingestion, retention rules per state

Indian gaming law treats each state's residents differently. Events route into per-state topics (or per-state partitions on a shared bus) at ingestion and land in per-state archive locations. Retention rules, access controls, and compliance audits run per state. Pooling globally and filtering at query time is the version where one state's rules spill over onto another state's records; partitioning at the ingestion boundary is what lets per-state rules apply.
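The routing decision at ingestion can be sketched as a pure function from an event to its topic and archive location. The `player_state` field, the topic and prefix naming, and the `KNOWN_STATES` subset are all assumptions made for this example. The sketch rejects events with no recognized state rather than defaulting them into a shared pool.

```python
from typing import Dict

# Illustrative subset of state codes; a real list would come from compliance config.
KNOWN_STATES = {"KA", "MH", "TN"}


def route(event: Dict[str, str]) -> Dict[str, str]:
    """Pick the per-state topic and archive prefix for one event."""
    state = event.get("player_state", "").upper()
    if state not in KNOWN_STATES:
        # Refuse to guess: an unrouted event must not land in a global pool.
        raise ValueError(f"unknown or missing player_state: {state!r}")
    return {
        "topic": f"game-events.{state.lower()}",
        "archive_prefix": f"archive/state={state}/",
    }
```

Because the state is resolved once at the boundary, retention jobs and access policies can be attached to the prefix or topic without inspecting individual events.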

#### Step 3: Sessions close on a timer, even when the player disconnects

A player who disconnects without a final event still produces a session that fraud and analytics need. The streaming processor's per-game state holds an inactivity timer; if no event arrives within the timer's window, the processor closes the session with a 'disconnected' flag and emits it downstream. Without the timer, those sessions never close and the per-game state grows unbounded; with it, every game eventually produces a record.
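A small sketch of the inactivity timer, assuming the caller supplies the current time in seconds and calls `sweep` periodically. The class and field names are hypothetical. Stream processors typically offer per-key timers that would replace the explicit sweep shown here.

```python
from typing import Dict, List


class InactivityCloser:
    """Closes any game that has seen no event for `timeout_s` seconds."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}
        self._events: Dict[str, List[dict]] = {}

    def on_event(self, game_id: str, event: dict, now: float) -> None:
        # Every event resets the game's inactivity clock.
        self._events.setdefault(game_id, []).append(event)
        self._last_seen[game_id] = now

    def sweep(self, now: float) -> List[dict]:
        """Return sessions closed by timeout and drop their in-memory state."""
        closed = []
        for game_id, last in list(self._last_seen.items()):
            if now - last >= self.timeout_s:
                closed.append({
                    "game_id": game_id,
                    "events": self._events.pop(game_id),
                    "close_reason": "disconnected",  # flag visible to consumers
                })
                del self._last_seen[game_id]
        return closed
```

Dropping the state on close is what keeps memory bounded: a game that never sends a final event still leaves the processor after one timeout window.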

#### Step 4: Immutable per-game record retrievable on demand

Regulators may request the complete record of any specific game. Each game's full event sequence writes to an immutable per-game store in cold storage, keyed by game_id. A regulator's request is a lookup by game_id that has to complete within the regulator's response window and return the full event sequence in order. Storing in a flat archive that takes hours to search is the version where missed records turn into compliance fines; per-game keying with immutability is what makes the retrieval bounded.
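The sketch below stands in for the per-game record store, with a dictionary playing the role of cold storage. It assumes "immutable" means append-only: an event can be added once under its sequence number but never overwritten or deleted. `PerGameRecordStore` and its methods are hypothetical names.

```python
from types import MappingProxyType
from typing import Dict, Mapping, Tuple


class PerGameRecordStore:
    """Append-only event records keyed by game_id, read back in sequence order."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[int, Mapping]] = {}

    def append(self, game_id: str, seq: int, event: dict) -> None:
        record = self._records.setdefault(game_id, {})
        if seq in record:
            # Rewriting an existing entry would break immutability.
            raise ValueError(f"{game_id} seq {seq} already written")
        # Store a read-only copy so callers cannot mutate the record later.
        record[seq] = MappingProxyType(dict(event))

    def get(self, game_id: str) -> Tuple[Mapping, ...]:
        """Regulator lookup: one key, full event sequence in order."""
        record = self._records[game_id]
        return tuple(record[seq] for seq in sorted(record))
```

The lookup cost depends on one game's record size rather than on the size of the archive, which is the property that keeps retrieval bounded.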

---

### The shape that fits

> **What this design gives up**
>
> Per-game state in the streaming processor costs memory proportional to active games; per-state partitioning multiplies the number of topics and archives; inactivity timers mean some sessions close on the timer rather than the final event (with that flag visible to consumers); immutable per-game records grow forever (or for the retention window). Implementation cost is the price; the win is fraud flagging before chips clear, per-state law applied at the boundary, every disconnection producing a record, and regulator retrieval that's a query.

> **What reviewers check**
>
> A reviewer looks at the canvas for these properties:
> - A streaming processor keyed by game_id reconstructs sessions and emits closed sessions within tens of seconds of the game ending.
> - Player-state partitioning is enforced at ingestion; data is stored and retained per state.
> - Disconnected players' sessions close on an inactivity timer rather than waiting indefinitely.
> - Every game's complete event record is retained immutably and retrievable per game within the regulator's window.

> **The mistake that ships**
>
> The version that ships logs raw events and reconstructs sessions in a nightly batch. Chips clear before fraud has a session to look at; a state regulator finds out about misallocated chips before the platform does. Disconnected sessions never close because the reconstructor waits for a final event; the reconstruction state grows forever and operators kill it. State-level data pools globally and one state's records spill into another's. A regulator requests a game's record and the team takes hours grepping a flat archive. The eventual rebuild moves session reconstruction to streaming with a timer, partitions by state at ingestion, and writes per-game immutable records.

---

## Common follow-up questions

- An event arrives an hour late after the inactivity timer has already closed the session. What does this design do, and what does fraud see? _(Tests whether the candidate sees that the closed session is committed and the late event lands in the per-game archive (the immutable record), with a compaction pass or a reissued session emitted downstream if the late event materially changes the session. Fraud's original signal stands; analytics may see a corrected session in a later batch.)_
- A new state introduces a stricter retention rule than the others. What in this design changes, and where? _(Tests whether the candidate sees the per-state archive as the boundary for retention rules: the new state's archive applies the stricter rule; other states' archives are unaffected. The session reconstructor and the regulator lookup don't change; the rule lives where the data lives.)_
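For the late-event follow-up, one possible handling is sketched below. It makes three assumptions: the caller decides whether the late event is material, sessions carry a version number, and the archive is a plain list here. All names are hypothetical. The originally emitted session is never modified; a revised version is issued alongside it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SessionVersion:
    game_id: str
    version: int
    events: List[dict]
    close_reason: str


class LateEventHandler:
    def __init__(self) -> None:
        self._latest: Dict[str, SessionVersion] = {}
        self.archive: Dict[str, List[dict]] = {}  # stand-in for the per-game record

    def record_close(self, session: SessionVersion) -> None:
        self._latest[session.game_id] = session

    def on_late_event(self, event: dict, material: bool) -> Optional[SessionVersion]:
        game_id = event["game_id"]
        # The late event always lands in the per-game archive.
        self.archive.setdefault(game_id, []).append(event)
        if not material:
            return None
        prev = self._latest[game_id]
        events = sorted(prev.events + [event], key=lambda e: e["seq"])
        # Reissue as a new version; the earlier version stays as emitted.
        revised = SessionVersion(game_id, prev.version + 1, events, prev.close_reason)
        self._latest[game_id] = revised
        return revised
```

Consumers that acted on version 1, such as fraud, keep their original signal, while analytics can pick up the higher version in a later run.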

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/every_deal_is_a_financial_transaction)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.