# Three Regions, One Finance Team

> Payments from everywhere. One consistent report.

Canonical URL: <https://datadriven.io/problems/three_regions_one_finance_team>

Domain: Pipeline Design · Difficulty: hard · Seniority: L6

## Problem

We process payments across the US, EU, and APAC and each region runs its own infrastructure. The finance team needs a single consolidated view of global payment volume for reporting, but right now every region is a silo. Design a data platform that ingests payment events from all regions and makes them available for consistent global reporting.

## Worked solution and explanation

### Why this problem exists in real interviews

Four constraints on one pipeline: a 6am UTC consolidated view, EU data residency, point-in-time FX, and per-region completion visibility for finance. The trap is centralizing first and dealing with residency at query time; that's how raw EU rows end up in the central warehouse and finance sees one consolidated number that hides which region is missing.

The natural reach is one DAG that pulls all three regions into a central warehouse, joins to today's FX, and writes a consolidated total. EU raw data crosses the border on day one; finance shows up at 6am UTC and sees a number that doesn't say which regions are complete. Historical revaluation drifts every time the FX changes because the FX is applied at consolidation time. APAC's feed lands late one Tuesday and the consolidated view delays the whole report.

> **Trick to Solving**
>
> Per-region ingest in-region, residency-respecting aggregation that crosses only as numbers, FX locked at transaction-hour, per-region status visible to finance.
> 
> 1. Each region runs its own ingestion in-region; dedup and per-region consolidation happen locally before anything crosses.
> 2. Only aggregated, residency-cleared values move to the central warehouse. EU raw rows never leave EU infrastructure.
> 3. Each transaction stores both original-currency and FX-converted amount using the rate at the transaction's hour, so historical revaluation doesn't drift.
> 4. An orchestrator publishes per-region completion status: which region has finalized, which is late, which is missing, ahead of the 6am UTC consolidated read.

---

### Walk the requirements

#### Step 1: Land finance's consolidated view before 6am UTC, with per-region alerting

An orchestrator schedules per-region ingest, per-region consolidation, and the central rollup, with sensors and alerts before 6am UTC if any region is at risk. The deadline is the contract; per-region sensors are how on-call sees by name which region is delayed and acts. Without an orchestration layer there's nothing watching the deadline; without a central warehouse the consolidated view has nowhere to live.

#### Step 2: EU raw data stays in EU; only aggregates cross

EU regulation forbids raw payment data crossing borders. The EU pipeline runs in EU infrastructure: events ingest into an EU queue, dedup and consolidate in EU compute, and only aggregated values (totals, counts, FX-converted sums per period and merchant) cross to the central warehouse. The central warehouse never sees raw EU rows. A 'central warehouse with a residency filter on read' design is the version that already failed the audit on day one because the raw rows are sitting there to be filtered.

#### Step 3: FX locked at the transaction's hour, stored on the row

Each transaction stores its original amount and the FX rate for the transaction's hour, plus the converted amount. Historical revaluation reads the row's stored conversion rather than re-applying today's FX. A 'compute USD on the fly using current FX' approach silently rewrites every historical total every time the rate moves; locking the rate on the row is the version where last quarter's reports stay stable across rebuilds.

#### Step 4: Per-region completion status visible to finance

Finance has to see which regional totals are still incomplete before signing off. The orchestrator publishes per-region status to the consolidated view: complete / late / missing, with the per-region timestamp and the count expected versus received. Finance reads the status alongside the totals; signing off on a number with a 'missing region' flag is a different decision than signing off on a complete number. Presenting one consolidated number with no per-region surface is the failure the requirement names.

---

### The shape that fits

> **What this design gives up**
>
> Per-region in-region ingest means three regional pipelines instead of one; FX-on-the-row roughly doubles the financial column width; per-region status surface is metadata that has to be maintained alongside the totals. Operational cost is the price; the win is residency that survives an audit, historical totals that don't drift, finance sign-offs that know which region's complete, and a 6am UTC consolidated view that lands when expected.

> **What reviewers check**
>
> A reviewer looks at the canvas for these properties:
> - Each region has its own ingestion (a regional queue / log) with dedup happening in-region.
> - Only aggregated values cross from EU into the central warehouse; raw EU PII stays in-region.
> - Each transaction carries both original-currency and the FX-converted amount using the rate at the transaction's hour.
> - An orchestration layer surfaces per-region completion status to finance ahead of the 6am UTC deadline.

> **The mistake that ships**
>
> What gets shipped pulls all three regions into a central warehouse, joins to today's FX at consolidation, and shows finance one consolidated number. EU raw payment data crosses the border on day one; the privacy office finds out before the regulator does. Historical revaluation drifts every quarter as FX moves. APAC's feed is late one Tuesday and finance signs off on a number that's missing APAC because the dashboard hides the gap. The eventual rebuild moves consolidation in-region, locks FX on the row, and adds a per-region status surface; each was a property the original cut treated as a query-time filter.

---

## Common follow-up questions

- EU adds a stricter residency rule that even aggregated values can't include certain breakdowns. What in this design changes, and where? _(Tests whether the candidate sees the EU consolidator as the residency boundary: aggregations that violate the new rule simply don't compute in EU and don't appear at the central warehouse. The central pipeline doesn't change; the EU consolidator's aggregation logic does.)_
- Finance asks why last quarter's USD total changed slightly between two reports run a month apart. What in this design tells them the answer? _(Tests whether the candidate sees that the FX-on-the-row design protects against the drift, but if the row stores the rate-at-hour and the rebuild used a corrected historical rate, the answer is in the FX history table the row references. The lineage is the row plus the FX table; finance can see exactly which rate moved.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/three_regions_one_finance_team)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.