# Financial Trading Warehouse

> Every trade, every tick, every fraction of a share. The regulators want receipts.

Canonical URL: <https://datadriven.io/problems/financial_trading_warehouse>

Domain: Data Modeling · Difficulty: hard · Seniority: L6

## Problem

We're building the data warehouse for a trading platform. We handle stocks, ETFs, and crypto with fractional share support. Trades settle in multiple currencies, and the SEC requires point-in-time snapshots for audits. Design the dimensional model.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes whether a candidate can combine **DECIMAL discipline**, **multi-currency fact design**, and **SCD Type 2 on regulated dimensions**. Trading warehouses are the domain where every one of these choices is a regulatory requirement, and interviewers look for candidates who call them out explicitly.

> **Trick to Solving**
>
> Before drawing tables, a strong candidate asks: what SEC audit window applies, and how are multi-currency trades reported? The signal is 'point-in-time snapshots for audits,' which means every dimension is Type 2 and every fact row stores both native and home currency.
> 
> 1. Use DECIMAL (not FLOAT) for every monetary column
> 2. Store native amount, home amount, and FX rate on the trade fact
> 3. Make all dimensions Type 2 with effective dates
> 4. Append-only fact with offsetting trades for corrections

---

### Break down the requirements

#### Step 1: Decimal or nothing

Every monetary value is DECIMAL with a scale that matches the fractional share requirement. FLOAT or DOUBLE introduces rounding errors that compound on reconciliation and fail SEC audits on the first spot check.

#### Step 2: Carry native and home currency

A euro trade at 1.08 USD/EUR stores `native_amount`, `home_amount`, and `fx_rate`. Storing only one of the three forces a lossy recomputation at query time when FX rates shift.

#### Step 3: Type 2 dimensions everywhere

Account attributes, instrument metadata, and FX rates all evolve over time. Point-in-time audits require that a trade from 2023 joins back to the 2023 state of each dimension, not the current state. Type 2 with effective dates is the only model that holds up.

#### Step 4: Append-only trade fact

A correction is never an in-place update. It is a new trade row that offsets the original. The fact table is immutable, and the net position is computed from the sum of all fact rows per account per instrument.

#### Step 5: Snapshot positions periodically

`fact_positions` is a daily periodic snapshot per account per instrument. It makes as-of position queries cheap and gives the SEC a stable artifact rather than a sum-over-history at audit time.

---

### The solution

Below is one defensible design: SCD Type 2 on every dimension, append-only trade fact with dual-currency measures, and a daily position snapshot for audit stability.

> **Why this works**
>
> Immutable facts plus Type 2 dimensions produce a warehouse that can reproduce any past statement exactly. The daily position snapshot sidesteps sum-over-full-history at audit time. The trade-off is storage (several surrogate keys and duplicate measures per trade) for complete reproducibility.

> **Interviewers watch for**
>
> A strong candidate says DECIMAL in the first sentence and Type 2 in the second. They also propose offsetting trades for corrections rather than in-place updates. Weak candidates use FLOAT, mutate dimension rows, and then cannot explain how the 2023 audit will reproduce the original trade confirmations.

> **Common pitfall**
>
> Storing only `native_amount` and recomputing home amount at query time using the current FX rate. Yesterday's P&L silently drifts every time an FX rate updates, and reconciliation against the ledger fails.

---

### The analysis pattern

**Account P&L in home currency for a historical period**

```sql
SELECT
    a.account_id,
    i.symbol,
    SUM(CASE WHEN t.side = 'buy' THEN -t.home_amount ELSE t.home_amount END) AS realized_pnl_home
FROM fact_trades t
JOIN dim_accounts a
    ON a.account_key = t.account_key
    AND a.effective_from <= '2024-06-30'
    AND (a.effective_to IS NULL OR a.effective_to > '2024-06-30')
JOIN dim_instruments i
    ON i.instrument_key = t.instrument_key
    AND i.effective_from <= '2024-06-30'
    AND (i.effective_to IS NULL OR i.effective_to > '2024-06-30')
JOIN dim_dates d ON d.date_key = t.trade_date_key
WHERE d.full_date BETWEEN '2024-01-01' AND '2024-06-30'
GROUP BY a.account_id, i.symbol
```

---

### Trade-offs and alternatives

**Type 2 dimensions plus append-only fact**

Familiar Kimball pattern, fits most BI tools, audit friendly. Cost: joins carry an effective-date predicate and corrections are offset rows.

**Bitemporal fact with valid-time columns**

Bitemporal fact tracks both effective and knowledge time on each row. Cost: more complex query semantics and most SQL engines need explicit temporal joins, but it supports 'what did we think was true as of this date' questions directly.

---

## Common follow-up questions

- How do you reconstruct a trade confirmation from 2021 exactly, including account name at that time? _(Tests point-in-time joins against SCD Type 2 dimensions.)_
- How do you handle a stock split that changes `quantity` on all historical positions? _(Tests corporate action modeling and whether splits are adjustment rows or recomputed snapshots.)_
- What if an FX rate is revised after end-of-day and trades need to be restated? _(Tests whether FX revisions produce offsetting trades or a new snapshot at a later knowledge time.)_
- How do you enforce that `home_amount = native_amount * fx_rate` at write time? _(Tests CHECK constraints, triggers, or ETL contract validation.)_
- At 100M trades per day, how do you partition `fact_trades` for point-in-time audit queries? _(Tests partitioning by `trade_date_key` and clustering by account_key.)_
- How do you handle GDPR deletion requests for a retail account with a seven-year SEC retention? _(Tests the conflict between deletion obligations and regulatory retention.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/financial_trading_warehouse)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.