Questions reported by candidates in 2024-2026 loops, paraphrased and de-identified. Each answer covers the approach, the gotcha, and the typical follow-up.
SQL · L4
Reconstruct account balance over time from trade events
Use a running SUM with window: SUM(trade_amount) OVER (PARTITION BY account_id ORDER BY ts ROWS UNBOUNDED PRECEDING). The trade_amount sign convention matters: positive for cash deposits and sells, negative for buys and withdrawals. Edge case the interviewer always asks about: corporate actions (stock splits, dividends, mergers) require multiplier adjustments to historical positions. Volunteer this without being asked. Follow-up: how do you handle a backdated correction that arrives 3 days late? Answer: rerun the materialized account_balance table from the corrected event onwards, not the entire history.
-- Account balance over time from trade events
SELECT
    account_id,
    ts,
    trade_id,
    trade_amount,
    SUM(trade_amount) OVER (
        PARTITION BY account_id
        ORDER BY ts, trade_id  -- tiebreak deterministically
        ROWS UNBOUNDED PRECEDING
    ) AS running_balance_usd
FROM trade_events
WHERE ts >= :start_ts
ORDER BY account_id, ts;
SQL · L4
Find users whose option position exceeds notional limit at end of day
GROUP BY user_id (add contract_id or the underlying only if limits are set at that grain). Compute notional as SUM(contract_count * strike_price * 100) for equity options (100 shares per contract), then HAVING the sum > limit. Edge cases interviewers probe: assignment events (an exercised option becomes an equity position), expiration events (out-of-the-money options vanish), and mid-day exercises (the position changes intraday). Volunteer the assumption about whether you're measuring at trade time or at the end-of-day mark.
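A minimal Python sketch of the same aggregation, useful for talking through the arithmetic; the field names and the limit value are illustrative assumptions:

```python
from collections import defaultdict

SHARES_PER_CONTRACT = 100  # standard US equity option multiplier

def users_over_limit(positions, limit_usd):
    """positions: iterable of dicts with user_id, contract_count, strike_price.
    Returns {user_id: total_notional} for users above the limit."""
    notional = defaultdict(float)
    for p in positions:
        notional[p["user_id"]] += (
            p["contract_count"] * p["strike_price"] * SHARES_PER_CONTRACT
        )
    return {u: n for u, n in notional.items() if n > limit_usd}

positions = [
    {"user_id": "u1", "contract_count": 10, "strike_price": 50.0},
    {"user_id": "u1", "contract_count": 5, "strike_price": 100.0},
    {"user_id": "u2", "contract_count": 1, "strike_price": 20.0},
]
print(users_over_limit(positions, limit_usd=75_000))  # only u1, at 100,000
```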
SQL · L5
Compute realized P&L per user per day with FIFO accounting
Match sell trades against the earliest unfilled buys for the same security in the same account. This is a tax-lot identification problem. Naive approaches (LIFO, average cost) produce wrong tax numbers. Interviewers expect a walkthrough on a 5-buy-1-sell example. Edge cases: short positions (cover trades match against the original short), multi-currency trades (FX conversion at trade time), wash sales (loss disallowed if same security repurchased within 30 days). Senior signal: mentioning that wash-sale handling is usually pushed downstream to a tax engine, not done in SQL.
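The lot-matching walkthrough is easier to demonstrate in Python than SQL. A minimal sketch, assuming a single account and security, no shorts (every sell is covered by earlier buys), and no fees:

```python
from collections import deque

def fifo_realized_pnl(trades):
    """trades: list of (side, qty, price) in time order.
    Matches each sell against the earliest open buy lots (FIFO)."""
    lots = deque()  # open buy lots as [qty_remaining, price]
    realized = 0.0
    for side, qty, price in trades:
        if side == "buy":
            lots.append([qty, price])
        else:  # sell: consume the oldest lots first
            remaining = qty
            while remaining > 0:
                lot = lots[0]
                matched = min(remaining, lot[0])
                realized += matched * (price - lot[1])
                lot[0] -= matched
                remaining -= matched
                if lot[0] == 0:
                    lots.popleft()
    return realized

# sell 15 @ 15: 10 from the 10.0 lot (+50), 5 from the 12.0 lot (+15)
trades = [("buy", 10, 10.0), ("buy", 10, 12.0), ("sell", 15, 15.0)]
print(fifo_realized_pnl(trades))  # 65.0
```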
SQL · L5
Find duplicate trades across the broker and clearing house feeds
FULL OUTER JOIN broker_trades to clearing_house_trades on trade_id. Classify each row: matched (both sides present with same fields), missing-in-clearing (broker only), missing-in-broker (clearing only), mismatch (both present but fields differ). The mismatch row is the highest-leverage one to discuss: what tolerance is acceptable on price (cents), quantity (zero), timestamp (sub-second clock skew).
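The classification logic, sketched in Python with a cents tolerance on price and zero tolerance on quantity (both assumptions to state up front, per the text):

```python
def classify_recon(broker, clearing, price_tol=0.01):
    """broker / clearing: dicts of trade_id -> {"price": ..., "qty": ...}.
    Emulates the FULL OUTER JOIN classification."""
    status = {}
    for tid in broker.keys() | clearing.keys():
        b, c = broker.get(tid), clearing.get(tid)
        if b is None:
            status[tid] = "missing_in_broker"
        elif c is None:
            status[tid] = "missing_in_clearing"
        elif b["qty"] != c["qty"] or abs(b["price"] - c["price"]) > price_tol:
            status[tid] = "mismatch"
        else:
            status[tid] = "matched"
    return status
```

In SQL the same decision tree becomes a CASE expression over the joined row; the tolerance question is the part worth discussing out loud.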
SQL · L5
Detect potential wash sales across user accounts
Self-join trade_events on (account_owner_id, security_id), pairing each loss sale with buys WHERE buy.ts BETWEEN sell.ts - INTERVAL '30 days' AND sell.ts + INTERVAL '30 days'. The IRS window is two-sided: a repurchase 30 days before or after the loss sale triggers the rule, not just after. Wash sales apply across an individual's accounts (taxable + IRA), so the join uses owner_id, not account_id. Volunteer the data privacy consideration: this query touches account-linkage data and may need PII access controls.
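The pairing logic as a Python sketch, assuming loss sales are already filtered out upstream and matching at the owner level rather than the account level:

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)  # IRS window: 30 days before or after the loss sale

def wash_sale_candidates(loss_sells, buys):
    """loss_sells / buys: tuples of (owner_id, security_id, trade_date).
    Flags any repurchase of the same security within the two-sided window."""
    flagged = []
    for owner, sec, sell_d in loss_sells:
        for b_owner, b_sec, buy_d in buys:
            if (owner, sec) == (b_owner, b_sec) and abs(buy_d - sell_d) <= WINDOW:
                flagged.append((owner, sec, sell_d, buy_d))
    return flagged
```

The nested loop is the SQL self-join; at scale the join key (owner_id, security_id) keeps the comparison set small.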
Python · L4
Process trade event stream and detect duplicates with TTL
Use a dict keyed on trade_id with last_seen_ts. On each incoming event, check if trade_id is in dict. If yes and within TTL window: drop. If no or past TTL: process and update dict. State the dedup window assumption (24h is typical for clearing reconciliation, 7d for SOX-aligned audit). Discuss memory growth: at 10M trades/day, dict needs LRU eviction or external cache (Redis).
from collections import OrderedDict
from datetime import datetime, timedelta


class TradeDedup:
    def __init__(self, ttl_hours: int = 24, max_size: int = 1_000_000):
        self.seen: OrderedDict[str, datetime] = OrderedDict()
        self.ttl = timedelta(hours=ttl_hours)
        self.max_size = max_size

    def is_duplicate(self, trade_id: str, ts: datetime) -> bool:
        if trade_id in self.seen:
            last = self.seen[trade_id]
            if ts - last <= self.ttl:
                return True
        self.seen[trade_id] = ts
        self.seen.move_to_end(trade_id)
        # LRU eviction
        while len(self.seen) > self.max_size:
            self.seen.popitem(last=False)
        return False
Python · L5
Implement event sourcing for account state with replay
Three components: (1) append-only event log in Kafka or Postgres; (2) state-derivation function that folds events into current state; (3) snapshot strategy. Snapshot every N events (typically 1000) so cold reads don't replay the entire history. Replay starts from latest snapshot before target ts, then applies events forward. Discuss why event sourcing wins for audit: the state is always reconstructible, corrections are new events not destructive updates, and the log itself is the audit trail.
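A minimal sketch of components (2) and (3): a pure fold step plus a replay that starts from a snapshot. Event shape and the quantity-only state are simplifying assumptions:

```python
SNAPSHOT_EVERY = 1000  # snapshot cadence from the text

def apply_event(state, event):
    """Pure fold step: new state = old state + one trade event."""
    new = dict(state)
    sym = event["symbol"]
    new[sym] = new.get(sym, 0) + event["qty"]
    return new

def replay(events, snapshot=None, snapshot_seq=0):
    """Rebuild state from the latest snapshot, applying only events
    with seq > snapshot_seq instead of the full history."""
    state = dict(snapshot or {})
    for event in events:
        if event["seq"] > snapshot_seq:
            state = apply_event(state, event)
    return state
```

Because apply_event is pure and the log is append-only, replaying from any snapshot is deterministic, which is the audit argument in miniature.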
Python · L5
Compute settled vs unsettled cash for an account
T+1 settlement means a sell on Monday produces unsettled cash that becomes settled on Tuesday. Walk through trade events, classify each as settled (trade_date + 1 business day <= today) or unsettled. Holiday and weekend handling is the gotcha. Use a holiday calendar (NYSE schedule) and pandas business day offsets, or hand-roll with a holidays set.
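A hand-rolled sketch of the holidays-set approach; the holiday set here is a tiny illustrative subset, not the real NYSE calendar:

```python
from datetime import date, timedelta

NYSE_HOLIDAYS = {date(2025, 1, 1), date(2025, 7, 4)}  # illustrative subset only

def settlement_date(trade_date):
    """T+1: the next business day after the trade, skipping weekends and holidays."""
    d = trade_date + timedelta(days=1)
    while d.weekday() >= 5 or d in NYSE_HOLIDAYS:
        d += timedelta(days=1)
    return d

def split_cash(trades, today):
    """trades: iterable of (trade_date, cash_amount).
    Cash is settled once its settlement date has arrived."""
    settled = unsettled = 0.0
    for trade_date, amount in trades:
        if settlement_date(trade_date) <= today:
            settled += amount
        else:
            unsettled += amount
    return settled, unsettled
```

A sell on Thursday 2025-07-03 settles Monday 2025-07-07 (Friday is the July 4 holiday, then the weekend), which is exactly the gotcha the interviewer is fishing for.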
System Design · L5
Design the trade reconciliation pipeline (broker vs clearing house)
Broker side: internal trade events flow Kafka -> S3 raw -> Spark daily ETL -> Snowflake fact_broker_trade. Clearing house side: nightly file dropped to SFTP -> S3 -> Spark loader -> Snowflake fact_clearing_trade. Reconciliation: daily Spark job FULL OUTER JOIN on trade_id, classifies each row, writes deltas to fact_recon_delta with a status column. Alerts via PagerDuty on any non-zero delta count above tolerance. Audit log immutable. Cover failure modes: clearing file arrives 4 hours late (job retries hourly with idempotent merge), broker side has duplicate trade_ids (deduped in raw layer).
Internal trade systems
    -> Kafka (trades topic, key=trade_id)
    -> S3 raw landing (date-partitioned, immutable)
    -> Spark daily ETL with run_id
    -> Snowflake fact_broker_trade (MERGE on trade_id, run_id)

Clearing house
    -> SFTP nightly file
    -> S3 raw clearing
    -> Spark loader (validate schema, dedup)
    -> Snowflake fact_clearing_trade

Daily reconciliation
    -> Spark FULL OUTER JOIN on trade_id
    -> classify: matched, broker_only, clearing_only, mismatch
    -> Snowflake fact_recon_delta
    -> PagerDuty alert if delta_count > tolerance
    -> immutable audit log to S3 audit/
System Design · L5
Design the regulatory CAT (Consolidated Audit Trail) pipeline
The CAT is a FINRA-mandated reporting pipeline that requires every order, modification, and execution to be reported within T+1. Architecture: trade and order events from internal systems -> Kafka -> Flink job (validate required CAT fields, enrich with reference data, format per CAT spec) -> S3 staging -> nightly batch SFTP upload to FINRA. Cover: schema evolution as CAT spec updates quarterly (use Avro with schema registry), late-arriving corrections (republish day-replacement file), full-day replay capability (Kafka retention + S3 raw events). The audit story matters as much as the pipeline.
System Design · L5
Design the position-tracking event-sourcing system
Trade events as the single source of truth in an immutable Kafka log. Position state derived by folding trades keyed by (account_id, symbol). Snapshots written every 1000 events for fast cold-read. Replay capability for corrections (apply a backdated correction event, replay from prior snapshot to now). Discuss why event sourcing beats CRUD-based position tracking: corrections are natural, audit is free, multiple consumers can derive different views (P&L, exposure, regulatory reports) from the same source.
System Design · L6
Design the real-time risk monitoring pipeline
Trade events -> Kafka -> Flink stateful (keyed by account_id) maintains rolling exposure metrics, computes margin requirements, evaluates risk rules in real-time. On threshold breach: emit alert event to Kafka, downstream service either restricts further trading on the account or pages risk team. Cover: state size estimate at 24M accounts with 30-day history, hot-account handling (whale traders), exactly-once for risk decisions (a missed threshold breach is a regulatory event), backfill for rules changes (historical events re-evaluated against new rules).
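A toy single-process stand-in for the keyed Flink state is enough to anchor the discussion: per-account gross exposure with a limit check on every trade. The limit value and alert shape are assumptions for illustration:

```python
from collections import defaultdict

class RiskMonitor:
    """Per-account rolling gross exposure with a hard notional limit."""

    def __init__(self, limit_usd: float):
        self.exposure = defaultdict(float)  # the Flink keyed state
        self.limit = limit_usd

    def on_trade(self, account_id: str, notional_usd: float):
        self.exposure[account_id] += abs(notional_usd)
        if self.exposure[account_id] > self.limit:
            # in the real pipeline this alert event is emitted back to Kafka
            return {"type": "RISK_BREACH", "account_id": account_id,
                    "exposure_usd": self.exposure[account_id]}
        return None
```

The interview conversation then builds on this skeleton: windowed decay of exposure, per-tier limits, and why the breach decision needs exactly-once semantics.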
Modeling · L5
Design schema for tracking customer cash and securities positions
Three core tables. fact_position_event: immutable log, one row per cash or security state change, with event_type, amount, balance_after, ts, source_event_id. dim_account: Type 2 for status changes (active, restricted, closed). dim_security: Type 2 for symbol changes following corporate actions (the Facebook -> Meta rename is the canonical example: a new Type 2 row captures the change, so historical positions keep their original surrogate key instead of being rewritten). Snapshot table fact_position_eod for fast end-of-day queries, rebuilt nightly from the event log. Volunteer the trade-off: the snapshot table is denormalized for performance, the event log is normalized for correctness; the snapshot exists because nobody wants to fold 10 years of events to answer "what was Bob's position yesterday".
Modeling · L5
Model the order lifecycle including modifications and cancellations
Append-only fact_order_event with event_type column (Created, Modified, Cancelled, PartialFill, Filled, Rejected). Each event references the originating order_id. Materialized view fact_order_current rebuilt continuously, contains the current state of each order (one row per order_id, latest state). Edge case interviewers probe: how do you handle a Cancellation event that arrives after a Filled event? Answer: the order is filled; the cancellation is rejected by the broker; the event log shows both for audit; the materialized view shows Filled.
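The Cancelled-after-Filled edge case reduces to a small fold over the event log. A sketch, assuming events arrive as ordered event_type strings for one order_id:

```python
TERMINAL_STATES = {"Filled", "Cancelled", "Rejected"}

def current_order_state(events):
    """Fold one order's event_type sequence into the state that
    fact_order_current would show. A Cancelled arriving after a
    terminal state stays in the log for audit but does not
    overwrite the state: the fill already happened."""
    state = None
    for event_type in events:
        if state in TERMINAL_STATES and event_type == "Cancelled":
            continue  # late cancel: log it, don't apply it
        state = event_type
    return state

print(current_order_state(["Created", "PartialFill", "Filled", "Cancelled"]))  # Filled
```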
Behavioral · L5
Tell me about a time you found a bug that had financial impact
Robinhood interviewers ask this in some form on every behavioral round. The story should cover: how you found it (proactive monitoring vs. a reactive customer report), how you triaged within the first hour (alerted finance, froze affected accounts, started a recompute), the root-cause analysis, and what you changed in the process to prevent recurrence (added a data quality check, introduced a reconciliation alert, expanded test coverage). The postmortem is graded heavily; without it the story sounds defensive.