Can You Re-run This Safely?

Distributed Idempotency and Cross-System Consistency

At the intermediate level, idempotency is a single MERGE statement. The interviewer then asks: 'How do you coordinate idempotency across Kafka, Spark, and your warehouse when each can fail independently?' This is the distributed systems question that separates good candidates from great ones.

The core problem you need to articulate: in a distributed pipeline, you can't use a single database transaction to span multiple systems. A Kafka consumer might commit its offset and then have the warehouse write fail, so those records are skipped on restart. Or the warehouse write succeeds but the offset commit fails, so the same records are reprocessed and, unless the write is idempotent, duplicated. The interviewer wants to hear you name this problem before jumping to solutions.

The follow-up trap: 'What about WAL-based recovery?' WAL (Write-Ahead Log) is the gold standard