Idempotency in Streaming vs Batch

Idempotency works differently in batch and streaming, and interviewers test whether you understand why. Batch idempotency is about partition replacement. Streaming idempotency is about exactly-once processing across a distributed system with continuous data flow.

Batch Idempotency: Replace the Partition

In batch, idempotency is straightforward. You process a bounded chunk (one day, one hour) and overwrite the output partition. The partition is the unit of idempotency. Re-running the same partition produces the same result.

Streaming Idempotency: Exactly-Once Semantics

The Consume-Transform-Produce Pattern

The key insight: consumer offsets are committed as part of the producer transaction. Either the output AND the offset commit succeed together, or neither does. This prevents the 'processe
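The two ideas described here, overwriting a partition in batch and committing output together with the consumer offset in streaming, can be sketched with a small in-memory simulation. This is a toy model under stated assumptions, not a real broker or warehouse API: `write_partition`, `ToyLog`, `process_batch`, and the doubling transform are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


def write_partition(table, partition_key, rows):
    """Batch: replace the whole partition, never append to it."""
    table[partition_key] = list(rows)


@dataclass
class ToyLog:
    """Hypothetical stand-in for an output topic plus a committed consumer offset."""
    output: list = field(default_factory=list)
    committed_offset: int = 0

    def commit_atomically(self, records, next_offset):
        # Stage the new output first, then publish output and offset together,
        # so a reader never sees one change without the other.
        staged_output = self.output + list(records)
        self.output, self.committed_offset = staged_output, next_offset


def process_batch(log, source, batch_size, fail_before_commit=False):
    """Streaming: consume from the committed offset, transform, commit both at once."""
    start = log.committed_offset
    batch = source[start:start + batch_size]
    transformed = [x * 2 for x in batch]  # hypothetical transform
    if fail_before_commit:
        raise RuntimeError("simulated crash before commit")
    log.commit_atomically(transformed, start + len(batch))


# Batch: running the same partition twice leaves the same result.
table = {}
write_partition(table, "2024-01-01", [10, 20])
write_partition(table, "2024-01-01", [10, 20])

# Streaming: a crash before the commit changes neither output nor offset,
# so the retry reprocesses the same records without producing duplicates.
source = [1, 2, 3, 4]
log = ToyLog()
process_batch(log, source, 2)
try:
    process_batch(log, source, 2, fail_before_commit=True)
except RuntimeError:
    pass
process_batch(log, source, 2)
print(table, log.output, log.committed_offset)
```

In this sketch the retry is safe because the offset only advances when the output is written. In a real streaming system that guarantee would come from the producer transaction described above rather than from a single Python assignment.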