Staff-level streaming: multi-region Kafka, state backends, and exactly-once end-to-end
What They Want to Hear: 'Multi-region Kafka has three architectures. Active-passive: one primary cluster, with MirrorMaker 2 replicating to a standby for disaster recovery. Active-active: each region runs its own cluster with bidirectional replication, and consumers process local events. Stretch cluster: a single Kafka cluster with brokers spanning regions and synchronous replication. I choose active-active for most cases because it gives each region low-latency access to local data while maintaining global visibility.'
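One practical consequence of active-active replication is worth sketching: MirrorMaker 2's default replication policy prefixes replicated topics with the source cluster alias (so `orders` from `eu-west` arrives as `eu-west.orders`), and a consumer that wants global visibility must subscribe to the local topic plus every remote-prefixed replica. The region names below are hypothetical.

```python
def subscription_for(topic: str, local_region: str, all_regions: list[str]) -> set[str]:
    """Topics a consumer in local_region subscribes to for a global view
    under MirrorMaker 2's DefaultReplicationPolicy (source-alias prefixing)."""
    # The local topic carries events produced in this region;
    # each remote region's events arrive under "<region>.<topic>".
    remote = {f"{region}.{topic}" for region in all_regions if region != local_region}
    return {topic} | remote
```

A `us-east` consumer of `orders` across three regions would subscribe to `orders`, `eu-west.orders`, and `ap-south.orders`; note that the prefixing also prevents replication loops, since MM2 will not re-replicate an already-prefixed topic back to its source.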
What They Want to Hear: 'Distributed transactions across services are fragile and do not scale. The saga pattern replaces a single transaction with a sequence of local transactions, each publishing an event that triggers the next step. If a step fails, compensating transactions undo the prior steps. There are two coordination styles: choreography (each service listens for events and acts independently) and orchestration (a central coordinator directs the sequence). I prefer orchestration for complex workflows.'
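The orchestration style can be sketched in a few lines: a coordinator runs each step's local transaction in order and, on failure, applies the completed steps' compensations in reverse. The step names and the dict-based context are illustrative, not any particular framework's API.

```python
class SagaStep:
    """One local transaction plus the compensating action that undoes it."""
    def __init__(self, name, action, compensate):
        self.name = name
        self.action = action          # callable(ctx) -> None, raises on failure
        self.compensate = compensate  # callable(ctx) -> None, undoes action

def run_saga(steps, ctx):
    """Orchestrator: run steps in order; on failure, compensate in reverse."""
    completed = []
    for step in steps:
        try:
            step.action(ctx)
            completed.append(step)
        except Exception:
            # Undo in reverse order so each compensation sees the state
            # left by the steps that ran after it.
            for prev in reversed(completed):
                prev.compensate(ctx)
            return False
    return True
```

A failed third step ("ship") would leave the log as reserve, charge, undo-charge, undo-reserve: the orchestrator's explicit sequencing is exactly what makes this style easier to debug than choreography.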
What They Want to Hear: 'When joining two streams, each has its own lateness profile. The join's watermark is the minimum of both input watermarks: the slowest source dictates when results can be emitted. If stream A has a 2-minute watermark and stream B has a 15-minute watermark, the join holds state for 15 minutes. This can consume significant memory. My approach: set per-source watermarks based on observed lateness, use an outer join with a timeout for the slower source, and emit early partial results.'
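The min-watermark rule and its memory consequence can be made concrete with a small sketch (timestamps in epoch seconds; the 2-minute and 15-minute lags mirror the example above):

```python
def join_watermark(wm_a: int, wm_b: int) -> int:
    """A two-input join can only advance to the minimum of its input
    watermarks: emitting earlier could miss late rows from the slower side."""
    return min(wm_a, wm_b)

def buffered_window_seconds(now: int, wm_a: int, wm_b: int) -> int:
    """How far back the join must retain state, dictated by the slower source."""
    return now - join_watermark(wm_a, wm_b)
```

With stream A lagging 120 seconds and stream B lagging 900, the join's watermark sits 900 seconds behind, so both sides buffer 15 minutes of rows; shrinking that buffer is exactly what the outer-join-with-timeout tactic buys.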
What They Want to Hear: 'I build automated remediation in tiers. Tier 1: automatic retry. Transient errors are retried with exponential backoff; most DLQ events resolve themselves. Tier 2: automatic transform. Known schema mismatches are fixed by a remediation function: the event is transformed to the expected schema and replayed. Tier 3: human review. Events that fail automated remediation are grouped by error type, and the highest-impact group is surfaced to the on-call engineer.'
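The three tiers can be sketched as a single dispatch function. Everything here is illustrative: the error-type strings, the fix registry, and the injectable `sleep` (so backoff can be disabled in tests) are assumptions, not a specific DLQ product's API.

```python
import time
from collections import Counter

def remediate(event, error_type, replay, fixes, transient, sleep=time.sleep):
    """Tiered DLQ remediation: retry transient errors with exponential
    backoff, transform known schema mismatches, else escalate to a human."""
    if error_type in transient:                      # Tier 1: retry
        for attempt in range(3):
            if replay(event):
                return "retried"
            sleep(2 ** attempt)                      # 1s, 2s, 4s backoff
        return "human_review"                        # retries exhausted
    if error_type in fixes:                          # Tier 2: transform
        replay(fixes[error_type](event))
        return "transformed"
    return "human_review"                            # Tier 3: escalate

def highest_impact_group(failed):
    """Given (event, error_type) pairs, pick the most frequent error type
    so the on-call engineer fixes the biggest group first."""
    return Counter(err for _, err in failed).most_common(1)[0][0]
```

The key design choice is that tiers 1 and 2 never page anyone; only events that survive both automated tiers reach `highest_impact_group`, which turns a pile of individual failures into one ranked queue.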
What They Want to Hear: 'For large state, I use RocksDB as the state backend. RocksDB stores state on local disk instead of the JVM heap, so it can handle state larger than memory. Checkpoints write state snapshots to a distributed filesystem (S3 or HDFS). For 500GB of state, I use incremental checkpoints: each checkpoint writes only the data changed since the last one, reducing checkpoint duration from minutes to seconds. The critical tuning parameters start with the checkpoint interval (balancing checkpoint overhead against recovery time).'
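The "minutes to seconds" claim is back-of-envelope arithmetic worth making explicit: a full checkpoint uploads the entire snapshot, while an incremental one uploads only the changed files since the last checkpoint. The throughput and churn figures below are hypothetical, chosen to match the 500GB example.

```python
def checkpoint_seconds(state_gb: float, changed_gb: float,
                       upload_gbps: float, incremental: bool) -> float:
    """Rough checkpoint duration: bytes uploaded divided by aggregate
    upload throughput to the distributed filesystem (e.g. S3)."""
    uploaded = changed_gb if incremental else state_gb
    return uploaded / upload_gbps
```

With 500GB of state, 5GB of churn per checkpoint interval, and 2GB/s of aggregate upload bandwidth across the cluster, a full checkpoint takes about 250 seconds while an incremental one takes about 2.5, which is precisely the minutes-to-seconds gap and why the checkpoint interval can then be shortened without crushing the job.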