Streaming-Only Reprocessing
Concepts: Kappa Architecture
What They Want to Hear:

'Reprocessing in Kappa means replaying the event log through a corrected version of the streaming job. For long-duration replays (months of data), I use tiered replay: recent data replays from Kafka at full speed, older data replays from S3 archives at reduced throughput. I write replay output to a shadow table, validate it against the original output for known-correct windows, and swap on validation pass. The critical constraint: the replay job must consume the same event schema versions that existed at each point in time, or the results will differ from the original processing.'

This is the answer that shows you understand that replay is not just 'play the events again' but a controlled data-correction operation.
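The mechanics described above can be sketched in a few lines. This is a minimal illustration, not a real framework: all names (`Event`, `SCHEMA_REGISTRY`, `source_for`, `replay`, `validate`, the seven-day Kafka retention) are hypothetical, and the "shadow table" is simulated as a plain dict of daily aggregates.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Event:
    ts: datetime          # event time, used for tiering and schema selection
    schema_version: int   # version active when the event was produced
    payload: dict

# Decode each event with the schema version it was written under, not the
# latest one -- otherwise replay output diverges from original processing.
SCHEMA_REGISTRY = {
    1: lambda p: {"amount_cents": int(p["amount"] * 100)},  # v1 stored dollars
    2: lambda p: {"amount_cents": p["amount_cents"]},       # v2 stores cents
}

KAFKA_RETENTION = timedelta(days=7)  # illustrative retention window

def source_for(event: Event, now: datetime) -> str:
    """Tiered replay: recent events come from Kafka, older ones from S3."""
    return "kafka" if now - event.ts <= KAFKA_RETENTION else "s3"

def replay(events: list[Event]) -> dict:
    """Replay into a shadow table (here: daily sums in cents)."""
    shadow: dict[str, int] = {}
    for ev in sorted(events, key=lambda e: e.ts):
        decoded = SCHEMA_REGISTRY[ev.schema_version](ev.payload)
        window = ev.ts.strftime("%Y-%m-%d")
        shadow[window] = shadow.get(window, 0) + decoded["amount_cents"]
    return shadow

def validate(shadow: dict, original: dict, known_good_windows: list) -> bool:
    """Swap shadow for original only if they agree on known-correct windows."""
    return all(shadow.get(w) == original.get(w) for w in known_good_windows)
```

The validation step is the safety net: comparing windows that are known to be correct in the original output catches a buggy replay job before the swap, while windows that the fix is meant to change are excluded from the comparison.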