Handling Late-Arriving Data
Concepts: dmLateArriving
When Events Arrive After the Window Closes
In the real world, events do not arrive in order. A mobile app queues clicks while offline and sends them hours later. A payment gateway batches settlements daily. A sensor loses connectivity and dumps a backlog.
If your pipeline processes events by wall-clock time (when the pipeline sees them), all of these produce wrong results. The mobile clicks land in the wrong hour. The settlements land on the wrong day.
The fix: process by event time (when the event happened), not processing time (when the pipeline sees it). But this creates a new problem: how do you know when all events for a given hour have arrived? The answer is: you do not. You use a watermark, a heuristic that says 'I believe all events before time T have arrived.' Events arriving afte