Loading section...

What About Late Data?

Watermarks, Windows, Late Arrivals This is the #1 follow-up question after you propose a streaming architecture. Interviewers commonly ask some version of: "A user clicks at 11:59 PM but the event arrives at 12:03 AM. Which day does it belong to?" If you've already closed the daily window at midnight, you've got a problem. The interviewer wants to hear three words: watermark, allowed lateness, dead-letter. Start your answer here: a watermark is the system's estimate of "how far behind is the latest event I haven't seen yet?" It's a threshold. Any event with a timestamp before the watermark is considered late. Your answer should define this crisply in one sentence before diving into configuration details. The follow-up will be: "How do you set the allowed lateness?" Your answer should frame