Multi-Watermark Strategies

Concepts: paLateData

What They Want to Hear

'When joining two streams, each has its own lateness profile. The join's watermark is the minimum of both input watermarks, so the slowest source dictates when results can be emitted. If stream A has a 2-minute watermark delay and stream B has a 15-minute watermark delay, the join has to hold state for roughly 15 minutes. This can consume significant memory. My approach: set per-source watermarks based on observed lateness, use an outer join with a timeout for the slower source, and emit early results, retracting them when data from the slower source arrives.'

This is the answer that shows you understand the state cost of multi-stream watermarks.
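To make the min-of-watermarks rule concrete, here is a minimal sketch in plain Python. It is not tied to any streaming engine. The names `SourceWatermark`, `join_watermark`, and `evict` are hypothetical, and the sketch assumes each source's watermark is simply its largest observed event time minus a fixed delay, with timestamps in seconds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceWatermark:
    """Hypothetical per-source watermark: max event time seen minus a fixed delay."""
    delay_s: int                         # assumed lateness bound for this source
    max_event_ts: Optional[int] = None   # largest event timestamp observed so far

    def observe(self, event_ts: int) -> None:
        # Watermarks never move backwards, so only track the maximum.
        if self.max_event_ts is None or event_ts > self.max_event_ts:
            self.max_event_ts = event_ts

    def current(self) -> float:
        # Before any event arrives, the watermark has not advanced at all.
        if self.max_event_ts is None:
            return float("-inf")
        return self.max_event_ts - self.delay_s


def join_watermark(*sources: SourceWatermark) -> float:
    """The join can only advance as far as its slowest input."""
    return min(s.current() for s in sources)


def evict(buffer: List[int], watermark: float) -> List[int]:
    """Keep only buffered event timestamps that the join may still need to match."""
    return [ts for ts in buffer if ts > watermark]


# Stream A tolerates 2 minutes of lateness, stream B tolerates 15 minutes.
a = SourceWatermark(delay_s=2 * 60)
b = SourceWatermark(delay_s=15 * 60)
a.observe(1_000)
b.observe(1_000)

joined = join_watermark(a, b)

# Rows buffered from stream A, identified here only by event timestamp.
a_buffer = [50, 150, 500, 950]
kept_under_join = evict(a_buffer, joined)         # state the join must actually hold
kept_under_a_only = evict(a_buffer, a.current())  # what A's own watermark would need
```

In this sketch the joined watermark trails stream A's own watermark by 13 minutes, so rows from A that A alone could have dropped stay buffered until B catches up. That gap is the state cost the answer refers to.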