A streaming pipeline ingests three sources with different lateness profiles
A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.
- Domain: Pipeline Design
- Difficulty: medium
Problem
A streaming pipeline ingests three sources with different lateness profiles:
- Source A is a high-volume mobile event stream with a long retry tail (99.9th-percentile lateness: 4 hours).
- Source B is an IoT sensor stream in which individual partitions go idle for hours at a time.
- Source C is a financial market-data feed whose producer emits explicit end-of-session marker events.

The section covers four watermark strategies (ascending timestamps, bounded out-of-orderness, punctuated, and per-key) and one operational fix (idle-partition detection). Choose a watermark strategy for each source by adding three watermark generator transforms, one per source. Name each transform so that it states the strategy and the parameters that match its source's profile, and add an idle-detection annotation wherever a source needs one.
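The strategies above can be illustrated with a minimal, framework-free Python sketch. It is not any streaming engine's API: the classes `Event`, `BoundedOutOfOrdernessWatermark`, `PunctuatedWatermark`, and `IdleAwarePartitionedWatermark`, the millisecond timestamps, and the `is_end_of_session` flag are all hypothetical. It shows one plausible mapping: a 4-hour bound for Source A (taken from its stated 99.9th-percentile lateness), a per-partition minimum for Source B that ignores partitions idle longer than an assumed 5-minute timeout, and a punctuated generator that advances only on Source C's end-of-session markers. Setting `max_lateness_ms=0` turns the bounded generator into the ascending-timestamps case.

```python
from dataclasses import dataclass
from typing import Dict, Optional

HOUR_MS = 60 * 60 * 1000
MINUTE_MS = 60 * 1000


@dataclass
class Event:
    # Hypothetical shape: event time in epoch ms plus an assumed marker flag.
    event_time: int
    is_end_of_session: bool = False


class BoundedOutOfOrdernessWatermark:
    """Trails the max event time by a fixed bound; bound 0 = ascending timestamps."""

    def __init__(self, max_lateness_ms: int) -> None:
        self.max_lateness_ms = max_lateness_ms
        self.max_event_time: Optional[int] = None

    def on_event(self, event: Event) -> None:
        # Late events never move the max (and so the watermark) backwards.
        if self.max_event_time is None or event.event_time > self.max_event_time:
            self.max_event_time = event.event_time

    def current_watermark(self) -> Optional[int]:
        if self.max_event_time is None:
            return None  # no events yet, so no watermark
        return self.max_event_time - self.max_lateness_ms


class PunctuatedWatermark:
    """Advances only when the producer emits an end-of-session marker."""

    def __init__(self) -> None:
        self.watermark: Optional[int] = None

    def on_event(self, event: Event) -> Optional[int]:
        if not event.is_end_of_session:
            return None  # ordinary ticks never advance the watermark
        if self.watermark is None or event.event_time > self.watermark:
            self.watermark = event.event_time
        return self.watermark


class IdleAwarePartitionedWatermark:
    """Min watermark across partitions, ignoring partitions idle past a timeout."""

    def __init__(self, max_lateness_ms: int, idle_timeout_ms: int) -> None:
        self.max_lateness_ms = max_lateness_ms
        self.idle_timeout_ms = idle_timeout_ms
        self.max_event_time: Dict[str, int] = {}
        self.last_seen_ms: Dict[str, int] = {}  # processing time of last event
        self.emitted: Optional[int] = None

    def on_event(self, partition: str, event: Event, now_ms: int) -> None:
        prev = self.max_event_time.get(partition)
        if prev is None or event.event_time > prev:
            self.max_event_time[partition] = event.event_time
        self.last_seen_ms[partition] = now_ms

    def current_watermark(self, now_ms: int) -> Optional[int]:
        active = [
            t for p, t in self.max_event_time.items()
            if now_ms - self.last_seen_ms[p] < self.idle_timeout_ms
        ]
        if active:
            candidate = min(active) - self.max_lateness_ms
            # Never let a partition that wakes up with old data pull it back.
            if self.emitted is None or candidate > self.emitted:
                self.emitted = candidate
        return self.emitted  # if every partition is idle, this holds the last value


# Source A: bound taken from the stated 99.9th-percentile lateness (4 hours).
source_a = BoundedOutOfOrdernessWatermark(max_lateness_ms=4 * HOUR_MS)
source_a.on_event(Event(10 * HOUR_MS))
source_a.on_event(Event(9 * HOUR_MS))
print(source_a.current_watermark() // HOUR_MS)  # 6

# Source B: assumed ascending per sensor (bound 0) and an assumed 5-minute idle timeout.
source_b = IdleAwarePartitionedWatermark(max_lateness_ms=0, idle_timeout_ms=5 * MINUTE_MS)
source_b.on_event("sensor-1", Event(1_000), now_ms=0)
source_b.on_event("sensor-2", Event(50_000), now_ms=0)
print(source_b.current_watermark(now_ms=0))  # 1000
source_b.on_event("sensor-2", Event(900_000), now_ms=10 * MINUTE_MS)
print(source_b.current_watermark(now_ms=10 * MINUTE_MS))  # 900000, sensor-1 idle

# Source C: the watermark moves only on end-of-session markers.
source_c = PunctuatedWatermark()
print(source_c.on_event(Event(100)))  # None
print(source_c.on_event(Event(200, is_end_of_session=True)))  # 200
```

Holding Source A's watermark four hours behind the newest event trades latency for completeness: windows close only after the bound, and the small tail of events beyond it still lands behind the watermark, so it needs separate late-data handling. For Source B, the idle timeout is a similar trade-off: too short a timeout lets the watermark run ahead of data a paused sensor may still send.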