A streaming pipeline ingests three sources with different lateness profiles
A medium Pipeline Design mock interview question on DataDriven. Practice with AI-powered feedback, real code execution, and a hire/no-hire decision.
- Domain: Pipeline Design
- Difficulty: medium
Interview Prompt
A streaming pipeline ingests three sources with different lateness profiles. Source A is a high-volume mobile event stream with a long retry tail (99.9th-percentile lateness: 4 hours). Source B is an IoT sensor stream whose individual partitions go idle for hours at a time. Source C is a financial market-data feed whose producer emits explicit end-of-session marker events. The section names four watermark strategies (ascending timestamps, bounded out-of-orderness, punctuated, and per-key) and one operational fix (idle-partition detection). Choose a watermark strategy for each source: add three watermark generator transforms, one per source, each named to state its strategy and the parameters that match its source's profile, and add an idle-detection annotation wherever a source needs it.
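To make the vocabulary in the prompt concrete, here is a minimal sketch in plain Python of two of the named strategies (bounded out-of-orderness and punctuated) and of idle-partition detection. It is not tied to any streaming framework and is not the expected answer. The class names, the function name, the instance names, and the 30-minute idle timeout are all assumptions made for illustration; only the 4-hour figure for Source A comes from the prompt.

```python
from dataclasses import dataclass

NO_WATERMARK = float("-inf")  # sentinel: no watermark has been produced yet


@dataclass
class BoundedOutOfOrderness:
    """Watermark trails the largest event time seen by a fixed bound."""
    max_lateness_s: float
    max_event_ts: float = NO_WATERMARK

    def on_event(self, event_ts: float) -> float:
        self.max_event_ts = max(self.max_event_ts, event_ts)
        return self.max_event_ts - self.max_lateness_s


@dataclass
class Punctuated:
    """Watermark advances only when an explicit marker event arrives."""
    watermark: float = NO_WATERMARK

    def on_event(self, event_ts: float, is_marker: bool = False) -> float:
        if is_marker:
            self.watermark = max(self.watermark, event_ts)
        return self.watermark


def combine_with_idle_detection(watermarks, last_seen, now, idle_timeout_s):
    """Take the minimum watermark over partitions that are not idle.

    A partition counts as idle when it has produced no event for longer
    than idle_timeout_s. Returns None when every partition is idle.
    """
    active = [
        wm for partition, wm in watermarks.items()
        if now - last_seen[partition] <= idle_timeout_s
    ]
    return min(active) if active else None


# Hypothetical names that state the strategy and its parameters.
source_a_bounded_4h = BoundedOutOfOrderness(max_lateness_s=4 * 3600)
source_c_punctuated_end_of_session = Punctuated()
SOURCE_B_IDLE_TIMEOUT_S = 30 * 60  # assumed value; the prompt gives none
```

Whether Source A's bound should equal the full 4-hour tail, or be shorter with late events handled separately, is a trade-off the prompt leaves open. The sketch simply uses the figure as stated. For Source B, the combining function shows why idle detection matters: without it, one silent partition pins the minimum and stalls the overall watermark.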
How This Interview Works
- Read the vague prompt (just like a real interview)
- Ask clarifying questions to the AI interviewer
- Write your pipeline design solution with real code execution
- Get instant feedback and a hire/no-hire decision