A medium Pipeline Design mock interview question on DataDriven. Practice with AI-powered feedback, real code execution, and a hire/no-hire decision.
- Domain: Pipeline Design
- Difficulty: medium
Interview Prompt
A 2024-era streaming system on the canvas runs a single Flink pipeline producing one materialized view. The Kafka log retains only a few days of events in-cluster, so reprocessing more than a week of history is impossible: a bug fix that requires replaying a month of orders cannot be applied, because most of those events have already expired from the log. Apply the Kappa architecture this section just taught and make the system replay-capable: (1) add long-retention tiered storage for the Kafka log, backed by object storage (S3, GCS, or ADLS), so the event log can hold 12-24 months of events affordably, and (2) add a parallel materialized view (Snowflake mart_orders_v2 or a separate warehouse table) so the Flink pipeline can replay the log into a new view during a bug fix or schema migration without disturbing the live v1 view. Once v2 has caught up to live and been validated, the dashboard cuts over to it. Do not add a batch layer; Kappa is stream-only, and reprocessing is a replay of the log through the same code path.
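The replay-and-cutover flow in the prompt can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Flink or Snowflake code: an in-memory list stands in for the long-retention Kafka log, dictionaries stand in for the v1 and v2 tables, and every name (`OrderEvent`, `MaterializedView`, `Dashboard`, `transform_v1`, `transform_v2`, `reconcile`) is hypothetical. The bug being fixed (cancelled orders counted as revenue) is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class OrderEvent:
    order_id: str
    customer: str
    amount_cents: int
    cancelled: bool = False


View = Dict[str, int]  # per-customer revenue in cents
Transform = Callable[[View, OrderEvent], None]


def transform_v1(view: View, event: OrderEvent) -> None:
    """Live logic with the (hypothetical) bug: cancelled orders count as revenue."""
    view[event.customer] = view.get(event.customer, 0) + event.amount_cents


def transform_v2(view: View, event: OrderEvent) -> None:
    """Fixed logic: cancelled orders contribute nothing."""
    if not event.cancelled:
        view[event.customer] = view.get(event.customer, 0) + event.amount_cents


@dataclass
class MaterializedView:
    name: str
    transform: Transform
    rows: View = field(default_factory=dict)
    next_offset: int = 0  # offset of the next log entry to consume

    def consume(self, log: List[OrderEvent], max_events: Optional[int] = None) -> None:
        """Read forward from next_offset; live processing and replay share this path."""
        end = len(log) if max_events is None else min(len(log), self.next_offset + max_events)
        for event in log[self.next_offset:end]:
            self.transform(self.rows, event)
        self.next_offset = end

    def caught_up(self, log: List[OrderEvent]) -> bool:
        return self.next_offset == len(log)


def reconcile(view: MaterializedView, log: List[OrderEvent]) -> bool:
    """Assumed validation rule: view total equals the sum of non-cancelled orders."""
    expected = sum(e.amount_cents for e in log if not e.cancelled)
    return sum(view.rows.values()) == expected


class Dashboard:
    def __init__(self, view: MaterializedView) -> None:
        self.active = view

    def cut_over(self, candidate: MaterializedView, log: List[OrderEvent]) -> None:
        """Switch views only when the candidate has caught up and passes validation."""
        if not candidate.caught_up(log):
            raise RuntimeError(f"{candidate.name} is still replaying")
        if not reconcile(candidate, log):
            raise RuntimeError(f"{candidate.name} failed validation")
        self.active = candidate


def run_demo():
    log = [
        OrderEvent("o1", "alice", 1000),
        OrderEvent("o2", "bob", 500),
        OrderEvent("o3", "alice", 700, cancelled=True),
    ]
    v1 = MaterializedView("mart_orders_v1", transform_v1)
    v1.consume(log)
    dashboard = Dashboard(v1)

    # Replay from offset 0 into v2 while v1 keeps serving the dashboard.
    v2 = MaterializedView("mart_orders_v2", transform_v2)
    v2.consume(log, max_events=2)

    # A live event arrives mid-replay; v1 stays current, v2 is still behind.
    log.append(OrderEvent("o4", "bob", 300))
    v1.consume(log)

    v2.consume(log)  # v2 reaches the head of the log
    dashboard.cut_over(v2, log)
    return dashboard, v1, v2


if __name__ == "__main__":
    dashboard, v1, v2 = run_demo()
    print(dashboard.active.name, dashboard.active.rows)
    # prints: mart_orders_v2 {'alice': 1000, 'bob': 800}
```

The point of the sketch is that v2 is built by the same `consume` path that serves live traffic, only started from offset 0 with the corrected transform, and that the dashboard pointer moves only after the catch-up and validation checks pass. In a real design the starting offset, the validation rule, and the cutover mechanism would each need to be specified against the actual Kafka, Flink, and warehouse setup.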
How This Interview Works
- Read the vague prompt (just like a real interview)
- Ask clarifying questions to the AI interviewer
- Write your pipeline design solution with real code execution
- Get instant feedback and a hire/no-hire decision