

A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: medium

Problem

A 2024-era streaming system on the canvas runs a single Flink pipeline that produces one materialized view. The Kafka log retains only a few days of events in-cluster, so reprocessing more than a week of history is impossible: a bug fix that requires replaying a month of orders cannot be applied correctly, because the older events are no longer in the log.

Apply the Kappa architecture taught in this section to make the system replay-capable:

(1) Add a long-retention, tiered-storage backing for the Kafka log in object storage (S3, GCS, or ADLS) so the event log can hold 12-24 months of events affordably.

(2) Add a parallel materialized view (Snowflake mart_orders_v2 or a separate warehouse table) so the Flink pipeline can replay the log into a new view during a bug fix or schema migration without disturbing the live v1 view.

Once v2 has caught up to the live position in the log and has been validated, the dashboard cuts over from v1 to v2. Do not add a batch layer: Kappa is stream-only, and any "batch" reprocessing is a replay of the log through the same streaming code path.
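The replay-and-cutover flow described above can be sketched in plain Python. This is a toy model under stated assumptions, not Flink or Snowflake code: `OrderEvent`, `transform_v1`, `transform_v2`, `replay`, and `choose_dashboard_source` are hypothetical names, the cancellation bug is an invented example of a fix that needs a replay, and the retention values are illustrative. The topic settings use the configuration keys from Kafka's tiered storage feature (KIP-405), which also has to be enabled on the brokers together with a remote storage plugin; check the documentation for the Kafka version or vendor in use before relying on them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

DAY_MS = 24 * 60 * 60 * 1000

# Topic-level settings, assuming KIP-405 tiered storage is enabled on the brokers.
# Values are illustrative, not tuned recommendations.
ORDERS_TOPIC_CONFIG = {
    "remote.storage.enable": "true",
    "local.retention.ms": str(3 * DAY_MS),   # recent data kept on broker disks
    "retention.ms": str(730 * DAY_MS),       # about 24 months in total
}


@dataclass(frozen=True)
class OrderEvent:
    offset: int
    order_id: str
    amount_cents: int
    status: str  # "placed" or "cancelled"


View = Dict[str, int]  # order_id -> amount in cents
Transform = Callable[[View, OrderEvent], None]


def transform_v1(view: View, event: OrderEvent) -> None:
    """Hypothetical live logic with a bug: cancellations are treated as orders."""
    view[event.order_id] = event.amount_cents


def transform_v2(view: View, event: OrderEvent) -> None:
    """Hypothetical fixed logic: a cancellation removes the order from the view."""
    if event.status == "cancelled":
        view.pop(event.order_id, None)
    else:
        view[event.order_id] = event.amount_cents


def replay(log: List[OrderEvent], transform: Transform,
           from_offset: int = 0) -> Tuple[View, int]:
    """Run one code path over the log; return the view and the next offset to read."""
    view: View = {}
    next_offset = from_offset
    for event in log:
        if event.offset >= from_offset:
            transform(view, event)
            next_offset = event.offset + 1
    return view, next_offset


def choose_dashboard_source(v2_next_offset: int, log_end_offset: int,
                            v2_validated: bool) -> str:
    """Cut over only when v2 has caught up to the log end and passed validation."""
    if v2_next_offset >= log_end_offset and v2_validated:
        return "mart_orders_v2"
    return "mart_orders_v1"


LOG = [
    OrderEvent(0, "o-1", 1000, "placed"),
    OrderEvent(1, "o-2", 2500, "placed"),
    OrderEvent(2, "o-1", 1000, "cancelled"),
]

# v1 stays live and untouched while v2 is rebuilt from offset 0 with the fix.
mart_orders_v1, v1_next_offset = replay(LOG, transform_v1)
mart_orders_v2, v2_next_offset = replay(LOG, transform_v2)
```

In this sketch the same `replay` function serves both the live view and the rebuilt one, which is the point of the "no batch layer" constraint: only the transform version and the target view differ. The cutover decision is kept separate from the pipeline so that v1 keeps serving the dashboard until v2 is both caught up and validated.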
