DataDriven

A 2024-era streaming system on the canvas runs a single Flink pipeline producing one materialized view

A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: medium

Problem

A 2024-era streaming system on the canvas runs a single Flink pipeline that produces one materialized view. The Kafka log retains only a few days of events in-cluster, so reprocessing more than a week of history is impossible: a bug fix that requires replaying a month of orders cannot be applied without losing data.

Apply the Kappa architecture that this section just taught and make the system replay-capable:

(1) Add long-retention tiered storage for the Kafka log, backed by object storage (S3, GCS, or ADLS), so the event log can hold 12-24 months of events affordably.
(2) Add a parallel materialized view (Snowflake mart_orders_v2 or a separate warehouse table) so the Flink pipeline can replay the log into a new view during a bug fix or schema migration without disturbing the live v1 view.

Once v2 has caught up to live and been validated, the dashboard cuts over to it. Do not add a batch layer: Kappa is stream-only, and batch work is done by replaying the log through the same code path.
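The two changes can be sketched in plain Python as a toy model, not as a Flink or Snowflake implementation. The topic settings use the Apache Kafka tiered-storage (KIP-405) config names; the retention values, the mart_orders_v1 name, the event shape, and the "refunds were ignored" bug are all assumptions made up for illustration. The replay shares one consume loop between live and backfill, so the only thing that differs for v2 is the fixed transform and the starting offset.

```python
from dataclasses import dataclass, field

DAY_MS = 24 * 60 * 60 * 1000


def tiered_topic_config(local_days: int, total_days: int) -> dict:
    """Topic-level settings for Kafka tiered storage (KIP-405 config names).

    Recent segments stay on broker disk for `local_days`; older segments
    live in object storage until `total_days` have passed.
    """
    if local_days > total_days:
        raise ValueError("local retention cannot exceed total retention")
    return {
        "remote.storage.enable": "true",
        "local.retention.ms": str(local_days * DAY_MS),
        "retention.ms": str(total_days * DAY_MS),
    }


@dataclass
class View:
    """Stand-in for a warehouse table fed by the streaming job."""
    name: str
    rows: dict = field(default_factory=dict)
    next_offset: int = 0  # next log offset this view will consume


def apply_buggy(rows: dict, event: dict) -> None:
    # Hypothetical v1 bug: refund events are silently dropped.
    if event["type"] == "refund":
        return
    rows[event["order_id"]] = rows.get(event["order_id"], 0.0) + event["amount"]


def apply_fixed(rows: dict, event: dict) -> None:
    # Fixed logic: refunds reduce the order's net amount.
    sign = -1.0 if event["type"] == "refund" else 1.0
    rows[event["order_id"]] = rows.get(event["order_id"], 0.0) + sign * event["amount"]


def consume(view: View, log: list, transform, max_events=None) -> None:
    """Same code path for live traffic and replay: read from the view's offset."""
    end = len(log) if max_events is None else min(len(log), view.next_offset + max_events)
    for event in log[view.next_offset:end]:
        transform(view.rows, event)
    view.next_offset = end


def ready_for_cutover(view: View, log: list, expected_total: float) -> bool:
    """v2 may take over only when it has no lag and passes a validation check."""
    caught_up = view.next_offset == len(log)
    return caught_up and abs(sum(view.rows.values()) - expected_total) < 1e-6


# Toy event log standing in for the long-retention Kafka topic.
log = [
    {"order_id": "o1", "type": "order", "amount": 100.0},
    {"order_id": "o1", "type": "refund", "amount": 30.0},
    {"order_id": "o2", "type": "order", "amount": 50.0},
]

config = tiered_topic_config(local_days=7, total_days=365)

v1 = View("mart_orders_v1")
consume(v1, log, apply_buggy)                   # live view, built with the bug

v2 = View("mart_orders_v2")
consume(v2, log, apply_fixed, max_events=2)     # replay in progress
ready_mid_replay = ready_for_cutover(v2, log, expected_total=120.0)
consume(v2, log, apply_fixed)                   # replay catches up to live
ready_after_replay = ready_for_cutover(v2, log, expected_total=120.0)
```

In this sketch v1 is never written to during the replay, which is the property the problem asks for: the dashboard keeps reading v1 until the cutover check on v2 passes. A real validation step would compare more than one aggregate, for example row counts and per-day totals against a trusted source.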

