DataDriven

A medium Pipeline Design mock interview question on DataDriven. Practice with AI-powered feedback, real code execution, and a hire/no-hire decision.

Domain: Pipeline Design
Difficulty: Medium

Interview Prompt

A 2024-era streaming system on the canvas runs a single Flink pipeline that produces one materialized view. The Kafka log retains only a few days of events in-cluster, so reprocessing more than a week of history is impossible: a bug fix that requires replaying a month of orders cannot be applied, because the older events are no longer in the log. Apply the Kappa architecture this section just taught and make the system replay-capable: (1) add long-retention tiered storage for the Kafka log, backed by object storage (S3, GCS, or ADLS), so the event log can hold 12-24 months of events affordably, and (2) add a parallel materialized view (Snowflake mart_orders_v2 or a separate warehouse table) so the Flink pipeline can replay the log into a new view during a bug fix or schema migration without disturbing the live v1 view. Once v2 has caught up to live and been validated, the dashboard cuts over. Do not add a batch layer; Kappa is stream-only, and batch work is done as a replay through the same code path.
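
The two changes the prompt asks for can be sketched roughly as follows. This is an illustrative sketch, not a reference solution. The topic-level keys (`remote.storage.enable`, `retention.ms`, `local.retention.ms`) are the Apache Kafka tiered-storage settings introduced by KIP-405, and they also require tiered storage to be enabled on the brokers. Everything else is an assumption made for illustration: the 540-day and 3-day retention figures, the `ViewStats` fields, the 60-second lag threshold, and the 1% row-count tolerance are hypothetical and do not come from the prompt.

```python
from dataclasses import dataclass

MS_PER_DAY = 24 * 60 * 60 * 1000


def tiered_topic_config(total_days: int, local_days: int) -> dict[str, str]:
    """Build topic-level settings that keep recent data on broker disks
    and let older segments live in object storage (KIP-405 key names)."""
    if local_days > total_days:
        raise ValueError("local retention cannot exceed total retention")
    return {
        "remote.storage.enable": "true",
        # Total retention across local disk plus the remote tier.
        "retention.ms": str(total_days * MS_PER_DAY),
        # How long segments stay on broker disks before only the remote copy remains.
        "local.retention.ms": str(local_days * MS_PER_DAY),
    }


@dataclass
class ViewStats:
    """Hypothetical summary of one materialized view (v1 or v2)."""
    row_count: int
    max_event_ts: int  # newest event time applied, in epoch seconds


def ready_to_cut_over(
    v1: ViewStats,
    v2: ViewStats,
    max_lag_s: int = 60,          # assumed threshold
    row_tolerance: float = 0.01,  # assumed threshold
) -> bool:
    """Gate the dashboard cutover: v2 must have caught up to live (v1)
    and must not differ wildly in size from v1."""
    caught_up = (v1.max_event_ts - v2.max_event_ts) <= max_lag_s
    if v1.row_count == 0:
        return caught_up and v2.row_count == 0
    drift = abs(v2.row_count - v1.row_count) / v1.row_count
    return caught_up and drift <= row_tolerance


# Assumed figures: about 18 months in total, 3 days on local broker disks.
config = tiered_topic_config(total_days=540, local_days=3)

live = ViewStats(row_count=1_000_000, max_event_ts=1_700_000_600)
replaying = ViewStats(row_count=400_000, max_event_ts=1_699_000_000)
caught_up = ViewStats(row_count=1_000_500, max_event_ts=1_700_000_590)

print(ready_to_cut_over(live, replaying))
print(ready_to_cut_over(live, caught_up))
```

A row-count comparison is only a coarse check. Because v2 usually exists to fix a bug or change a schema, some differences from v1 are expected, so a real validation step would compare the metrics the fix is supposed to change against known-good values. The replay itself runs through the same Flink job, started from the earliest retained offset and writing to the v2 table, which keeps the design stream-only.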

How This Interview Works

  1. Read the vague prompt (just like a real interview)
  2. Ask clarifying questions to the AI interviewer
  3. Write your pipeline design solution with real code execution
  4. Get instant feedback and a hire/no-hire decision
