DataDriven

A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: medium

Problem

A 2014-era streaming media company runs the Lambda content-engagement pipeline from this section's worked example. It maintains two codebases (Spark for the batch layer, Storm for the speed layer) and two on-call rotations, stores data in both HDFS and HBase, and sees an occasional 0.4 percent drift between the two layers. The constraints that motivated Lambda have since shifted: streaming engines now offer exactly-once processing, tiered storage makes long log retention affordable, and unified engines let a single codebase replace the two.

Apply the Lambda-to-Kappa migration this section walked through: add the Kappa replacement path alongside the existing Lambda layers, because the migration order keeps both running until cutover. Specifically, add:

  (1) Kafka tiered storage backed by object storage (S3, GCS, or ADLS), so the event log can retain the longest backfill window;
  (2) a single Flink streaming pipeline that processes events end to end with exactly-once semantics; and
  (3) an Iceberg materialized view on object storage as the single canonical view. Iceberg provides ACID transactions, schema evolution, and time travel, and Trino is the canonical serving engine that reads it.

Do not delete the existing Lambda nodes: under the migration order, they stay running until consumers cut over.
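
Item (1) is largely a retention decision. Total retention on the event topic has to cover the longest backfill window, while only a short recent tail needs to stay on broker disks. Here is a minimal sketch, assuming Apache Kafka's built-in tiered storage (KIP-405, early access since Kafka 3.6) rather than a vendor variant, with the broker already running `remote.log.storage.system.enable=true` and an object-store `RemoteStorageManager` plugin supplied separately. The helper name `tiered_topic_config`, the 730-day backfill window, and the 3-day hot tier are hypothetical; they are not figures from this problem.

```python
# Hypothetical helper: builds topic-level configs for Apache Kafka tiered storage (KIP-405).
# Assumes the broker already sets remote.log.storage.system.enable=true and has an
# object-store RemoteStorageManager plugin configured; neither is shown here.

MS_PER_DAY = 24 * 60 * 60 * 1000


def tiered_topic_config(backfill_days: int, local_hot_days: int) -> dict:
    """Return topic configs whose total retention covers the longest backfill window."""
    if backfill_days <= 0 or local_hot_days <= 0:
        raise ValueError("retention windows must be positive")
    if local_hot_days > backfill_days:
        # KIP-405 requires local.retention.ms to be <= retention.ms.
        raise ValueError("local retention cannot exceed total retention")
    return {
        # Opt this topic into tiered storage.
        "remote.storage.enable": "true",
        # Total retention (broker disk + object storage): the replay horizon for backfills.
        "retention.ms": str(backfill_days * MS_PER_DAY),
        # Only this recent tail stays on broker disks; older segments are read from object storage.
        "local.retention.ms": str(local_hot_days * MS_PER_DAY),
    }


# Hypothetical sizing: a 730-day backfill window with a 3-day local hot tier.
cfg = tiered_topic_config(backfill_days=730, local_hot_days=3)
print(cfg["retention.ms"])        # 63072000000
print(cfg["local.retention.ms"])  # 259200000
```

The resulting dict could be handed to whatever admin tooling creates the topic. If the backfill horizon is unbounded, Kafka also accepts `retention.ms=-1` for unlimited retention.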

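Items (2) and (3) meet at the sink. Exactly-once holds only if the pipeline's state, its output, and its position in the log advance together. Iceberg's Flink sink commits data files when a checkpoint completes, so each checkpoint surfaces as a new table snapshot. The following is a toy, stdlib-only model of that idea, not Flink or Iceberg code. `ToyTable`, `Snapshot`, and `run` are hypothetical names, and the event shape `(content_id, watch_seconds)` is assumed. The toy also collapses Flink's checkpoint and the Iceberg commit into one atomic step by storing the source offset inside each snapshot, so a restart replays exactly the uncommitted events. Older snapshots stay readable, which models the time travel the problem statement mentions.

```python
# Toy model (not Flink or Iceberg APIs): a streaming job that commits its aggregate
# view and its source offset together in one table snapshot, then recovers by
# replaying the log from the last committed offset.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # assumed shape: (content_id, watch_seconds)


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    source_offset: int      # next log offset to read after this commit
    view: Dict[str, int]    # content_id -> total watch seconds


@dataclass
class ToyTable:
    """Append-only list of immutable snapshots; the last one is the current table."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def current(self) -> Snapshot:
        return self.snapshots[-1] if self.snapshots else Snapshot(0, 0, {})

    def commit(self, source_offset: int, view: Dict[str, int]) -> None:
        # Output and log position land in the same snapshot, so they commit atomically.
        next_id = self.current().snapshot_id + 1
        self.snapshots.append(Snapshot(next_id, source_offset, dict(view)))

    def as_of(self, snapshot_id: int) -> Dict[str, int]:
        # Time travel: read the view exactly as an earlier snapshot recorded it.
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return dict(snap.view)
        raise KeyError(snapshot_id)


def run(log: List[Event], table: ToyTable, commit_every: int, crash_at: int = -1) -> None:
    """Aggregate the log, committing every `commit_every` events and at the end.

    Raises RuntimeError before processing offset `crash_at` to simulate a failure;
    anything not yet committed is lost and gets replayed on the next run.
    """
    last = table.current()
    view = dict(last.view)       # restore state from the last commit
    offset = last.source_offset  # resume from the committed log position
    while offset < len(log):
        if offset == crash_at:
            raise RuntimeError(f"simulated crash at offset {offset}")
        content_id, seconds = log[offset]
        view[content_id] = view.get(content_id, 0) + seconds
        offset += 1
        if offset % commit_every == 0 or offset == len(log):
            table.commit(offset, view)


log = [("a", 10), ("b", 5), ("a", 7), ("c", 3), ("b", 1)]

clean = ToyTable()
run(log, clean, commit_every=2)

recovered = ToyTable()
try:
    run(log, recovered, commit_every=2, crash_at=3)  # dies after committing offset 2
except RuntimeError:
    pass
run(log, recovered, commit_every=2)  # restarts from the committed offset, no double counting

print(recovered.current().view == clean.current().view)  # True
print(recovered.as_of(1))  # {'a': 10, 'b': 5}
```

Recovery never consults anything outside the table. Because the offset travels with the data, a crash between two commits can only lose uncommitted work, which is then replayed, and can never double-count it. The real Flink and Iceberg integration separates checkpointing from the table commit and needs extra bookkeeping that this toy omits.
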