A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.
- Domain: Pipeline Design
- Difficulty: medium
Problem
A market-intelligence platform serves four customer surfaces from four sources, each at a distinct freshness tier: a sub-second real-time price-alert feed, a sub-5-minute trader dashboard, a daily research mart, and a monthly regulatory archive (< 24h freshness). The team currently runs a Lambda architecture with five engines and an on-call rotation that pages 6 times per night.
Apply the entire advanced tier:
- (a-s0 Lambda) Recognize that the existing Lambda system pays a two-codebase tax: the same logic is maintained once in the batch layer and again in the speed layer.
- (a-s1 Kappa) Build a Kappa-style core for the two real-time tiers using long-retention tiered storage on Kafka plus a single unified streaming pipeline (Flink, Spark Structured Streaming, or Beam) writing to an Iceberg, Delta, or Hudi materialized view.
- (a-s2 unified engines) Use one engine for both real-time tiers, separating them by trigger configuration rather than by running two engines.
- (a-s3 per-node freshness tiering) Annotate every node with an explicit slaFreshness consistent with downstream needs.
- (a-s4 Lambda-to-Kappa principles) Keep the immutable event log as the source of truth so replay is always available.
For the daily research mart and monthly regulatory archive, use a separate batch path: a shared bronze raw layer in object storage (S3, GCS, or ADLS) feeds a warehouse mart (Snowflake, BigQuery, Redshift, or Databricks) with slaFreshness < 24h. Three distinct freshness tiers must be visible across the canvas: real-time or < 1min, < 15min, and < 24h.
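The per-node freshness requirement can be checked mechanically. The following is a minimal Python sketch, not the grader's actual logic: it assumes a hypothetical canvas in which each node carries an `sla_freshness` label, and it verifies that no upstream node is staler than a node it feeds and that three distinct tiers appear. The node names, the edge from the event log into the bronze layer, and the numeric bounds in `TIER_SECONDS` are all illustrative assumptions.

```python
from dataclasses import dataclass

# Upper bound, in seconds, for each of the three canvas tiers.
# The numeric values are assumptions chosen to match the tier labels.
TIER_SECONDS = {
    "real-time": 60,        # "real-time or < 1min"
    "< 15min": 15 * 60,
    "< 24h": 24 * 60 * 60,
}


@dataclass(frozen=True)
class Node:
    name: str
    sla_freshness: str  # key into TIER_SECONDS


# Hypothetical canvas: one streaming core plus one batch path.
NODES = [
    Node("kafka_event_log", "real-time"),
    Node("streaming_pipeline", "real-time"),
    Node("price_alert_feed", "real-time"),
    Node("lakehouse_view", "< 15min"),
    Node("trader_dashboard", "< 15min"),
    Node("bronze_raw", "< 24h"),
    Node("warehouse_mart", "< 24h"),
    Node("research_mart", "< 24h"),
    Node("regulatory_archive", "< 24h"),
]

# (upstream, downstream) pairs.
EDGES = [
    ("kafka_event_log", "streaming_pipeline"),
    ("streaming_pipeline", "price_alert_feed"),
    ("streaming_pipeline", "lakehouse_view"),
    ("lakehouse_view", "trader_dashboard"),
    ("kafka_event_log", "bronze_raw"),  # assumption: bronze is landed from the log
    ("bronze_raw", "warehouse_mart"),
    ("warehouse_mart", "research_mart"),
    ("warehouse_mart", "regulatory_archive"),
]


def stale_edges(nodes, edges):
    """Return edges whose upstream node is staler than its downstream node."""
    bound = {n.name: TIER_SECONDS[n.sla_freshness] for n in nodes}
    return [(up, down) for up, down in edges if bound[up] > bound[down]]


def distinct_tiers(nodes):
    """Return the set of freshness tiers present on the canvas."""
    return {n.sla_freshness for n in nodes}


if __name__ == "__main__":
    print("stale edges:", stale_edges(NODES, EDGES))
    print("tiers:", sorted(distinct_tiers(NODES)))
```

A design that routed the price-alert feed through a `< 24h` node would show up in `stale_edges`, which is the kind of inconsistency the slaFreshness annotations are meant to expose.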