A retailer's revenue pipeline runs Flink streaming with a $4,000/month bill, feeding a CFO dashboard
A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.
- Domain: Pipeline Design
- Difficulty: medium
Problem
A retailer's revenue pipeline runs on Flink streaming at $4,000/month and feeds a CFO dashboard that the CFO reads once each morning at 7am Pacific. The streaming pipeline was built without a cost conversation, and the latency it provides has zero dollar value because its only consumer reads once a day. Apply the cost-story framing this section just taught: the latency is worth nothing, streaming costs 5-50x more than equivalent batch, and so the right answer is to downgrade to batch. Make these changes:
- Replace the Flink streaming transform with a nightly batch job (plain Spark, dbt, or PySpark all satisfy this).
- Remove the local RocksDB state store, since batch keeps no state between runs.
- Tag the warehouse mart with slaFreshness < 24h to match the consumer's actual freshness need.
- Remove the real-time slaFreshness from the streaming nodes.
The CFO dashboard's freshness need is unchanged; only the cost changes.
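The required changes amount to a small graph edit plus some cost arithmetic. A minimal sketch, assuming a hypothetical `Node` record (its fields, the node names, and the `"real-time"`/`"< 24h"` string values are illustrative, not DataDriven's grading schema), applies the four changes, lists any requirement the design still violates, and turns the 5-50x multiplier into a rough batch cost range for the $4,000/month bill.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

# Hypothetical pipeline model for illustration only; field names and values
# are assumptions, not the grader's actual schema.
@dataclass(frozen=True)
class Node:
    name: str
    kind: str                            # "transform", "state_store", "mart", "dashboard"
    engine: Optional[str] = None         # "flink" (streaming) or "spark" (batch)
    sla_freshness: Optional[str] = None  # "real-time", "< 24h", or None

STREAMING_ENGINES = {"flink"}


def downgrade_to_batch(nodes: List[Node], batch_engine: str = "spark") -> List[Node]:
    """Return a new node list with the streaming path replaced by nightly batch."""
    result = []
    for node in nodes:
        if node.kind == "state_store":
            continue  # batch keeps no inter-run state, so the RocksDB store is dropped
        if node.engine in STREAMING_ENGINES:
            node = replace(node, engine=batch_engine)  # nightly batch job instead of Flink
        if node.sla_freshness == "real-time":
            node = replace(node, sla_freshness=None)  # no consumer needs real-time data
        if node.kind == "mart":
            node = replace(node, sla_freshness="< 24h")  # matches a once-a-day reader
        result.append(node)
    return result


def violations(nodes: List[Node]) -> List[str]:
    """List every requirement from the problem that the design still breaks."""
    problems = []
    if any(n.engine in STREAMING_ENGINES for n in nodes):
        problems.append("streaming engine still present")
    if any(n.kind == "state_store" for n in nodes):
        problems.append("state store still present")
    if any(n.sla_freshness == "real-time" for n in nodes):
        problems.append("real-time slaFreshness still present")
    if not any(n.kind == "mart" and n.sla_freshness == "< 24h" for n in nodes):
        problems.append("mart not tagged slaFreshness < 24h")
    return problems


def implied_batch_cost(streaming_monthly: float, low: float = 5, high: float = 50) -> Tuple[float, float]:
    """If streaming costs low..high times batch, batch costs streaming/high..streaming/low."""
    return streaming_monthly / high, streaming_monthly / low


# Hypothetical "before" design: Flink transform with a local state store.
before = [
    Node("revenue_transform", "transform", engine="flink", sla_freshness="real-time"),
    Node("revenue_state", "state_store"),
    Node("revenue_mart", "mart"),
    Node("cfo_dashboard", "dashboard"),
]
after = downgrade_to_batch(before)

print(violations(before))
print(violations(after))         # []
print(implied_batch_cost(4000))  # (80.0, 800.0)
```

Under the stated 5-50x range, the same nightly revenue numbers would cost roughly $80 to $800 a month in batch. That figure comes only from the multiplier in the problem, not from pricing an actual workload.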