Five Times the Traffic, Five Times the Bill
A hard Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.
- Domain: Pipeline Design
- Difficulty: hard
- Seniority: L7
Problem
Our platform's data volumes are unpredictable: we see 5x swings between our quietest and busiest hours, with sudden spikes during product launches. We've been running a fixed-size Spark cluster that's over-provisioned 80% of the time and still falls behind during spikes. Operations needs to act on issues within a couple of minutes; analytics dashboards can tolerate up to 15 minutes of delay. A small fraction of incoming events arrive malformed and end up polluting the reports analysts read. The CFO wants the bill to stop swinging with traffic and to come down from where it is today. Design a pipeline that handles variable volume, serves both consumers on the right cadence, keeps bad events out of analytics, and keeps costs predictable.
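One way to approach the "keep bad events out of analytics" requirement is to validate each event at ingestion and divert failures to a quarantine store instead of dropping them. The sketch below is illustrative only: the problem does not specify an event schema, so the field names in `REQUIRED_FIELDS` and the helpers `is_well_formed` and `route` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical schema: the problem statement does not define event fields.
REQUIRED_FIELDS = {
    "event_id": str,
    "event_type": str,
    "ts": (int, float),  # event timestamp, assumed to be epoch seconds
}


@dataclass
class Routed:
    """Events split into a clean stream and a quarantine stream."""
    valid: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)


def is_well_formed(event: Any) -> bool:
    """Return True if the event is a dict with every required field and type."""
    if not isinstance(event, dict):
        return False
    for name, expected in REQUIRED_FIELDS.items():
        value = event.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False
    return True


def route(events) -> Routed:
    """Send well-formed events downstream and keep malformed ones for review."""
    out = Routed()
    for event in events:
        if is_well_formed(event):
            out.valid.append(event)
        else:
            out.quarantined.append(event)
    return out


sample = [
    {"event_id": "a1", "event_type": "click", "ts": 1700000000.0},
    {"event_id": "b2"},          # missing fields
    "not even a dict",           # wrong shape entirely
]
result = route(sample)
```

Keeping the quarantined events, rather than discarding them, leaves room to inspect and replay them once the producer is fixed; only `result.valid` would feed the operations and analytics paths.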
Summary
Scale up when needed. Do not bankrupt the team.
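The tension in the summary can be made concrete with a small sizing rule: scale the worker count with observed throughput, but clamp it between a floor and a ceiling so the worst-case hourly spend is known in advance. This is a minimal sketch under assumed inputs; `target_workers`, the per-worker throughput, and the 20% headroom default are illustrative choices, not figures from the problem.

```python
import math


def target_workers(
    events_per_sec: float,
    per_worker_events_per_sec: float,
    min_workers: int,
    max_workers: int,
    headroom: float = 1.2,  # assumed safety margin over measured load
) -> int:
    """Pick a worker count that tracks load but stays within fixed bounds."""
    if per_worker_events_per_sec <= 0:
        raise ValueError("per_worker_events_per_sec must be positive")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    needed = math.ceil(events_per_sec * headroom / per_worker_events_per_sec)
    # The ceiling caps spend; the floor keeps a baseline ready for spikes.
    return max(min_workers, min(max_workers, needed))


quiet = target_workers(100, 500, min_workers=2, max_workers=12)
typical = target_workers(1000, 500, min_workers=2, max_workers=12)
launch_spike = target_workers(50000, 500, min_workers=2, max_workers=12)
```

With a cap in place, the maximum bill is `max_workers` times the hourly worker price regardless of traffic. The trade-off is that load beyond the cap has to be absorbed elsewhere, for example by buffering in a queue and accepting extra latency on the 15-minute analytics path first.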