Put it all together: tune a distributed Spark pipeline
A medium-difficulty Pipeline Design interview practice problem on DataDriven. Write and run real pipeline design code and get instant grading.
- Domain: Pipeline Design
- Difficulty: Medium
Problem
Put it all together by tuning a distributed Spark pipeline end to end: handle shuffle skew at the layout layer, compact small files into right-sized output files, gate the daily run on its SLA with alerting, and validate the output before publishing.
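The four requirements can be sketched as small, framework-independent helpers. This is a minimal illustration in plain Python, not the graded reference solution: the function names, the 128 MiB target file size, the salt count of 8, and the use of key salting for skew are all assumptions made for the example, and the problem statement does not prescribe any of them.

```python
import math
from datetime import datetime, timedelta

# Assumed target size per output file; not specified by the problem.
TARGET_FILE_BYTES = 128 * 1024 * 1024


def salted_key(key, row_id, hot_keys, num_salts=8):
    """Spread a known hot key over num_salts sub-keys (key salting).

    Keys that are not in hot_keys are returned unchanged.
    """
    if key in hot_keys:
        return f"{key}#{row_id % num_salts}"
    return key


def target_file_count(total_bytes, target_file_bytes=TARGET_FILE_BYTES):
    """Number of output files needed so each is about target_file_bytes."""
    return max(1, math.ceil(total_bytes / target_file_bytes))


def sla_breached(started_at, finished_at, sla):
    """True when the run took longer than the allowed SLA duration."""
    return finished_at - started_at > sla


def validate_output(rows, required_fields, min_rows=1):
    """Return a list of problems; an empty list means safe to publish."""
    problems = []
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} row(s), got {len(rows)}")
    for field in required_fields:
        nulls = sum(1 for row in rows if row.get(field) is None)
        if nulls:
            problems.append(f"{field}: {nulls} null value(s)")
    return problems


if __name__ == "__main__":
    # Hypothetical daily run: 1 GiB of output and a 2-hour SLA.
    start = datetime(2024, 1, 1, 2, 0)
    end = datetime(2024, 1, 1, 4, 30)
    print(salted_key("big_customer", 10, {"big_customer"}))
    print(target_file_count(1024 * 1024 * 1024))
    if sla_breached(start, end, timedelta(hours=2)):
        print("ALERT: daily run exceeded its SLA")
    issues = validate_output([{"id": 1, "amount": None}], ["id", "amount"])
    print("publish" if not issues else f"blocked: {issues}")
```

In a Spark job, the computed file count would typically be passed to `DataFrame.repartition` before the write, and the salted key would replace the original join or aggregation key. The SLA check and the validation step stay outside Spark so that a breach or a failed check can block publishing without rerunning the job.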