A Spark job writes one tiny file per task, producing tens of thousands of small files per day
A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.
- Domain: Pipeline Design
- Difficulty: medium
Problem
A Spark job writes one tiny file per task, producing tens of thousands of small files per day. Downstream reads are dominated by file-open overhead. Design the job's output stage so that it writes fewer, right-sized files.
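One possible starting point, sketched under stated assumptions rather than as the graded solution: because each Spark write task emits one file, the number of output files can be controlled by repartitioning to a count derived from the data volume before writing. The 128 MiB target, the Parquet format, the helper names `plan_output_files` and `write_right_sized`, and the idea that the job can estimate its output size up front are all assumptions for illustration and do not come from the problem statement.

```python
# Assumed target size per output file: 128 MiB (illustrative, not from the problem).
TARGET_FILE_BYTES = 128 * 1024 * 1024


def plan_output_files(total_bytes: int, target_file_bytes: int = TARGET_FILE_BYTES) -> int:
    """Return how many files to write so each is at most about target_file_bytes."""
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Integer ceiling division; always write at least one file.
    return max(1, (total_bytes + target_file_bytes - 1) // target_file_bytes)


def write_right_sized(df, path: str, estimated_bytes: int) -> int:
    """Hypothetical helper: repartition a PySpark DataFrame, then write it.

    `df` is assumed to be a pyspark.sql.DataFrame and `estimated_bytes` an
    estimate of the output size that the caller obtains elsewhere.
    """
    num_files = plan_output_files(estimated_bytes)
    # repartition(n) shuffles rows into n partitions; each partition becomes
    # one write task, and each write task produces one output file.
    df.repartition(num_files).write.parquet(path)
    return num_files


if __name__ == "__main__":
    # 10 GiB of output at 128 MiB per file.
    print(plan_output_files(10 * 1024**3))  # 80
```

`repartition` forces a full shuffle, which buys evenly sized partitions at the cost of moving the data once. Where the data is already evenly distributed, `coalesce` reduces the partition count without a shuffle. The writer option `maxRecordsPerFile` can additionally cap file size from above when the row size is roughly known.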