
This Job Is Slow

Concepts: paShuffleOptimization, paSparkExecutionModel

"This Spark job used to take 20 minutes. Now it takes 3 hours. What do you do?" This is the most common Spark question across all companies. The interviewer is testing your debugging methodology, not a single trick.

The Debugging Sequence

Check shuffle metrics FIRST. Roughly 80% of slow Spark jobs are shuffle-bound. Open the Spark UI → Stages tab → sort by shuffle write. If one stage writes 500GB of shuffle data while the others write 5GB, you have found the problem. Do not start with code review - start with the UI.

Reading the Spark UI

The Stages tab shows each stage's task count, duration, shuffle read/write, and input/output size. Click into a stage to see the task-level distribution. A healthy stage has uniform task durations. A sick stage has one task running for 45 minutes while the other 199 tasks finish in 30 seconds - that is data skew.
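The skew check described above can be automated instead of eyeballed. Here is a minimal sketch in plain Python, assuming you have already exported per-task durations (for example, from the Spark REST API's stages endpoint); the function name and the 10x threshold are illustrative choices, not a Spark API.

```python
# Minimal sketch: flag a "sick" stage from its task durations.
# Assumes durations (in seconds) were pulled separately, e.g. from
# the Spark REST API's stage endpoints - not shown here.
from statistics import median

def is_skewed(task_durations, ratio_threshold=10.0):
    """Flag a stage as skewed when its slowest task runs far longer
    than the median task. The 10x threshold is a rule of thumb."""
    if not task_durations:
        return False
    med = median(task_durations)
    return med > 0 and max(task_durations) / med >= ratio_threshold

# The "sick" stage from above: one 45-minute task, 199 tasks at ~30s.
sick = [45 * 60] + [30] * 199
# A healthy stage: all tasks close to 30 seconds.
healthy = [28, 30, 31, 29, 33, 30]

print(is_skewed(sick))     # → True  (2700s vs. a 30s median)
print(is_skewed(healthy))  # → False
```

Comparing max to median (rather than to mean) matters here: a single straggler barely moves the median, so the ratio cleanly isolates the one slow task that the mean would partially absorb.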