Read the Plan
An easy Spark interview practice problem on DataDriven. Write and execute real Spark code with instant grading.
- Domain: Spark
- Difficulty: Easy
- Seniority: Mid
Problem
The order enrichment job joins a 500M-row orders table (80 GB) with a 5,000-row stores dimension table (30 MB) on store_id. The join takes 12 minutes and shuffles 80 GB. The physical plan shows a SortMergeJoin with an Exchange (shuffle) on both sides. Why did Spark choose SortMergeJoin, and how do you fix it?
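The default value of `spark.sql.autoBroadcastJoinThreshold` is 10 MB, so a 30 MB stores table is too large to be broadcast automatically. The planner then falls back to SortMergeJoin, which shuffles both sides. The snippet below is a deliberately simplified toy model of that decision, not Spark's planner code. The function `choose_join_strategy` and its parameters are hypothetical. The model assumes an inner join with `stores` as the build side, treats MB as MiB, and ignores shuffled hash join and Adaptive Query Execution.

```python
# Toy model (hypothetical, simplified) of how Spark picks a physical strategy
# for an equi-join whose smaller side ("stores") is the candidate build side.
# Ignores join type, shuffled hash join, and Adaptive Query Execution.

MIB = 1024 * 1024
# Default value of spark.sql.autoBroadcastJoinThreshold (10 MB).
DEFAULT_AUTO_BROADCAST_THRESHOLD = 10 * MIB


def choose_join_strategy(build_side_bytes: int,
                         threshold_bytes: int = DEFAULT_AUTO_BROADCAST_THRESHOLD,
                         broadcast_hint: bool = False) -> str:
    """Return the physical join this simplified model would pick."""
    # In this model, an explicit broadcast() hint wins over the size threshold.
    if broadcast_hint:
        return "BroadcastHashJoin"
    # Size-based broadcast: the estimate must be non-negative and at most the
    # threshold, so a threshold of -1 disables automatic broadcasting.
    if 0 <= build_side_bytes <= threshold_bytes:
        return "BroadcastHashJoin"
    # Otherwise both sides are shuffled (Exchange) and sorted before joining.
    return "SortMergeJoin"


stores_bytes = 30 * MIB

print(choose_join_strategy(stores_bytes))                            # SortMergeJoin
print(choose_join_strategy(stores_bytes, threshold_bytes=50 * MIB))  # BroadcastHashJoin
print(choose_join_strategy(stores_bytes, broadcast_hint=True))       # BroadcastHashJoin
print(choose_join_strategy(stores_bytes, threshold_bytes=-1))        # SortMergeJoin
```

The model points to two fixes in a real PySpark job. The first is an explicit hint, `orders.join(broadcast(stores), "store_id")`, with `broadcast` imported from `pyspark.sql.functions`. The second is raising the threshold, for example `spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 50 * 1024 * 1024)`. The hint is usually the more targeted choice because it affects only this join. The config change applies to every join in the session.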