
Read the Plan

An easy Spark interview practice problem on DataDriven. Write and execute real Spark code with instant grading.

Domain: spark
Difficulty: easy
Seniority: mid

Problem

The order enrichment job joins a 500M-row orders table (80 GB) against a 5,000-row stores dimension (30 MB) on store_id. The join takes 12 minutes and shuffles 80 GB. The physical plan shows SortMergeJoin with an Exchange (shuffle) on both sides, even though the stores table is only 30 MB. Why did Spark choose SortMergeJoin, and how do you fix it?
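One way to reason about the plan is to compare the stores table's size with spark.sql.autoBroadcastJoinThreshold, which defaults to 10 MB. The sketch below is a possible approach, not the graded solution. The helper names fits_broadcast_threshold and enrich_orders are hypothetical, the left join type is an assumption (the problem does not state it), and Spark's own size estimate for the table may differ from the 30 MB quoted above.

```python
# Default value of spark.sql.autoBroadcastJoinThreshold: 10 MB, in bytes.
DEFAULT_BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def fits_broadcast_threshold(table_bytes,
                             threshold_bytes=DEFAULT_BROADCAST_THRESHOLD_BYTES):
    """Return True if a table's estimated size is at or below the threshold.

    A negative threshold (such as -1) disables automatic broadcast joins.
    """
    return threshold_bytes >= 0 and table_bytes <= threshold_bytes


def enrich_orders(orders, stores):
    """Join orders to stores with an explicit broadcast hint on the small side.

    `orders` and `stores` are assumed to be DataFrames sharing a store_id
    column; the left join type is an assumption for illustration.
    """
    # Imported lazily so the size check above runs without a Spark install.
    from pyspark.sql.functions import broadcast

    return orders.join(broadcast(stores), on="store_id", how="left")


stores_bytes = 30 * 1024 * 1024  # the 30 MB stores dimension

# 30 MB exceeds the 10 MB default, so no automatic broadcast is planned.
print(fits_broadcast_threshold(stores_bytes))                    # False
# With a hypothetical 64 MB threshold, the same table would qualify.
print(fits_broadcast_threshold(stores_bytes, 64 * 1024 * 1024))  # True
```

With the hint in place, calling explain() on the result of enrich_orders should show BroadcastHashJoin instead of SortMergeJoin, and the 80 GB orders side should no longer need a shuffle for the join. An alternative under the same assumptions is raising the threshold for the session, for example spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 64 * 1024 * 1024), which lets the optimizer pick the broadcast on its own.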

Practice This Problem

Solve this Spark problem with real code execution. DataDriven runs your solution and grades it automatically.