# Read the Plan

> 30 MB table. 80 GB shuffle. Read the plan.

Canonical URL: <https://datadriven.io/problems/spark_explain_physical_plan_bottleneck>

Domain: PySpark · Difficulty: easy · Seniority: L4

## Problem

The order enrichment job joins a 500M-row orders table (80 GB) against a 5,000-row stores dimension (30 MB) on `store_id`. The join takes 12 minutes and shuffles 80 GB. The physical plan shows a `SortMergeJoin` with an `Exchange` (shuffle) on both sides, even though the stores side is only 30 MB. Why did Spark choose `SortMergeJoin`, and how do you fix it?
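The size check behind this choice can be sketched as a small decision function. This is a deliberately simplified, hypothetical model (`choose_join` is not a Spark API, and it ignores join types, shuffled hash joins and adaptive execution). It assumes Spark's default `spark.sql.autoBroadcastJoinThreshold` of 10 MB and the default `spark.sql.join.preferSortMergeJoin=true`. Note that Spark compares its own estimated `sizeInBytes` for the relation, which can differ from the 30 MB quoted above.

```python
# Hypothetical, simplified model of the size check Spark's planner applies
# when picking a strategy for an equi-join. Not Spark's actual implementation.

MB = 1024 * 1024

# Spark's default for spark.sql.autoBroadcastJoinThreshold (10 MB, in bytes).
DEFAULT_BROADCAST_THRESHOLD = 10 * MB


def choose_join(small_side_bytes: int,
                threshold_bytes: int = DEFAULT_BROADCAST_THRESHOLD,
                broadcast_hint: bool = False) -> str:
    """Return the join strategy this simplified model would pick."""
    if broadcast_hint:
        # An explicit broadcast hint bypasses the size threshold.
        return "BroadcastHashJoin"
    if threshold_bytes >= 0 and small_side_bytes <= threshold_bytes:
        # Small enough to broadcast; a threshold of -1 disables this path.
        return "BroadcastHashJoin"
    # Otherwise (assuming preferSortMergeJoin=true) both sides are
    # shuffled (Exchange) and sorted before merging.
    return "SortMergeJoin"


stores_bytes = 30 * MB

print(choose_join(stores_bytes))                           # SortMergeJoin
print(choose_join(stores_bytes, threshold_bytes=50 * MB))  # BroadcastHashJoin
print(choose_join(stores_bytes, broadcast_hint=True))      # BroadcastHashJoin
```

In PySpark, the two levers in this model correspond to wrapping the stores DataFrame in `pyspark.sql.functions.broadcast(...)` inside the join, or raising the threshold, e.g. `spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 50 * 1024 * 1024)`. After either change, `explain()` should show a `BroadcastHashJoin` with a `BroadcastExchange` on the stores side instead of an `Exchange` on both sides. The hint is usually the more targeted option, since raising the threshold affects every join in the session.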

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/spark_explain_physical_plan_bottleneck)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.