Salt the Hot Merchant
A hard Spark interview practice problem on DataDriven. Write and execute real Spark code with instant grading.
- Domain: spark
- Difficulty: hard
- Seniority: mid
Problem
The daily payment reconciliation Spark job joins 1.2 billion transactions against a 500K-row merchants dimension on merchant_id. It has been failing for three days. Spark UI shows one task processing 38% of all rows while the other 199 finish in seconds. The hot merchant is your company's internal payment processor that handles all driver payouts. You cannot broadcast merchants because a downstream join adds a 2 GB enrichment table. Propose and implement a salting strategy.
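One possible shape for an answer, as a minimal PySpark sketch rather than a reference solution: salt only the hot key on the transactions side, replicate only the hot merchant row on the dimension side, and join on the pair (merchant_id, salt). The function names, the `hot_merchant_id` parameter, and the `salt` column are hypothetical; the problem statement names only `merchant_id`. The salt-count helper encodes an assumed rule of thumb: shrink each hot slice to roughly one partition's fair share, which for 38% of rows over 200 tasks suggests about 76 salts.

```python
import math


def choose_num_salts(hot_fraction: float, num_partitions: int) -> int:
    """Assumed heuristic: the smallest salt count that brings each slice of
    the hot key down to about 1 / num_partitions of all rows."""
    return max(1, math.ceil(hot_fraction * num_partitions))


def salted_join(transactions, merchants, hot_merchant_id, num_salts):
    """Join transactions to merchants, spreading only the hot key
    across num_salts buckets. Both inputs are Spark DataFrames."""
    # Imported here so the helper above stays usable without a Spark install.
    from pyspark.sql import functions as F

    is_hot = F.col("merchant_id") == F.lit(hot_merchant_id)

    # Fact side: hot rows get a random salt in [0, num_salts); all others get 0.
    tx = transactions.withColumn(
        "salt",
        F.when(is_hot, F.floor(F.rand() * num_salts).cast("int"))
         .otherwise(F.lit(0)),
    )

    # Dimension side: emit the hot merchant row once per salt value so every
    # salted transaction finds a match; all other merchants keep a single row.
    dim = merchants.withColumn(
        "salt",
        F.explode(
            F.when(is_hot, F.sequence(F.lit(0), F.lit(num_salts - 1)))
             .otherwise(F.array(F.lit(0)))
        ),
    )

    # The composite key spreads the hot merchant over num_salts shuffle keys.
    return tx.join(dim, on=["merchant_id", "salt"], how="inner").drop("salt")
```

Salting only the hot key keeps the dimension at roughly 500K plus `num_salts` rows, instead of multiplying every merchant by the salt count. Because the join remains a shuffle join, it does not depend on broadcasting merchants.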