Salt the Hot Merchant

A hard Spark interview practice problem on DataDriven. Write and execute real Spark code with instant grading.

Domain: Spark
Difficulty: hard
Seniority: mid

Problem

The daily payment reconciliation Spark job joins 1.2 billion transactions against a 500K-row merchants dimension on merchant_id. It has been failing for three days. Spark UI shows one task processing 38% of all rows while the other 199 finish in seconds. The hot merchant is your company's internal payment processor that handles all driver payouts. You cannot broadcast merchants because a downstream join adds a 2 GB enrichment table. Propose and implement a salting strategy.
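One way to reason about a salting strategy before writing the Spark job is to model it in plain Python. The sketch below is not Spark code and not a reference solution: it only imitates the mechanics. The transactions side gets a random salt for the hot key, the merchants side is replicated once per salt value for that key, and the join runs on (merchant_id, salt). The hot merchant id 42, the 64 salt buckets, the sample sizes, and the simple modulo partitioner are all assumptions made for illustration. In PySpark the same shape would typically be built with a rand()-based salt column on the transactions and an exploded array of salt values on the hot merchant row.

```python
import random
from collections import Counter

NUM_PARTITIONS = 200   # matches the 200 tasks described in the problem
SALT_BUCKETS = 64      # assumed tuning knob, not given in the problem
HOT_MERCHANT_ID = 42   # hypothetical id for the internal payment processor


def partition_of(merchant_id, salt=0):
    # Simplified stand-in for hash partitioning on (merchant_id, salt).
    return (merchant_id * 31 + salt) % NUM_PARTITIONS


def salt_transactions(txns, hot_ids, buckets, rng):
    # Fact side: hot keys get a random salt, every other key keeps salt 0.
    return [(m, rng.randrange(buckets) if m in hot_ids else 0, amount)
            for m, amount in txns]


def explode_merchants(merchants, hot_ids, buckets):
    # Dimension side: one copy per salt value for hot keys, one row otherwise.
    return {(m, s): name
            for m, name in merchants.items()
            for s in (range(buckets) if m in hot_ids else range(1))}


def salted_join(txns, merchants, hot_ids, buckets, seed=0):
    # Inner join on the composite key (merchant_id, salt).
    rng = random.Random(seed)
    dim = explode_merchants(merchants, hot_ids, buckets)
    return [(m, s, amount, dim[(m, s)])
            for m, s, amount in salt_transactions(txns, hot_ids, buckets, rng)]


def max_partition_share(keys):
    # Fraction of all rows that land on the busiest partition.
    loads = Counter(partition_of(*k) for k in keys)
    return max(loads.values()) / len(keys)


# Toy data: 38% of rows belong to the hot merchant, as in the Spark UI symptom.
merchants = {m: f"merchant_{m}" for m in range(1000)}
txns = [(HOT_MERCHANT_ID if i % 100 < 38 else i % 1000, 1.0)
        for i in range(10_000)]

joined = salted_join(txns, merchants, {HOT_MERCHANT_ID}, SALT_BUCKETS)
before = max_partition_share([(m,) for m, _ in txns])
after = max_partition_share([(m, s) for m, s, _, _ in joined])
print(f"busiest partition share: before={before:.3f} after={after:.3f}")
```

Salting only the hot key keeps the merchants side small: in this model it grows by SALT_BUCKETS - 1 rows rather than by a factor of SALT_BUCKETS. The salt column should be dropped after the join so that downstream steps, including the enrichment join, see the original merchant_id key.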

Practice This Problem

Solve this Spark problem with real code execution. DataDriven runs your solution and grades it automatically.