# Salt the Hot Merchant

> One merchant owns 38% of your rows. Salt or suffer.

Canonical URL: <https://datadriven.io/problems/spark_salting_payment_hotkey>

Domain: PySpark · Difficulty: hard · Seniority: L4

## Problem

The daily payment reconciliation Spark job joins 1.2 billion transactions against a 500K-row `merchants` dimension on `merchant_id`. It has been failing for three days. The Spark UI shows one task processing 38% of all rows while the other 199 finish in seconds. The hot merchant is your company's internal payment processor, which handles all driver payouts. You cannot broadcast `merchants` because a downstream join adds a 2 GB enrichment table. Propose and implement a salting strategy.

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/spark_salting_payment_hotkey)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.