Loading section...

Broadcast joins (distributed)

In distributed query engines, data is split across multiple nodes. When you join two tables, the engine must decide how to bring matching rows together. This decision dramatically affects query performance. The Distribution Problem Consider joining a 100 billion row fact table with a 10,000 row dimension table. The fact table is distributed across 100+ nodes. For each fact row to find its dimension match, the dimension data must be accessible. Broadcast Join Mechanics A broadcast join copies the entire smaller table to every worker node. Each worker then performs a local join between its partition of the large table and the complete small table. Broadcast Selection Query engines automatically select broadcast joins when the smaller table fits in memory. The threshold varies by engine: Broa