Explain Spark Architecture

Concepts: paSparkExecutionModel, paDistributedPrimitives

"Explain Spark architecture" at the advanced level means explaining where the standard model breaks down. Adaptive Query Execution, dynamic allocation, and speculative tasks are the three mechanisms that override the static plan - and each can go wrong.

Adaptive Query Execution (AQE)

AQE re-optimizes the query plan at shuffle boundaries using runtime statistics. After each stage completes, Spark collects actual partition sizes and row counts, then replans the remaining stages. Three optimizations: coalescing shuffle partitions (merging small partitions), switching join strategies (sort-merge → broadcast if one side turned out small after filtering), and skew join optimization (splitting skewed partitions automatically).

When AQE Gets It Wrong

AQE uses post-shuffle statistics, which means