How Do You Size the Cluster?
Concepts: paMemoryManagement, paCostOptimization
"You need to process 2TB of data daily. How do you size your Spark cluster?" This tests whether you understand memory, cores, and executors as interacting constraints rather than independent knobs. The 5-Core Rule Use 5 cores per executor. This is the well-tested sweet spot. More than 5 cores causes excessive GC pressure and HDFS throughput bottlenecks (each core opens concurrent connections). Fewer than 5 underutilizes memory. On a node with 16 cores, run 3 executors (5 cores each, 1 core reserved for OS/YARN). Memory Breakdown Executor memory splits into three regions. Unified memory (default 60% of heap) handles both execution (shuffles, sorts, aggregations) and storage (cached DataFrames). Reserved memory (300MB) is off-limits. User memory (remaining 40%) holds your UDF objects and dat