
Cluster Sizing

Concepts: paMemoryManagement

What They Want to Hear

'I use the 5-core rule as a starting point: each executor gets 5 cores, which balances parallelism against HDFS client throughput and per-executor JVM overhead. For 2 TB of data with 128 MB target partitions, I need about 16,000 partitions (2 TB / 128 MB = 16,384). With 5 cores per executor and 20 executors, I can run 100 tasks concurrently, so the job finishes in ceiling(16,000 / 100) = 160 waves. If each wave takes 30 seconds, that is about 80 minutes.'

This answer shows you can do the back-of-the-envelope math interviewers love.
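The arithmetic in that answer can be captured in a few lines. This is a minimal sketch, not a Spark API: the function name estimate_job is hypothetical, and it assumes one task per partition and that every task takes roughly the same time, so tasks run in full waves of (cores per executor × executors).

```python
import math

TiB = 1024 ** 4
MiB = 1024 ** 2


def estimate_job(data_bytes, partition_bytes, cores_per_executor,
                 num_executors, seconds_per_wave):
    """Hypothetical back-of-the-envelope estimate of a Spark job's runtime.

    Assumes one task per partition and uniform task durations.
    """
    # Number of input partitions, i.e. number of tasks to schedule.
    partitions = math.ceil(data_bytes / partition_bytes)
    # Task slots available at once across the whole cluster.
    concurrent_tasks = cores_per_executor * num_executors
    # A partially filled final wave still costs a full wave of time.
    waves = math.ceil(partitions / concurrent_tasks)
    total_seconds = waves * seconds_per_wave
    return partitions, concurrent_tasks, waves, total_seconds


# Exact binary units: 2 TiB of input, 128 MiB partitions, 5 cores x 20 executors.
partitions, slots, waves, secs = estimate_job(2 * TiB, 128 * MiB, 5, 20, 30)
print(partitions, slots, waves, secs / 60)  # prints: 16384 100 164 82.0
```

With the exact 16,384 partitions the job needs 164 waves (about 82 minutes) rather than the 160 waves (80 minutes) from rounding to 16,000; the difference is small enough that the rounded figure is fine in an interview. In Spark, the two sizing knobs correspond to the `spark.executor.cores` and `spark.executor.instances` configuration properties.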