
GC and OOM Debugging

Concepts: Memory Management

What They Want to Hear: 'High GC time means the JVM is spending too long reclaiming objects. In Spark, this usually means the executor is holding too many live objects in memory: large hash maps built for joins, cached datasets, or broadcast variables. My debugging process: (1) check the GC logs for the ratio of young-generation to full GC pauses — frequent full GCs point to long-lived heap pressure rather than short-lived allocation churn; (2) check the Spark UI for spill metrics on that executor; (3) check whether a skewed partition is forcing one executor to hold a disproportionate share of the data. The fix is usually repartitioning (smaller partitions mean fewer live objects per executor), increasing executor memory, or reducing the amount of data held in memory.' This is the answer that shows you can read GC logs and correlate them with Spark metrics.
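Step (1) above — comparing young-generation vs. full GC time — can be sketched with a small log parser. This is a minimal illustration, assuming JDK 9+ unified GC logging (enabled on executors with, e.g., spark.executor.extraJavaOptions=-Xlog:gc); the sample log lines below are hypothetical, and real logs vary by collector and JDK version.

```python
import re

# Matches pause lines from JDK unified GC logs (-Xlog:gc), e.g.:
# "[1.2s][info][gc] GC(5) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.456ms"
PAUSE_RE = re.compile(r"Pause (Young|Full).*?([\d.]+)ms")

def gc_pause_summary(lines):
    """Total pause time in ms per collection type (Young vs Full)."""
    totals = {"Young": 0.0, "Full": 0.0}
    for line in lines:
        m = PAUSE_RE.search(line)
        if m:
            totals[m.group(1)] += float(m.group(2))
    return totals

# Hypothetical sample: two short young-gen pauses, one long full GC —
# the pattern that suggests heap pressure from long-lived objects.
sample = [
    "[0.512s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.456ms",
    "[1.800s][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 30M->5M(256M) 4.100ms",
    "[9.200s][info][gc] GC(2) Pause Full (G1 Compaction Pause) 200M->60M(256M) 180.250ms",
]
print(gc_pause_summary(sample))
```

If full-GC time dominates young-GC time like this, the executor's heap is filling with long-lived data (cached partitions, join state), which is what steps (2) and (3) then localize.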