Explain Spark Architecture
Concepts: paSparkExecutionModel
"Walk me through how Spark executes a query." This is the opener. If your answer is vague, the interviewer downgrades your level. If it is precise, they trust your debugging answers later. The Driver-Executor Model Spark runs one driver process and N executor processes. The driver parses your code into a logical plan, optimizes it via Catalyst, converts it to a physical plan, and splits that plan into stages. Each stage contains tasks - one task per partition. The driver sends tasks to executors, which run them in parallel on their allocated cores. Stages and Shuffle Boundaries Spark splits the physical plan into stages at shuffle boundaries. A shuffle happens whenever data must be redistributed across the cluster - groupBy, join, repartition, distinct. Everything before a shuffle can