A daily Spark join runs for hours

A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: medium

Problem

A daily Spark join runs for hours. The Spark UI shows one task running for 90% of the runtime while the rest finish quickly: a single hot key is skewing the shuffle. Adding executors didn't help, because every row for the hot key still lands in the same shuffle partition and is processed by one task. Design the job so that the skewed join completes on time.
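
One common way to approach this kind of skew is key salting: append a random salt to the hot key on the large side, replicate the matching rows on the other side once per salt value, join on (key, salt), then drop the salt. The sketch below is not Spark code and not a reference solution. It is a plain-Python simulation of the idea, and every name in it (`events`, `accounts`, `NUM_SALTS`, `NUM_PARTITIONS`, `HOT_KEYS`, the CRC-based partitioner) is a hypothetical stand-in chosen for illustration. It assumes the hot key is already known, for example from a key-frequency count.

```python
import random
import zlib
from collections import Counter, defaultdict

NUM_SALTS = 8        # hypothetical fan-out for the hot key
NUM_PARTITIONS = 16  # hypothetical shuffle partition count
HOT_KEYS = {"hot"}   # assumed to be known in advance


def partition_of(join_key):
    """Stable stand-in for a shuffle partitioner (not Spark's real one)."""
    return zlib.crc32(repr(join_key).encode()) % NUM_PARTITIONS


def salt_large_side(rows, rng):
    """Tag each hot-key row with a random salt; other rows keep salt 0."""
    for key, value in rows:
        salt = rng.randrange(NUM_SALTS) if key in HOT_KEYS else 0
        yield (key, salt), value


def explode_small_side(rows):
    """Replicate hot-key rows once per salt so every salted row finds a match."""
    for key, attr in rows:
        salts = range(NUM_SALTS) if key in HOT_KEYS else (0,)
        for salt in salts:
            yield (key, salt), attr


def inner_join(left, right):
    """Plain hash join on whatever join key the rows carry."""
    index = defaultdict(list)
    for join_key, attr in right:
        index[join_key].append(attr)
    return [(join_key, value, attr)
            for join_key, value in left
            for attr in index.get(join_key, [])]


def max_partition_load(rows):
    """Rows handled by the busiest task if rows are shuffled by join key."""
    loads = Counter(partition_of(join_key) for join_key, _ in rows)
    return max(loads.values())


# Synthetic data: one key carries most of the rows.
events = [("hot", i) for i in range(10_000)]
events += [(f"k{n}", i) for n in range(10) for i in range(100)]
accounts = [("hot", "acct-hot")] + [(f"k{n}", f"acct-{n}") for n in range(10)]

salted_events = list(salt_large_side(events, random.Random(0)))
salted_accounts = list(explode_small_side(accounts))

plain_result = inner_join(events, accounts)
# Drop the salt after joining so the output matches the unsalted join.
salted_result = [(key, value, attr)
                 for (key, _salt), value, attr
                 in inner_join(salted_events, salted_accounts)]

print("busiest task, plain :", max_partition_load(events))
print("busiest task, salted:", max_partition_load(salted_events))
```

The salted join returns the same rows as the plain join, while the hot key's rows are spread over several partitions instead of one. The trade-off is that the other side's hot-key rows are duplicated `NUM_SALTS` times. In a real Spark job, salting is not the only option worth weighing: broadcasting the smaller side avoids shuffling the large side altogether, and on Spark 3.x adaptive query execution can split skewed partitions when `spark.sql.adaptive.enabled` and `spark.sql.adaptive.skewJoin.enabled` are on.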
