A medium Spark mock interview question on DataDriven. Practice with AI-powered feedback, real code execution, and a hire/no-hire decision.

Domain: Spark
Difficulty: medium
Seniority: L5

Interview Prompt

An iterative ML feature engineering pipeline reads a 200 GB base DataFrame and runs 8 sequential enrichment steps. Each step joins against a different dimension table and adds columns. A previous engineer cached the base DataFrame to speed up the repeated reads, but after step 4 executors start dying with out-of-memory (OOM) errors. The cache takes up so much memory that later steps have no room for shuffle data. Fix the caching strategy so the pipeline completes without OOM errors.
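A rough memory budget helps show why a large cache can crowd out shuffle data. The sketch below assumes Spark's default unified memory settings (a 300 MiB reserve per executor heap, spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5). The function names, executor count, and heap size are hypothetical and are not part of the prompt.

```python
# Back-of-envelope estimate of Spark's unified memory region.
# Assumes default settings; real clusters may override them.

RESERVED_MIB = 300  # fixed reserve Spark sets aside on each executor heap


def unified_memory_mib(heap_mib, memory_fraction=0.6, storage_fraction=0.5):
    """Return (unified, protected_storage) in MiB for one executor.

    unified: memory shared by execution (shuffle, joins) and storage (cache).
    protected_storage: the part of the cache that execution cannot evict.
    """
    unified = (heap_mib - RESERVED_MIB) * memory_fraction
    protected_storage = unified * storage_fraction
    return unified, protected_storage


def cluster_budget_gib(num_executors, heap_gib, **settings):
    """Scale the per-executor estimate to the whole (hypothetical) cluster."""
    unified, protected = unified_memory_mib(heap_gib * 1024, **settings)
    return {
        "unified_gib": num_executors * unified / 1024,
        "protected_storage_gib": num_executors * protected / 1024,
    }


if __name__ == "__main__":
    # Hypothetical cluster: 20 executors with 16 GiB of heap each.
    print(cluster_budget_gib(num_executors=20, heap_gib=16))
```

For this hypothetical cluster the unified region comes to about 188 GiB in total, so a 200 GB cached DataFrame would not fit in memory even before any shuffle buffers are allocated.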

Summary

You cached 200 GB and forgot to let go.
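One possible way to "let go" is sketched below, not as the reference answer but as a starting point. It assumes each step builds on the previous step's output, so the base DataFrame itself is never cached; instead an intermediate result is persisted every few steps with a disk-backed storage level, materialized, and the previous persisted intermediate is released. The function name run_enrichment and the persist_every parameter are hypothetical; join, persist, count, and unpersist are the standard PySpark DataFrame methods.

```python
def run_enrichment(base_df, dim_tables, storage_level, persist_every=2):
    """Apply sequential left joins while holding at most one persisted DataFrame.

    base_df:       the large base DataFrame (deliberately not cached here)
    dim_tables:    list of (dimension_df, join_key) pairs, one per step
    storage_level: e.g. pyspark.StorageLevel.MEMORY_AND_DISK or DISK_ONLY
    persist_every: hypothetical knob, persist after every N-th step
    """
    current = base_df
    held = None  # the most recently persisted intermediate, if any

    for step, (dim_df, key) in enumerate(dim_tables, start=1):
        current = current.join(dim_df, on=key, how="left")

        if step % persist_every == 0:
            current = current.persist(storage_level)
            current.count()  # force materialization before releasing the old one
            if held is not None:
                held.unpersist()  # free the previous intermediate's blocks
            held = current

    # The caller should call unpersist() on the result once it has been written out.
    return current


# Usage sketch (requires a Spark session; base and dims are assumed to exist):
#   from pyspark import StorageLevel
#   result = run_enrichment(base, dims, StorageLevel.MEMORY_AND_DISK)
#   result.write.parquet(output_path)
#   result.unpersist()
```

A disk-backed level lets cached blocks spill instead of competing with shuffle memory, and releasing each intermediate as soon as its successor is materialized keeps the cached footprint to roughly one DataFrame at a time.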

How This Interview Works

  1. Read the vague prompt (just like a real interview)
  2. Ask clarifying questions to the AI interviewer
  3. Write your Spark solution with real code execution
  4. Get instant feedback and a hire/no-hire decision
