DataDriven

A medium Spark mock interview question on DataDriven. Practice with AI-powered feedback, real code execution, and a hire/no-hire decision.

Domain: Spark
Difficulty: medium
Seniority: L5

Interview Prompt

A daily analytics job reads a 3 TB user_events Parquet table partitioned by event_date, filters to yesterday (about 10 GB), and joins against user_profiles. The job takes 40 minutes but should take 5. A colleague wrote the pipeline using a subquery pattern that defeats partition pruning. The physical plan shows a full table scan of all 3 TB. Rewrite the query so Catalyst pushes the date filter down to the file scan.

Summary

You renamed the column. Catalyst forgot how to prune.
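
The prompt does not show the colleague's subquery, so the following is only a minimal sketch of one common shape for the fix: compute yesterday's date in the driver and compare the partition column event_date directly to a literal inside the subquery, before the join. It assumes event_date is a DATE-typed partition column and that both tables share a join key named user_id; the join key and the helper name build_pruned_query are hypothetical and not taken from the prompt.

```python
from datetime import date, timedelta


def build_pruned_query(run_date: date) -> str:
    """Build SQL that filters the partition column with a plain literal.

    Assumptions (not stated in the prompt): event_date is a DATE-typed
    partition column, and user_id is the join key on both tables.
    """
    # Resolve "yesterday" in the driver so the predicate is a constant,
    # not an expression the planner has to evaluate per row.
    yesterday = (run_date - timedelta(days=1)).isoformat()
    return f"""
SELECT *
FROM (
    SELECT *
    FROM user_events
    WHERE event_date = DATE '{yesterday}'  -- untransformed partition column
) AS e
JOIN user_profiles AS p USING (user_id)
""".strip()


if __name__ == "__main__":
    print(build_pruned_query(date.today()))
```

With a SparkSession available as spark (also an assumption here), running spark.sql(build_pruned_query(date.today())).explain() is one way to check the result: if pruning applies, the date predicate should appear under PartitionFilters on the Parquet scan node rather than only in a Filter step above it.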

How This Interview Works

  1. Read the vague prompt (just like a real interview)
  2. Ask clarifying questions to the AI interviewer
  3. Write your Spark solution with real code execution
  4. Get instant feedback and a hire/no-hire decision
