DataDriven

Let AQE Handle It

A medium-difficulty Spark interview practice problem on DataDriven. Write and execute real Spark code with instant grading.

Domain: Spark
Difficulty: Medium
Seniority: L5

Problem

A Spark 3.4 job joins a 400 GB search_logs table against a 60 GB ad_impressions table on query_id and takes 90 minutes. The Spark UI shows moderate skew: the top partition has 8x the median row count. A colleague suggests salting, but the codebase is complex and salting would require changes in three downstream jobs. Instead, enable and configure Adaptive Query Execution (AQE) so that Spark handles the skew at runtime, coalesces small partitions, and optimizes the join strategy automatically.
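
One possible starting point, sketched in PySpark, is shown here. The configuration keys are standard Spark SQL settings, but the values are illustrative rather than tuned for this workload. The helper names (AQE_CONF, apply_aqe, join_logs), the Parquet input format, the path arguments, and the inner join type are all assumptions made for the example. AQE is already on by default in Spark 3.4, so setting it explicitly mainly guards against a job or cluster that has turned it off.

```python
# AQE settings for a Spark 3.4 job. Keys are real Spark SQL configs;
# values are illustrative starting points (assumptions), not tuned numbers.
AQE_CONF = {
    # Master switch for Adaptive Query Execution.
    "spark.sql.adaptive.enabled": "true",
    # Split oversized shuffle partitions of a join into smaller tasks.
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # A partition counts as skewed when it is larger than factor * median
    # AND larger than the byte threshold below.
    "spark.sql.adaptive.skewJoin.skewedPartitionFactor": "5",
    "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes": "256MB",
    # Merge small adjacent shuffle partitions after each stage.
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Target size AQE aims for when coalescing or splitting partitions.
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "128MB",
    # Read shuffle data locally when a join is converted to broadcast.
    "spark.sql.adaptive.localShuffleReader.enabled": "true",
}


def apply_aqe(spark, conf=AQE_CONF):
    """Apply the AQE settings to an existing SparkSession and return it."""
    for key, value in conf.items():
        spark.conf.set(key, value)
    return spark


def join_logs(spark, logs_path, impressions_path):
    """Hypothetical join: paths, Parquet format, and inner join are assumed."""
    apply_aqe(spark)
    logs = spark.read.parquet(logs_path)
    impressions = spark.read.parquet(impressions_path)
    # No salting: the join is written normally and AQE re-plans it at runtime.
    return logs.join(impressions, on="query_id", how="inner")
```

At 60 GB the ad_impressions table is far beyond any sensible broadcast threshold, so a runtime switch to a broadcast join would only be expected if upstream filters shrink one side substantially. The skew-join and coalescing settings are the ones most likely to matter here.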

Summary

Five tasks take 35 minutes. The other 195 take 30 seconds.
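
The uneven task times reflect uneven partition sizes, which is what AQE's skew-join rule inspects. A minimal sketch of that rule follows: a partition is treated as skewed when it exceeds both skewedPartitionFactor times the median partition size and skewedPartitionThresholdInBytes. The function name is_skewed and the 2 GiB median are hypothetical, and the sketch assumes the 8x row-count ratio from the problem carries over to bytes, which is the quantity AQE actually measures.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def is_skewed(size_bytes, median_bytes, factor=5.0, threshold_bytes=256 * MB):
    """Mirror of AQE's skew test: both conditions must hold."""
    return size_bytes > factor * median_bytes and size_bytes > threshold_bytes


# Hypothetical median shuffle partition size (an assumption, not from the problem).
median = 2 * GB

# The top partition is 8x the median, above the assumed factor of 5.
print(is_skewed(8 * median, median))      # True

# A partition at 3x the median stays below the factor and is left alone.
print(is_skewed(3 * median, median))      # False

# A 10x ratio is not enough when the partition is under the byte threshold.
print(is_skewed(100 * MB, 10 * MB))       # False
```

Under these assumptions the skew test would flag the 8x partition, so AQE would be expected to split it into smaller tasks without lowering the factor.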

Practice This Problem

Solve this Spark problem with real code execution. DataDriven runs your solution and grades it automatically.
