The Box That Won't Fit the Data

A hard Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: hard
Seniority: senior

Problem

Your nightly Spark job rolls a 100GB event export up to per-account daily totals, but the only box it runs on has 5GB of RAM and no cluster to fall back on. Land those totals durably in the local data lake without the job dying when the 100GB refuses to fit in memory.
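
One way to reason about the constraint is to treat it as an out-of-core aggregation: never hold the whole export, only one slice of keys at a time. The sketch below is a library-agnostic illustration in plain Python rather than Spark, and it is not the graded reference solution. It assumes a hypothetical CSV export with `account_id`, `event_date`, and `amount` columns; the function name `rollup_daily_totals` and the `num_buckets` parameter are also made up for illustration. It streams the input once, hash-partitions rows into spill files on disk, aggregates each spill file independently, and publishes the result with an atomic rename so a crash never leaves a half-written output.

```python
import csv
import os
import tempfile
import zlib
from collections import defaultdict


def rollup_daily_totals(src_path, out_path, num_buckets=64):
    """Two-pass, out-of-core rollup of (account_id, event_date) -> sum(amount).

    Assumed (hypothetical) input: a CSV with a header row containing the
    columns account_id, event_date, amount.
    """
    spill_dir = tempfile.mkdtemp(prefix="rollup_spill_")
    bucket_paths = [
        os.path.join(spill_dir, f"bucket_{i}.csv") for i in range(num_buckets)
    ]
    bucket_files = [open(p, "w", newline="") for p in bucket_paths]
    writers = [csv.writer(f) for f in bucket_files]

    # Pass 1: stream the export one row at a time. Hashing the grouping key
    # guarantees every row for a given (account, day) lands in the same bucket.
    try:
        with open(src_path, newline="") as src:
            for row in csv.DictReader(src):
                key = f"{row['account_id']}|{row['event_date']}"
                bucket = zlib.crc32(key.encode()) % num_buckets
                writers[bucket].writerow(
                    [row["account_id"], row["event_date"], row["amount"]]
                )
    finally:
        for f in bucket_files:
            f.close()

    # Pass 2: aggregate one bucket at a time, so memory holds only the
    # distinct keys of a single bucket rather than the whole dataset.
    tmp_out = out_path + ".tmp"
    with open(tmp_out, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["account_id", "event_date", "total"])
        for path in bucket_paths:
            totals = defaultdict(float)
            with open(path, newline="") as f:
                for account_id, event_date, amount in csv.reader(f):
                    totals[(account_id, event_date)] += float(amount)
            for (account_id, event_date), total in sorted(totals.items()):
                writer.writerow([account_id, event_date, total])
            os.remove(path)
        # Force the bytes to disk before publishing the file.
        out.flush()
        os.fsync(out.fileno())

    os.rmdir(spill_dir)
    # Atomic rename: readers see either the old output or the complete new one.
    os.replace(tmp_out, out_path)
    return out_path
```

With 64 buckets, a 100GB export spills into files of roughly 1.6GB each if keys hash evenly, and the in-memory dictionary for a bucket holds only that bucket's distinct keys, not its raw rows. A skewed key distribution would break that estimate, so the bucket count is a tuning assumption rather than a guarantee. The same shape carries over to Spark in spirit: smaller partitions whose working set fits the executor, spill to local disk, and a write that is committed only once complete.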
