
Three Hours for Yesterday's Numbers

A medium-difficulty Spark interview practice problem on DataDriven. Write and execute real Spark code with instant grading.

Domain: Spark
Difficulty: Medium
Seniority: L5

Problem

The nightly `daily_store_sales` Spark job is breaching SLA. It reads a source table of store-level daily sales (one row per store per product per day) and pivots it into one row per store per day with product-level metrics as columns. The job takes 3 hours against a 45-minute SLA because it reads the entire source table every night. Diagnose and fix it.
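One possible direction for the fix, sketched in PySpark: read only yesterday's slice of the source instead of the whole table, then pivot. This is a sketch under assumptions the problem does not state: the source is partitioned by a DATE column called `sale_date`, and the columns `store_id`, `product_id`, and `sales_amount` exist. Those names, the `target_date` and `build_daily_store_sales` helpers, and the `product_ids` argument are all hypothetical.

```python
from datetime import date, timedelta


def target_date(run_date: date) -> date:
    """Return the day a nightly run should process: the day before run_date."""
    return run_date - timedelta(days=1)


def build_daily_store_sales(spark, source_table: str, run_date: date, product_ids: list):
    """Pivot only yesterday's slice of the source into one row per store per day.

    Assumed schema (hypothetical): sale_date (partition column, DATE),
    store_id, product_id, sales_amount.
    """
    # Imported lazily so the date helper above can be used without Spark installed.
    from pyspark.sql import functions as F

    day = target_date(run_date)
    return (
        spark.table(source_table)
        # Filtering on the partition column lets Spark skip other days' files,
        # provided the table really is partitioned by sale_date.
        .where(F.col("sale_date") == F.lit(day.isoformat()).cast("date"))
        .groupBy("store_id", "sale_date")
        # Passing the pivot values explicitly spares Spark a separate job
        # to discover the distinct product_id values.
        .pivot("product_id", product_ids)
        .agg(F.sum("sales_amount"))
    )
```

A call might look like `build_daily_store_sales(spark, "sales.store_product_daily", date.today(), ["p1", "p2"])`, where the table name and product IDs are placeholders. Checking the physical plan with `.explain()` is one way to confirm the partition filter is actually applied. If the output table is also partitioned by day, writing with `spark.sql.sources.partitionOverwriteMode` set to `dynamic` may keep the write limited to that day's partition as well.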

Summary

18 terabytes scanned. 50 megabytes needed.
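To put those two figures side by side, here is a quick back-of-the-envelope calculation. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and uses the 3-hour runtime and 45-minute SLA from the problem statement; the variable names are illustrative only.

```python
# Decimal units assumed: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
scanned_bytes = 18 * 10**12
needed_bytes = 50 * 10**6

# How many bytes the job reads for every byte it actually needs.
scan_ratio = scanned_bytes // needed_bytes
print(f"Bytes scanned per byte needed: {scan_ratio:,}")  # 360,000

# How far the current runtime exceeds the SLA (both in minutes).
sla_overrun = (3 * 60) / 45
print(f"Runtime vs. SLA: {sla_overrun:.0f}x")  # 4x
```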

Practice This Problem

Solve this Spark problem with real code execution. DataDriven runs your solution and grades it automatically.
