Fix Skewed Viewing Events Pipeline
A hard Spark interview practice problem on DataDriven. Write and execute real Spark code with instant grading.
- Domain: Spark
- Difficulty: hard
- Seniority: senior, staff
Problem
You are the on-call data engineer at a streaming company. The nightly `viewing_engagement` Spark job just paged you. It normally finishes in 45 minutes but has been running for over two hours and is still stuck. The job joins a large `event_data` table (800M rows/day of viewing, playback, and interaction events) against a small `users` dimension (2M subscribers) to produce daily engagement metrics by event type and account tier. Your SLA is 60 minutes. Diagnose the root cause using the Spark UI evidence and fix the job so it meets the SLA.
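The title points to join-key skew. Suppose the Spark UI shows the stuck stage's task-duration or shuffle-read summary with a max far above the median, which is the usual signature of a few hot keys. The scaled-down, pure-Python model below sketches both halves of the task. The first half is a check that mirrors the rule Spark 3.x adaptive query execution (AQE) uses to flag skewed shuffle partitions. The second half is key salting, one possible fix. The model does not run Spark. `partition_of`, `salted_join`, the hot `user_id` 0, and all sizes are hypothetical stand-ins, not the problem's real data or its graded solution.

```python
import random
import statistics
import zlib
from collections import Counter

# Defaults of the Spark 3.x AQE skew-join rule:
#   spark.sql.adaptive.skewJoin.skewedPartitionFactor           = 5
#   spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes = 256MB
SKEW_FACTOR = 5
SKEW_THRESHOLD_BYTES = 256 * 1024 * 1024


def skewed_partitions(sizes, factor=SKEW_FACTOR, threshold=SKEW_THRESHOLD_BYTES):
    """Indices of shuffle partitions larger than factor * median AND threshold."""
    median = statistics.median(sizes)
    return [i for i, s in enumerate(sizes) if s > factor * median and s > threshold]


def partition_of(key, num_partitions):
    """Hypothetical, deterministic stand-in for Spark's hash partitioner (Spark uses Murmur3)."""
    return zlib.crc32(repr(key).encode()) % num_partitions


def rows_per_partition(keys, num_partitions):
    """Row count each shuffle partition would receive for the given join keys."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salted_join(events, users, num_salts, rng):
    """Inner-join (user_id, event_type) rows to a {user_id: tier} dimension on a salted key.

    Fact side: each event gets a random salt in [0, num_salts).
    Dimension side: each user row is replicated once per salt value.
    Joining on (user_id, salt) spreads one hot user_id over num_salts keys.
    Returns the joined rows and the salted fact-side keys that would be shuffled.
    """
    salted_events = [((uid, rng.randrange(num_salts)), etype) for uid, etype in events]
    salted_users = {(uid, s): tier for uid, tier in users.items() for s in range(num_salts)}
    joined = [(key[0], etype, salted_users[key])
              for key, etype in salted_events if key in salted_users]
    return joined, [key for key, _ in salted_events]


rng = random.Random(42)
users = {uid: ("premium" if uid % 3 == 0 else "basic") for uid in range(1000)}
# Hypothetical skew: user_id 0 (e.g. a bot or test account) emits about half of all events.
events = [(0 if rng.random() < 0.5 else rng.randrange(1, 1000), "play")
          for _ in range(100_000)]

NUM_PARTITIONS = 20
before = rows_per_partition([uid for uid, _ in events], NUM_PARTITIONS)
joined, salted_keys = salted_join(events, users, num_salts=16, rng=rng)
after = rows_per_partition(salted_keys, NUM_PARTITIONS)

print("max/median rows before salting:", max(before) / statistics.median(before))
print("max/median rows after salting: ", max(after) / statistics.median(after))
```

Salting makes the dimension side larger, because each user row is copied `num_salts` times, and in return the join tasks come out roughly even. In PySpark (with `pyspark.sql.functions` imported as `F`), the same pattern would add a salt column built from `F.floor(F.rand() * N)` to `event_data` and `F.explode(F.sequence(F.lit(0), F.lit(N - 1)))` to `users`, then join on both columns.

Two alternatives may be simpler here. Because `users` is the small side, an `F.broadcast(users)` hint avoids shuffling `event_data` entirely, provided the table fits in executor memory. On Spark 3.x, setting `spark.sql.adaptive.enabled` and `spark.sql.adaptive.skewJoin.enabled` to `true` lets AQE split skewed partitions automatically.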