
Fix Skewed Viewing Events Pipeline

A hard Spark mock interview question on DataDriven. Practice with AI-powered feedback, real code execution, and a hire/no-hire decision.

Domain: Spark
Difficulty: hard
Seniority: L5

Interview Prompt

You are the on-call data engineer at a streaming company. The nightly `viewing_engagement` Spark job just paged you. It normally finishes in 45 minutes but has now been running for over two hours without completing. The job joins a large `event_data` table (800M rows/day of viewing, playback, and interaction events) against a small `users` dimension (2M subscribers) to produce daily engagement metrics by event type and account tier. Your SLA is 60 minutes. Diagnose the root cause using the Spark UI evidence and fix the job so it meets SLA.

Summary

Your nightly Spark job just paged you. One task has 40% of the data.
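
To see why a single task can end up with 40% of the data, and what key salting does about it, here is a minimal pure-Python simulation. It is not Spark code and not the reference solution. The hot key name, the toy row counts, the partition count, and the salt range are all assumptions chosen for illustration, and `zlib.crc32` is only a deterministic stand-in for a hash partitioner.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical shuffle partition count
NUM_SALTS = 8       # hypothetical salt range


def partition_of(key: str) -> int:
    """Assign a key to a partition (stand-in for a hash partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def rows_per_partition(keys):
    """Count how many rows land in each partition."""
    counts = Counter(partition_of(k) for k in keys)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


# Toy data: 10,000 rows, 40% of which share one hypothetical hot key.
keys = ["hot_key"] * 4000 + [f"user_{i % 600}" for i in range(6000)]

# Without salting, every row with the hot key hashes to the same partition.
unsalted = rows_per_partition(keys)

# With salting, a suffix splits the hot key into NUM_SALTS sub-keys.
# A round-robin salt keeps this sketch deterministic; a real job would
# more likely draw the salt at random per row.
salted_keys = [f"{k}#{i % NUM_SALTS}" for i, k in enumerate(keys)]
salted = rows_per_partition(salted_keys)

print("largest partition share, unsalted:", max(unsalted) / len(keys))
print("largest partition share, salted:  ", max(salted) / len(keys))
```

In a real join, the other side has to be replicated once per salt value so that every salted key still finds its match. With a dimension as small as the 2M-row `users` table, a broadcast join may also be worth raising with the interviewer, since it avoids shuffling the large side for the join at all.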

How This Interview Works

Read the vague prompt (just like a real interview)
Ask clarifying questions to the AI interviewer
Write your Spark solution with real code execution
Get instant feedback and a hire/no-hire decision