# Fix Skewed Viewing Events Pipeline

> Your nightly Spark job just paged you. One task has 40% of the data.

Canonical URL: <https://datadriven.io/problems/spark_skew_broadcast_user_events>

Domain: PySpark · Difficulty: hard · Seniority: L5

## Problem

You are the on-call data engineer at a streaming company. The nightly `viewing_engagement` Spark job just paged you. It normally finishes in 45 minutes, but it has now been running for over two hours and still has not finished. The job joins a large `event_data` table (800M rows/day of viewing, playback, and interaction events) against a small `users` dimension table (2M subscribers) to produce daily engagement metrics by event type and account tier. Your SLA is 60 minutes. Use the Spark UI evidence to diagnose the root cause, then fix the job so it meets the SLA.
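The tagline's symptom (one task holding 40% of the data) is what a hot join key produces under a shuffle join. Every row with that key hashes to the same shuffle partition, so a single task does most of the work. The pure-Python sketch below models that effect and the broadcast hash join that avoids it. It is a toy under stated assumptions, not the reference solution. The hot `user_id`, the data, and the function names (`partition_of`, `shuffle_partition_shares`, `broadcast_hash_join`, `engagement_metrics`) are hypothetical, and `crc32` stands in for the Murmur3 hashing that Spark SQL actually uses.

```python
from collections import Counter
from zlib import crc32


def partition_of(user_id: str, num_partitions: int) -> int:
    # Stand-in for Spark's shuffle partitioner. Spark SQL hashes join keys with
    # Murmur3; crc32 is used here only because it is deterministic in the stdlib.
    return crc32(user_id.encode("utf-8")) % num_partitions


def shuffle_partition_shares(events, num_partitions):
    """Fraction of event rows each shuffle partition (i.e. each task) receives
    when events are repartitioned by user_id for a shuffle (sort-merge) join."""
    counts = Counter(partition_of(e["user_id"], num_partitions) for e in events)
    return {p: counts[p] / len(events) for p in range(num_partitions)}


def broadcast_hash_join(events, users):
    """Map-side (broadcast) hash join: build a lookup table from the small side
    and stream the large side past it, so events are never shuffled by user_id."""
    tier_by_user = {u["user_id"]: u["account_tier"] for u in users}
    for e in events:
        tier = tier_by_user.get(e["user_id"])
        if tier is not None:  # inner-join semantics: drop events with no matching user
            yield {**e, "account_tier": tier}


def engagement_metrics(joined_rows):
    """Engagement counts keyed by (event_type, account_tier)."""
    return Counter((r["event_type"], r["account_tier"]) for r in joined_rows)


# Hypothetical toy data: 100 users, and one hot user_id ("u0") that owns 40%
# of the 1,000 events, mirroring the "one task has 40% of the data" symptom.
users = [
    {"user_id": f"u{i}", "account_tier": "premium" if i % 2 else "basic"}
    for i in range(100)
]
events = [{"user_id": "u0", "event_type": "play"} for _ in range(400)] + [
    {"user_id": f"u{1 + i % 99}", "event_type": "pause"} for i in range(600)
]

# Shuffle join: the partition that receives "u0" holds at least 40% of all rows.
shares = shuffle_partition_shares(events, num_partitions=8)
print(f"largest shuffle partition holds {max(shares.values()):.0%} of events")

# Broadcast join: same join result, but no per-key repartitioning of events.
metrics = engagement_metrics(broadcast_hash_join(events, users))
print(metrics[("play", "basic")])  # 400: every u0 event, joined without a shuffle
```

In PySpark, the analogous change would be to wrap the dimension side in `pyspark.sql.functions.broadcast(...)` before the join (`users_df` would be a hypothetical name for that DataFrame). A 2M-row `users` table may exceed the default `spark.sql.autoBroadcastJoinThreshold` of 10 MB, which is why an explicit hint or a raised threshold may be needed. The later aggregation by event type and account tier still shuffles, but Spark's partial (map-side) aggregation pre-combines counts within each task, so raw rows for the hot key are no longer funneled into one task.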

---

Source: DataDriven (https://datadriven.io).