# The Panel and the Set-Top Boxes

> Set-top boxes tell you who watched. Projection tells you how many.

Canonical URL: <https://datadriven.io/problems/the_panel_and_the_set_top_boxes>

Domain: Pipeline Design · Difficulty: hard · Seniority: L6

## Problem

We measure TV viewership for every major network and streaming service in the US. We collect second-by-second viewing data from a 40,000-household panel and set-top box data from 45 million devices, and our clients - the networks, advertisers, and agencies - depend on our overnight ratings reports. Design a pipeline that processes this data, applies the panel weighting to project national estimates, and reliably delivers reports to clients by 10am every morning.
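To make "panel weighting" concrete, here is a minimal sketch. It assumes each panel household carries a weight equal to the number of national households it represents. The problem does not specify the weighting scheme, so `PanelHousehold`, `project_audience`, `rating`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelHousehold:
    household_id: str
    weight: float   # assumed: national households this panel home represents
    watched: bool   # whether the home tuned to the show in the measured window


def project_audience(panel: list[PanelHousehold]) -> float:
    """Projected national households watching: sum of weights of viewing homes."""
    return sum(h.weight for h in panel if h.watched)


def rating(panel: list[PanelHousehold]) -> float:
    """Rating as a percentage of all represented households."""
    total = sum(h.weight for h in panel)
    if total == 0:
        raise ValueError("panel has no weight; cannot compute a rating")
    return 100.0 * project_audience(panel) / total


# Hypothetical three-home panel, for illustration only.
panel = [
    PanelHousehold("h1", 3000.0, True),
    PanelHousehold("h2", 2500.0, False),
    PanelHousehold("h3", 4500.0, True),
]
projected = project_audience(panel)
show_rating = rating(panel)
```

A production weighting scheme would likely be more involved (for example, calibrating weights against demographic targets); this only shows the shape of the projection step.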

## Worked solution and explanation

### Why this problem exists in real interviews

The problem is a TV ratings pipeline with four required properties: morning delivery to clients, late operators that can't block national ratings, transparent revisions when an operator restates, and per-show isolation so one show's failure doesn't take down the morning. The trap is building it as one big projection job.

The design most candidates reach for first is one nightly job that pulls all operators, projects every show, and ships ratings. That design fails in three ways. One late operator blocks the entire run, and missing the deadline triggers contractual penalties. Revisions to prior periods get rolled into the current run silently. One show's projection failure takes the rest down with it.

> **Trick to Solving**
>
> Per-operator ingest, per-show projections, revision-aware delivery, durable archive of every delivered ratings file.
> 
> 1. Per-operator ingest uses sensors that fire alerts before the morning deadline; the alert names the late operator instead of reporting a generic 'pipeline late.'
> 2. Per-show projection tasks run independently; one show's failure stays contained in its own task and the rest deliver.
> 3. Revisions to prior periods land in a separate revision delivery; the original delivered file stays in the archive immutably and clients see both.
> 4. Every delivered file lands in an immutable archive so revisions reference the original.

---

### Walk the requirements

#### Step 1: Morning deliveries gated against the deadline with per-operator alerting

The orchestrator runs ingest per operator, with sensors that fire ahead of the morning deadline if any operator's data has not arrived. The national ratings run uses what's available; each late operator triggers its own alert with the operator named. Without orchestration, no component owns the deadline; without per-operator alerting, on-call sees 'pipeline late' rather than 'operator X is late.'
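The deadline check can be sketched in plain Python, independent of any particular orchestrator. The function names, the two-hour alert lead, and the operator names are illustrative assumptions, not part of the original design.

```python
from datetime import datetime, timedelta
from typing import Optional


def split_by_arrival(
    arrivals: dict[str, Optional[datetime]],
    deadline: datetime,
    alert_lead: timedelta,
) -> tuple[list[str], list[str]]:
    """Split operators into (available, late) as of the alert cutoff.

    An operator counts as late if its file is missing (None) or arrived
    after `deadline - alert_lead`.
    """
    cutoff = deadline - alert_lead
    available, late = [], []
    for operator, arrived_at in sorted(arrivals.items()):
        if arrived_at is not None and arrived_at <= cutoff:
            available.append(operator)
        else:
            late.append(operator)
    return available, late


def alert_messages(late: list[str]) -> list[str]:
    """One alert per late operator, with the operator named."""
    return [f"operator {op} is late for the morning ratings run" for op in late]


# Hypothetical arrivals for a 10:00 deadline with a 2-hour alert lead.
deadline = datetime(2024, 1, 2, 10, 0)
arrivals = {
    "operator_a": datetime(2024, 1, 2, 6, 30),
    "operator_b": None,
    "operator_c": datetime(2024, 1, 2, 9, 15),
}
available, late = split_by_arrival(arrivals, deadline, timedelta(hours=2))
alerts = alert_messages(late)
```

The national run would proceed with `available`, while each entry in `alerts` goes to on-call separately.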

#### Step 2: Per-show projection isolates one show's failure

Each show's projection runs as its own task. One show's failure (a bad input row, a model issue) stays contained in its task and emits an alert; the rest of the shows continue to delivery. With one giant projection job, a single show's failure delays every delivery that morning; per-show isolation is what keeps the other clients' ratings on time.
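A minimal sketch of the isolation idea follows. In a real orchestrator each show would be a separate task; here a loop with per-show error handling stands in for that. `run_projections`, `sum_weights`, and the sample data are hypothetical.

```python
from typing import Callable


def run_projections(
    shows: dict[str, list[float]],
    project: Callable[[list[float]], float],
) -> tuple[dict[str, float], dict[str, str]]:
    """Project each show independently; a failure is recorded, not propagated."""
    delivered: dict[str, float] = {}
    failed: dict[str, str] = {}
    for show_id, rows in shows.items():
        try:
            delivered[show_id] = project(rows)
        except Exception as exc:  # isolate the failure to this show
            failed[show_id] = repr(exc)
    return delivered, failed


def sum_weights(rows: list[float]) -> float:
    """Toy projection: sum household weights, rejecting bad input rows."""
    if any(w < 0 for w in rows):
        raise ValueError("negative household weight")
    return sum(rows)


# Hypothetical inputs: show_b contains a bad row.
shows = {
    "show_a": [1000.0, 2000.0],
    "show_b": [500.0, -1.0],
    "show_c": [750.0],
}
delivered, failed = run_projections(shows, sum_weights)
```

Here `show_b` ends up in `failed` with its error message, while `show_a` and `show_c` still reach `delivered`.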

#### Step 3: Revisions deliver alongside originals; nothing changes silently

When an operator delivers late or revises a prior period, a revision job runs and delivers a new ratings file referencing the original by id. Clients see both the original and the revision; downstream consumers can choose which to use. The archive holds both immutably. Silent restatement is exactly what the requirement rules out; explicit revision delivery is the contract with clients.
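One way to model a revision that references its original is sketched below. The `Delivery` fields, the `revises` link, and the ids are assumptions for illustration; the source only says the revision references the original by id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Delivery:
    delivery_id: str
    period: str
    ratings: dict[str, float]
    revises: Optional[str] = None  # id of the original delivery, if a revision


def make_revision(
    original: Delivery, new_id: str, new_ratings: dict[str, float]
) -> Delivery:
    """Build a new delivery for the same period that points at the original."""
    return Delivery(
        delivery_id=new_id,
        period=original.period,
        ratings=dict(new_ratings),  # copy so the caller's dict is not shared
        revises=original.delivery_id,
    )


# Hypothetical original and its later revision.
original = Delivery("d-001", "2024-01-01", {"show_a": 2.1})
revision = make_revision(original, "d-002", {"show_a": 2.3})
```

The original object is never touched: the revision is a second record, and a client holding both can tell which is which from `revises`.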

#### Step 4: Immutable archive of every delivered ratings file

Each delivered file is written to a cold-storage archive that doesn't allow modification. Revisions are written as new entries linked to the original. Clients can audit any past delivery; regulators can ask for the original-as-shipped versus the revision. Without the archive, restatement leaves no trace of what was shipped originally; with it, the trail is queryable.
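The write-once behavior can be illustrated with an in-memory stand-in. A real archive would presumably sit on object storage with a write-once or retention policy; that choice, the `DeliveryArchive` class, and its method names are assumptions here.

```python
from typing import Optional


class DeliveryArchive:
    """In-memory stand-in for a write-once archive of delivered files."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[bytes, Optional[str]]] = {}

    def put(
        self, delivery_id: str, payload: bytes, revises: Optional[str] = None
    ) -> None:
        """Store a delivery once; overwriting an existing id is rejected."""
        if delivery_id in self._entries:
            raise ValueError(f"{delivery_id} is already archived")
        if revises is not None and revises not in self._entries:
            raise KeyError(f"original {revises} is not in the archive")
        self._entries[delivery_id] = (payload, revises)

    def get(self, delivery_id: str) -> bytes:
        """Return the payload exactly as it was shipped."""
        return self._entries[delivery_id][0]

    def revisions_of(self, original_id: str) -> list[str]:
        """Ids of all archived deliveries that revise the given original."""
        return [
            did for did, (_, rev) in self._entries.items() if rev == original_id
        ]


# Hypothetical usage: archive an original, then a revision linked to it.
archive = DeliveryArchive()
archive.put("d-001", b"original ratings file")
archive.put("d-002", b"revised ratings file", revises="d-001")
```

An attempt to `put` under `"d-001"` again raises `ValueError`, so the as-shipped original stays retrievable next to its revision.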

---

### The shape that fits

> **What this design gives up**
>
> Per-operator and per-show tasks make the DAG wider than one big job; the revision delivery flow doubles the delivery surface for affected periods; the immutable archive grows with every delivery. Implementation cost is the price; the win is morning deliveries that don't fail because one operator is late or one show is broken, revisions that clients see explicitly, and an audit trail of every delivered ratings file.

> **What reviewers check**
>
> A reviewer looks at the canvas for these properties:
> - An orchestration layer schedules per-operator ingest and per-show projection with deadline alerting.
> - Per-show projection isolation: one show's failure doesn't block the others' delivery.
> - Revisions deliver as new files referencing the original; the archive holds both.
> - An immutable delivery archive holds every ratings file delivered.

> **The mistake that ships**
>
> What gets shipped runs one big projection job over all operators and all shows. One late operator blocks the morning; one show's failure takes the others with it. Revisions roll silently into the current run. Contractual penalties hit. The eventual rebuild adds per-operator ingest, per-show projection isolation, explicit revision delivery, and the immutable archive.

---

## Common follow-up questions

- An operator delivers a revised period a week after the original. What does this design do, and what do clients see? _(Tests whether the candidate sees the revision_processor producing a new delivery file referencing the original; the archive holds both; clients receive a notification of the revision. The original stays in the archive unchanged so the audit shows what was shipped originally and what was revised.)_
- A new show needs to be added with a different projection model. What in this design extends, and what doesn't? _(Tests whether the candidate sees per-show tasks as the extension point: a new show task with its own model, scheduled by the orchestrator alongside the others. The shared per-operator ingest, the warehouse, and the delivery archive don't change.)_

---

Source: DataDriven (https://datadriven.io).