# Sixty Minutes, Every Hour

> Every hour, on the hour. No excuses.

Canonical URL: <https://datadriven.io/problems/sixty_minutes_every_hour>

Domain: Pipeline Design · Difficulty: medium · Seniority: L6

## Problem

We have an OLTP system that powers our customer-facing application. The analytics team needs this data refreshed every hour in the warehouse. Design an ETL pipeline that loads data from the source system to the target with consistency and reliability.

## Worked solution and explanation

### Why this problem exists in real interviews

Hourly loading sounds like a scheduling problem and is actually four properties that conflict if you do them in the obvious order: zero query load on the OLTP, hourly atomicity on the warehouse, replayable recovery, and propagated deletes. The trap is doing them one at a time and discovering the design has to support them all together.

The whiteboard answer is hourly SELECTs against the OLTP that pull rows changed in the last hour and append to the warehouse. The OLTP team notices the new query load and asks the data team to back off. Dashboards run mid-load and see some tables updated, others not. An outage drops two hours of changes; the rerun appends what it pulls and double-counts because the original partial run already wrote some of them. A customer is deleted at the source and stays in the warehouse forever.

> **Trick to Solving**
>
> Log-based CDC, atomic hour swap on the warehouse, idempotent replay on a stable change id, deletes as events not row removals.
> 
> 1. Log-based CDC reads the WAL/binlog the database is already writing; zero query load on the OLTP. Polling SELECTs are the version operations rejects.
> 2. Hourly atomic swap: the new hour's data builds in a staging table, then a transactional rename or partition-overwrite swaps it into the warehouse so consumers see either the old or the new hour, not a mix.
> 3. Every change carries a stable change id from the WAL/binlog; warehouse writes are idempotent on that id so a rerun produces the same state regardless of where the prior run stopped.
> 4. Deletes travel as events on the same stream; the warehouse writes them as a soft-delete (is_deleted flag) and a periodic compaction removes the rows that have been soft-deleted past retention.

---

### Walk the requirements

#### Step 1: Hourly windows, every hour, with zero load on the OLTP

An orchestrator schedules the hourly DAG: CDC capture, transform, atomic swap, quality check. Log-based CDC reads from the binlog/WAL while the OLTP keeps doing application work. The orchestrator's sensor fires before the hour boundary if anything is at risk; on-call has minutes, not seconds. Without a warehouse tier the analytics target has nowhere to live; without CDC the hourly load is polling that operations rejects.

#### Step 2: Atomic hour swap so dashboards never see a half-loaded hour

Each hour's data builds in a staging table. When the build is complete and the quality checks pass, a single atomic operation swaps it into the warehouse: a partition-overwrite, a transactional table rename, or an Iceberg/Delta commit. Consumers see either the prior hour or the new hour, not a mix. A row-by-row append is the version where dashboards run mid-load and report inconsistent numbers.

#### Step 3: Idempotent replay on a stable change id

Every change carries the LSN/binlog position from the source. Warehouse writes are idempotent on that key (upsert or partition-overwrite keyed on the hour and the change id). A rerun of an hour writes the same final state regardless of where the prior run failed; the staging-and-swap pattern means the partial run's output is in staging, never in the warehouse. After an outage, replay is restart-from-LSN; the warehouse converges.

#### Step 4: Deletes propagate as events, soft-deleted in the warehouse

Source deletes (both hard and soft) emit DELETE events on the CDC stream. The warehouse writes them as a soft-delete: an is_deleted flag and a deleted_at timestamp on the warehouse row. Consumers filter out is_deleted rows by default. A periodic compaction removes the soft-deleted rows that have aged past the GDPR retention window. A 'just delete the row' approach loses the audit trail; a 'never propagate the delete' approach fails the GDPR review.

---

### The shape that fits

> **What this design gives up**
>
> Log-based CDC plus atomic swaps plus replayable idempotency plus soft-deletes is more pieces than 'hourly SELECT and append.' The staging-then-swap pattern roughly doubles staging storage during the build window. Operational complexity is the cost; the win is an OLTP that doesn't carry analytics load, dashboards that never see half an hour, reruns that don't double-count, and deletes that survive a GDPR audit.

> **What reviewers check**
>
> A reviewer looks at the canvas for these properties:
> - A change-data-capture path off the production log replicates without query load.
> - Each hour's load builds in a staging table and atomically swaps into the warehouse so consumers see either the prior hour or the new one.

> **The mistake that ships**
>
> The shape that ships polls the OLTP every hour with SELECTs and appends to the warehouse. Operations notices the load and pushes back. Dashboards running mid-load see some tables updated and others not, and analysts report inconsistent numbers. An outage causes a partial run; the rerun appends and double-counts. A customer is deleted at the source and remains in the warehouse, GDPR review takes a finding. The team rebuilds with log-based CDC, atomic swap, idempotent replay, and propagated soft-deletes. The GDPR finding sticks; analysts have spent the meantime reconciling against a warehouse they can't trust.

---

## Common follow-up questions

- An outage takes the pipeline down for several hours. What does this design do when the consumer comes back, and what does the warehouse look like in the meantime? _(Tests whether the candidate sees that the warehouse stays at the last successfully-swapped hour during the outage, the change stream retains messages within retention, the consumer replays from its last LSN, and the missed hours rebuild and swap in order. Consumers see the warehouse age, then catch up.)_
- A delete event arrives for a customer who was already soft-deleted last week. What does this design do, and what should the warehouse return for that customer now? _(Tests whether the candidate sees soft-delete as idempotent: the second delete updates the deleted_at timestamp (or no-op), and the warehouse returns no rows for that customer either way. The compaction job ages out the soft-deleted row past retention.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/sixty_minutes_every_hour)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.