
How Do You Backfill?

Concepts: paBackfill, paDagOrchestration

Backfill is the question that exposes whether your pipelines are production-grade or demo-ware. A new column is added, a bug corrupts three weeks of data, a new downstream model needs historical features: all of these require reprocessing historical data through a pipeline designed to look only forward. The interviewer wants to hear that you design for backfill from day one, not bolt it on after the first incident.

Partition-Level Backfill

Your answer should start here: partition-level backfill. Instead of reprocessing the entire table (expensive, risky) or individual rows (complex, slow), you reprocess entire date partitions. This gives you a natural unit of work that is independently verifiable and restartable.

The framework: "I delete the target partition, re-extract from source, and