How Do You Backfill?

Concepts: paBackfill, paDagOrchestration

At scale, backfill is not an emergency operation. It is a first-class pipeline operation with its own scheduling, budgeting, and monitoring. The real question is not 'can you backfill?' but 'can you backfill 400 TB within a $2,000 compute budget without disrupting production workloads?' Your answer should demonstrate that you treat backfill as an engineering problem, not a fire drill.

Petabyte-Scale Backfill Design

A naive backfill of a 400 TB table reprocesses everything at once: it saturates your warehouse cluster for days, blocks production queries, and costs a small fortune. The progressive backfill pattern breaks the work into small, independently schedulable chunks that execute during off-peak hours over days or weeks. This is the architecture the interviewer wants to hear.
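The progressive pattern can be sketched as a small planner that packs chunks into nightly off-peak batches and stops scheduling once the budget is spent. This is a minimal illustration, not a production scheduler. The names `Chunk` and `plan_backfill`, the flat cost per TB, and the nightly capacity limit are all hypothetical and not taken from any particular warehouse or orchestrator.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Chunk:
    """One independently schedulable unit of backfill work (hypothetical)."""
    partition: date
    size_tb: float


def plan_backfill(chunks, cost_per_tb, budget_usd, tb_per_night):
    """Greedily pack chunks into nightly off-peak batches.

    Returns (nights, deferred, spent):
      nights   - list of batches, one per off-peak window
      deferred - chunks left unscheduled because they would exceed the budget
      spent    - total estimated cost of the scheduled chunks
    Assumes cost is linear in data volume, which is a simplification.
    """
    nights, current, deferred = [], [], []
    used_tb = 0.0
    spent = 0.0
    for chunk in chunks:
        cost = chunk.size_tb * cost_per_tb
        if spent + cost > budget_usd:
            # Over budget: defer rather than silently overspend.
            deferred.append(chunk)
            continue
        if current and used_tb + chunk.size_tb > tb_per_night:
            # Tonight's window is full; start the next night's batch.
            nights.append(current)
            current, used_tb = [], 0.0
        current.append(chunk)
        used_tb += chunk.size_tb
        spent += cost
    if current:
        nights.append(current)
    return nights, deferred, spent


if __name__ == "__main__":
    # Assumed numbers: 400 TB as 100 daily partitions of 4 TB each,
    # $5 per TB processed, and 20 TB of spare off-peak capacity per night.
    start = date(2024, 1, 1)
    chunks = [Chunk(start + timedelta(days=i), 4.0) for i in range(100)]
    nights, deferred, spent = plan_backfill(
        chunks, cost_per_tb=5.0, budget_usd=2000.0, tb_per_night=20.0
    )
    print(f"nights={len(nights)} deferred={len(deferred)} spent=${spent:,.2f}")
```

Because each chunk is planned independently, a failed night can be retried without touching the rest of the plan, and the deferred list makes any budget shortfall explicit instead of hiding it.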