Loading lesson...
Platform-level depth: coordination, blue-green, self-healing
What They Want to Hear 'Readiness signals. The producer publishes a signal (marker file, metadata update, Airflow dataset) when output is ready. The consumer waits for the signal, not for a fixed time. SLA buffers handle late producers: if the signal is not received by X time, the consumer alerts and uses stale data. This decouples teams: the consumer does not need to know anything about the producer's DAG structure.'
What They Want to Hear 'Code versioning is simple: git. Data versioning is harder: you need to know what the table looked like at a specific point in time. Table formats (Iceberg, Delta) provide time travel: query the table as of a specific timestamp or version number. For reprocessing after a logic change, replay from the last known good state using Bronze data and the new code.'
What They Want to Hear 'Blue (current prod) and green (new version) run in parallel on the same input data. Both write to separate output locations. Compare outputs. If they match, switch consumers from blue to green by updating a view or alias. If they differ, investigate before switching. Rollback = switch the view back to blue. Zero downtime because consumers always read from the view, and the view always points to valid data.'
What They Want to Hear 'Auto-recover from transient failures (network timeouts, temporary API errors) with retries and backoff. For late-arriving data, auto-extend the processing window. For schema drift, auto-detect and alert but do not auto-fix (schema changes need human judgment). For OOM (out of memory), auto-scale the compute. The rule: auto-recover from infrastructure failures, alert humans for data logic failures.'
What They Want to Hear 'I measure three dimensions: reliability (pipeline success rate > 99.5%), velocity (time from code commit to production < 1 hour), and cost efficiency (cost per pipeline-hour trending down). A mature platform has all three. Most start with reliability, then add velocity, then optimize cost.'