Scale answers: dynamic DAGs, circuit breakers, canary deploys
What They Want to Hear 'I use a DAG factory: one Python function that generates DAGs from YAML or JSON configuration. Each pipeline is a config file specifying source, destination, schedule, and SLA. The factory reads all configs and generates a DAG for each. Adding a new pipeline means adding a config file, not writing code.'
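The factory pattern above can be sketched in plain Python. This is an illustrative stand-in, not Airflow's API: plain dicts play the role of `airflow.DAG` objects, the config file names and keys (`name`, `source`, `destination`, `schedule`, `sla_minutes`) are assumptions, and JSON is used instead of YAML to keep the sketch stdlib-only.

```python
import json
from pathlib import Path


def build_dag(config: dict) -> dict:
    """Build one pipeline definition from a config dict.

    In real Airflow this would construct and return an airflow.DAG;
    a plain dict stands in here so the sketch runs anywhere.
    """
    required = {"name", "source", "destination", "schedule"}
    missing = required - config.keys()
    if missing:
        raise ValueError(f"config {config.get('name', '?')} missing keys: {sorted(missing)}")
    return {
        "dag_id": f"etl_{config['name']}",
        "schedule": config["schedule"],
        # hypothetical task naming convention for the sketch
        "tasks": [f"extract_{config['source']}", f"load_{config['destination']}"],
        "sla_minutes": config.get("sla_minutes", 60),
    }


def dag_factory(config_dir: Path) -> dict:
    """Read every *.json config in the directory and generate one DAG per file."""
    dags = {}
    for path in sorted(config_dir.glob("*.json")):
        dag = build_dag(json.loads(path.read_text()))
        dags[dag["dag_id"]] = dag
    return dags
```

Adding a new pipeline is then a one-file change: drop `payments.json` into the config directory and the factory picks it up on the next parse, with no Python edited.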
What They Want to Hear 'A circuit breaker stops retrying after a failure threshold. Three states: closed (normal operation), open (tripped, all requests fail fast without executing), half-open (probing, let one request through to test recovery). This prevents a failing upstream from generating retry storms that overwhelm the system.'
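The three-state machine described above can be sketched directly; this is a minimal single-threaded version (threshold and timeout values are illustrative, and a production breaker would also need locking and metrics):

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: closed (normal), open (fail fast), half-open (probe)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds before an open circuit probes again
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # probe: let exactly one request through
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # success: reset and close the circuit
        self.failures = 0
        self.state = "closed"
        return result

    def _record_failure(self):
        self.failures += 1
        # a failed probe re-opens immediately; otherwise trip at the threshold
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = time.monotonic()
```

After the threshold is hit, callers get an immediate `RuntimeError` instead of hammering the failing upstream, which is exactly what prevents the retry storm.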
What They Want to Hear 'Time-based scheduling (run at 7 AM) fails when upstream data is late. Data-aware scheduling (run when the input partition is populated) adapts automatically. Airflow datasets (v2.4+) let a producer declare that it updated a dataset, and consumer DAGs trigger automatically. No polling, no hardcoded schedules.'
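The producer/consumer mechanics can be shown with a toy publish-subscribe sketch. To be clear, this is a simulation of the idea, not Airflow's implementation: in real Airflow 2.4+, a producer task declares `outlets=[Dataset("s3://...")]` and a consumer DAG sets `schedule=[Dataset("s3://...")]`; the `DatasetBus` class and its method names here are invented for illustration.

```python
from collections import defaultdict


class DatasetBus:
    """Toy stand-in for Airflow's data-aware scheduling (datasets, v2.4+)."""

    def __init__(self):
        self._consumers = defaultdict(list)  # dataset URI -> consumer DAG ids
        self.triggered = []                  # (dag_id, dataset_uri) run log

    def subscribe(self, dataset_uri: str, dag_id: str):
        # consumer DAG runs when the dataset updates, not at a wall-clock time
        self._consumers[dataset_uri].append(dag_id)

    def publish(self, dataset_uri: str):
        # producer finished writing the partition: fire every consumer now,
        # whether the data landed at 6:50 AM or three hours late
        for dag_id in self._consumers[dataset_uri]:
            self.triggered.append((dag_id, dataset_uri))
```

The point of the sketch: consumers never poll and never hardcode "7 AM"; they run exactly when, and only when, the upstream declares the data ready.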
What They Want to Hear 'Run v1 (current) and v2 (new) in parallel on a data subset. Compare outputs. If identical, promote v2 to full production. If they differ, investigate before rolling out. This catches bugs that unit and integration tests miss: subtle logic changes that produce slightly different numbers.'
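The compare step can be sketched as a row-level diff. Assumptions for the sketch: `v1`/`v2` are pure transform functions applied per row, and a small float tolerance absorbs harmless floating-point noise so only real logic changes surface.

```python
def diff_versions(rows, v1, v2, tolerance: float = 1e-9):
    """Run both pipeline versions on the same sample; collect rows whose
    outputs differ beyond tolerance. An empty result is a green light to
    promote v2; any mismatch means investigate before rollout."""
    mismatches = []
    for row in rows:
        a, b = v1(row), v2(row)
        if isinstance(a, float) and isinstance(b, float):
            same = abs(a - b) <= tolerance  # tolerate float rounding noise
        else:
            same = a == b
        if not same:
            mismatches.append((row, a, b))
    return mismatches
```

This is the check that catches the "slightly different numbers" class of bug: both versions run clean, both pass tests, but the diff shows v2 quietly changed an edge case.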
What They Want to Hear 'Four levels: static analysis (seconds, lint + type check), unit tests (minutes, test individual transforms), integration tests (30 min to 1 hour, end-to-end with real data), data diff (hours, compare production outputs). The most important single test is schema compatibility: assert that the output schema has not changed unexpectedly. Schema breaks are the #1 cause of production data incidents.'
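The schema-compatibility assertion named above as the single most important test can be sketched as a comparison of the actual output schema against a pinned expectation. The column names and type strings here are illustrative; in practice the expected schema would be checked into version control.

```python
def check_schema(actual: dict, expected: dict) -> list:
    """Compare an output schema (column name -> type string) against the
    pinned expectation. Missing, extra, or retyped columns are all breaking
    changes; returns a list of human-readable problems (empty == compatible)."""
    missing = expected.keys() - actual.keys()
    extra = actual.keys() - expected.keys()
    retyped = {c for c in expected.keys() & actual.keys() if expected[c] != actual[c]}
    problems = []
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected new columns: {sorted(extra)}")
    if retyped:
        problems.append(f"type changed: {sorted(retyped)}")
    return problems
```

Running this at the static-analysis or unit-test level means a schema break fails the build in seconds, instead of surfacing hours later in the data diff or, worse, in production.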