Orchestration and Dependencies: Intermediate
Schedules trigger DAGs, runs are the work, and dependencies cross every boundary
- Category: Pipeline Architecture
- Difficulty: intermediate
- Duration: 30 minutes
- Challenges: 0 hands-on challenges
Topics covered: Schedules: Cron, Interval, Event; Task vs DAG vs Run for Retries; Cross-DAG Dependencies; Sensors and External Triggers; Three Cadences, One Output
Lesson Sections
- Schedules: Cron, Interval, Event (concepts: paScheduleTypes, paCatchupMode)
Every DAG has a schedule. The schedule decides when a run starts. Three forms cover almost every production case: cron expressions for repeated wall-clock times, intervals for relative cadences (every fifteen minutes, every hour), and event triggers for runs that start when something external happens. A modern orchestrator supports all three, and the right choice depends on the source's behavior, not on engineer preference.
Cron Expressions
A cron expression is five fields that name a recurring
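A minimal sketch of how the first two forms compute the next start time, assuming a simplified cron dialect (numbers, `*`, lists, ranges, and steps only; no month or weekday names), naive datetimes in a single time zone, and no daylight-saving handling. The function names are hypothetical, not any orchestrator's API. An event trigger has no next-fire computation at all: the run starts when the external signal arrives, so it is not modeled here.

```python
from datetime import datetime, timedelta

# Bounds for the five cron fields, in order:
# minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(text: str, low: int, high: int) -> set[int]:
    """Expand one cron field into the set of values it matches.

    Supports '*', single values, comma lists, ranges 'a-b', and steps
    ('*/n', 'a-b/n', 'a/n'). Names such as MON or JAN are not supported.
    """
    values: set[int] = set()
    for part in text.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            # 'a/n' means "start at a, then every n up to the field maximum".
            start = int(base)
            end = high if step_text else start
        if start < low or end > high or start > end or step < 1:
            raise ValueError(f"cron field {part!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return values


def next_cron_fire(expr: str, after: datetime) -> datetime:
    """Return the first whole minute strictly after `after` that matches `expr`.

    Simplification: day of month and day of week must BOTH match. Classic
    cron ORs them when both fields are restricted.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("a cron expression has exactly five fields")
    minutes, hours, days, months, weekdays = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_BOUNDS)
    )
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    # Brute-force minute-by-minute scan, capped at roughly four years.
    for _ in range(4 * 366 * 24 * 60):
        cron_weekday = (candidate.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
        if (
            candidate.minute in minutes
            and candidate.hour in hours
            and candidate.day in days
            and candidate.month in months
            and cron_weekday in weekdays
        ):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError(f"no time matches {expr!r} within the search window")


def next_interval_fire(previous_scheduled_start: datetime, every: timedelta) -> datetime:
    """An interval schedule is relative: the previous scheduled start plus the interval."""
    return previous_scheduled_start + every
```

The contrast is the point: `next_cron_fire` always lands on the same wall-clock positions, while `next_interval_fire` is defined relative to the previous start. Taking the previous *scheduled* start (an assumption in this sketch) rather than the moment a run actually began keeps one late run from shifting every later one.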
- Task vs DAG vs Run for Retries (concepts: paTaskDagRun, paLogicalDate)
Three words appear in every conversation about orchestration: task, DAG, and run. New engineers use the three interchangeably, and most of the time the imprecision does not bite. It bites hard when retry semantics are at stake, because the orchestrator retries at exactly one of those three levels, and which level it is matters. The vocabulary below is precise on purpose.
The Three Words
Retries Happen at the Task Instance Level
When an orchestrator retries a failure, it retries a task instance. It does
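The distinction can be sketched in a few lines, assuming a linear chain of tasks and a per-instance retry budget. A task is the definition (`fn`), a run is one pass over the DAG for one logical date, and a task instance is one task inside one run. The class, the function names, and the state strings (borrowed loosely from Airflow's vocabulary) are illustrative, not a real scheduler's internals.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class TaskInstance:
    """One task within one DAG run, keyed by (dag_id, task_id, logical_date)."""
    dag_id: str
    task_id: str
    logical_date: date
    try_number: int = 0
    state: str = "scheduled"


def execute_with_retries(ti: TaskInstance, fn: Callable[[date], None], max_retries: int) -> None:
    """Run `fn` for this task instance, retrying only this instance on failure."""
    while ti.try_number <= max_retries:
        ti.try_number += 1
        try:
            fn(ti.logical_date)  # every try sees the same logical date
            ti.state = "success"
            return
        except Exception:
            ti.state = "up_for_retry" if ti.try_number <= max_retries else "failed"


def run_dag(dag_id: str, tasks: dict[str, Callable[[date], None]],
            logical_date: date, max_retries: int = 2) -> dict[str, TaskInstance]:
    """Create one run: a linear chain of task instances sharing a logical date."""
    instances: dict[str, TaskInstance] = {}
    upstream_ok = True
    for task_id, fn in tasks.items():
        ti = TaskInstance(dag_id, task_id, logical_date)
        if upstream_ok:
            execute_with_retries(ti, fn, max_retries)
            upstream_ok = ti.state == "success"
        else:
            # Downstream instances never start once an upstream instance has failed for good.
            ti.state = "upstream_failed"
        instances[task_id] = ti
    return instances
```

Because the retry loop lives inside `execute_with_retries`, a transient failure in a `load` task reruns only `load`: an upstream `extract` keeps its single successful try, and the run as a whole is never restarted.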
- Cross-DAG Dependencies (concepts: paCrossDagDependency, paAssetTrigger)
Real production environments do not run one giant DAG. They run dozens of smaller DAGs, owned by different teams, on different cadences. Some of those DAGs depend on each other. The marketing analytics DAG reads tables produced by the orders DAG; the ML feature DAG reads tables produced by the events DAG. The dependency edge crosses a DAG boundary. Modeling that edge correctly is the difference between a system that scales across teams and one that breaks every time someone changes a schedule. T
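One way to model a cross-DAG edge is on the data itself rather than on either DAG's schedule: the producer announces a new version of an asset, and the consumer starts once every asset it reads has been updated since its own last run. The sketch below is loosely modeled on the asset-based (dataset-based) scheduling some orchestrators offer; the classes and the asset names `orders.fact_orders` and `events.sessions` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AssetRegistry:
    """Hypothetical in-memory record of the latest version of each data asset."""
    versions: dict[str, int] = field(default_factory=dict)

    def publish(self, asset: str) -> None:
        """Called by a producer DAG after it has successfully written `asset`."""
        self.versions[asset] = self.versions.get(asset, 0) + 1


@dataclass
class AssetScheduledDag:
    """A consumer DAG whose trigger is "my input assets changed", not a clock time."""
    dag_id: str
    inputs: list[str]
    consumed: dict[str, int] = field(default_factory=dict)

    def is_ready(self, registry: AssetRegistry) -> bool:
        """True once every input has been updated since this DAG's last run."""
        return all(
            registry.versions.get(asset, 0) > self.consumed.get(asset, 0)
            for asset in self.inputs
        )

    def maybe_trigger(self, registry: AssetRegistry) -> bool:
        """Start a run if ready, remembering which versions that run consumed."""
        if not self.is_ready(registry):
            return False
        self.consumed = {asset: registry.versions[asset] for asset in self.inputs}
        return True
```

Because the consumer never names the producer's schedule, the orders team could move its DAG from hourly to every fifteen minutes without editing the marketing analytics DAG; the edge follows the table, not the clock.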
- Sensors and External Triggers (concepts: paSensors, paExternalTriggers)
A pipeline often has to wait for something outside its control. A vendor SFTP drops a file at an irregular time. A REST API publishes a daily endpoint that becomes available between 1am and 4am. A Kafka topic accumulates messages, and a downstream batch process should kick off when the offset crosses a threshold. Sensors are the orchestration primitive that turns 'wait for the world' into a task the DAG can schedule around. Knowing the families of sensors and their trade-offs is part of the inte
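At its core, a sensor is a bounded polling loop. The sketch below assumes the condition is any zero-argument callable (for the vendor SFTP case it might check whether a hypothetical file path exists) and injects the clock and sleep functions so the loop can be tested without real waiting. The names are illustrative, not a particular orchestrator's sensor API.

```python
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool],
    poke_interval: float,
    timeout: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Check `condition` every `poke_interval` seconds until it is true.

    Returns the number of checks ("pokes") it took. Raises TimeoutError if the
    next check would land past the deadline.
    """
    deadline = clock() + timeout
    pokes = 0
    while True:
        pokes += 1
        if condition():
            return pokes
        if clock() + poke_interval > deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(poke_interval)
```

This is the "poke" style: the task holds its worker for the entire wait. Orchestrators commonly offer alternatives that release the worker between checks (Airflow's `mode="reschedule"`, for example), trading some scheduling overhead for freed capacity on long waits such as the 1am to 4am API window.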
- Three Cadences, One Output (concepts: paMultiCadenceOrchestration, paFreshnessJoin)
A real orchestration design rarely has one schedule. The example below builds a single downstream table that is fed by three sources, each on its own cadence. The shape is common in production: a daily executive table that combines streaming events, hourly Stripe data, and once-a-day Salesforce CRM data. Designing this correctly requires every concept from the previous sections.
The Sources and Their Cadences
The Downstream Table
The downstream is mart.daily_revenue_by_account, a single table that th
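One way to express "build only when every input is fresh enough" is a freshness gate over per-source watermarks. This is a minimal sketch assuming each source reports a watermark, meaning the event time before which its data has fully landed; the source keys `events`, `stripe`, and `salesforce` and the example timestamps are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def stale_sources(watermarks: dict[str, datetime], target_day: date) -> list[str]:
    """Return the sources whose watermark has not yet reached the end of `target_day`."""
    day_end = datetime.combine(target_day + timedelta(days=1), time.min)
    return sorted(name for name, mark in watermarks.items() if mark < day_end)


def can_build_daily_table(watermarks: dict[str, datetime], target_day: date) -> bool:
    """Build mart.daily_revenue_by_account for `target_day` only when no source is stale."""
    return not stale_sources(watermarks, target_day)


# Hypothetical watermarks shortly after midnight on 2024-03-02.
watermarks = {
    "events": datetime(2024, 3, 2, 0, 5),      # streaming: already past midnight
    "stripe": datetime(2024, 3, 2, 0, 0),      # hourly: last full hour has landed
    "salesforce": datetime(2024, 3, 1, 0, 0),  # daily: today's load not in yet
}
```

Here `stale_sources(watermarks, date(2024, 3, 1))` would report `["salesforce"]`, so the daily build waits on its slowest source rather than on whichever schedule happens to fire first.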