# A daily orders pipeline runs its heavy 12-hour SQL aggregation inside the Airflow scheduler itself

Canonical URL: <https://datadriven.io/problems/a-daily-orders-pipeline-runs-its-heavy-12-hour-sql-aggregati-70b34f42>

Domain: Pipeline Design · Difficulty: medium

## Problem

A daily orders pipeline runs its heavy 12-hour SQL aggregation inside the Airflow scheduler itself, using a PythonOperator that executes the SQL in-process. The aggregation starves the scheduler: other DAGs sit waiting, the orchestrator UI lags, and a single slow task degrades visibility for every other pipeline on the same Airflow instance.

This section is explicit that the orchestrator owns four responsibilities (scheduling, dependency resolution, retries, and visibility) and delegates the actual transform work to a worker. It names a Snowflake warehouse, a Spark cluster, or a Python container as the worker categories.

To fix the design:

- Replace the in-process Airflow PythonOperator transform with a delegated worker transform. Its name must state what aggregation it runs, and its tech_label must be one of the section's worker categories: Snowflake, BigQuery, Spark, PySpark, Databricks, or Python.
- Wire the Postgres source into the new worker transform, and the new worker transform into the Snowflake daily_orders mart. The Morning dashboard reads from the mart.
- Keep the Airflow orchestrator node on the canvas. It continues to own when the work runs, the order it runs in, the per-task retry policy, and the on-call UI.
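The target wiring described above can be sketched as a small graph check. This is a minimal illustration, not the grader's actual logic: the `Node` class, the role keys (`source`, `worker`, `mart`, `dashboard`, `orchestrator`), the `validate` helper, and the worker name "Aggregate daily orders" are all hypothetical. Only the allowed tech_label values and the required edges come from the problem statement.

```python
from dataclasses import dataclass

# Worker categories accepted by the problem statement.
ALLOWED_WORKER_LABELS = {
    "Snowflake", "BigQuery", "Spark", "PySpark", "Databricks", "Python",
}

# Edges the problem statement asks for, written as (from_role, to_role).
REQUIRED_EDGES = [
    ("source", "worker"),
    ("worker", "mart"),
    ("mart", "dashboard"),
]


@dataclass(frozen=True)
class Node:
    """Hypothetical canvas node: a display name plus a technology label."""
    name: str
    tech_label: str


def validate(nodes: dict[str, Node], edges: list[tuple[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means the canvas looks right."""
    problems = []

    # The orchestrator stays on the canvas: it still owns scheduling,
    # dependency order, retries, and visibility.
    if "orchestrator" not in nodes:
        problems.append("orchestrator node missing")

    # The heavy aggregation must live on a delegated worker.
    worker = nodes.get("worker")
    if worker is None:
        problems.append("worker transform missing")
    elif worker.tech_label not in ALLOWED_WORKER_LABELS:
        problems.append(f"worker tech_label {worker.tech_label!r} is not allowed")

    # Data must flow source -> worker -> mart -> dashboard.
    for src, dst in REQUIRED_EDGES:
        if (src, dst) not in edges:
            problems.append(f"missing edge {src} -> {dst}")

    return problems


# Example canvas (node names are assumptions made for illustration).
nodes = {
    "source": Node("Postgres orders", "Postgres"),
    "orchestrator": Node("Airflow orchestrator", "Airflow"),
    "worker": Node("Aggregate daily orders", "Snowflake"),
    "mart": Node("daily_orders mart", "Snowflake"),
    "dashboard": Node("Morning dashboard", "Dashboard"),
}
edges = [("source", "worker"), ("worker", "mart"), ("mart", "dashboard")]

print(validate(nodes, edges))
```

Choosing Snowflake for the worker here is only one option; since the mart already lives in Snowflake, it keeps the aggregation next to its destination, but any of the listed labels would pass the same check.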

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/a-daily-orders-pipeline-runs-its-heavy-12-hour-sql-aggregati-70b34f42)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io).