dbt vs Airflow, the practitioner's decision
- 01 dbt is a transformation framework that schedules itself badly. Airflow is a scheduler that does not transform. Treat them as orthogonal, not competing.
- 02 If you only ever transform data already in your warehouse, you might not need Airflow at all. dbt Cloud or a cron entry can run dbt build on a schedule.
- 03 If you do not have dbt, you are probably writing Airflow tasks for things SQL should do. Jinja in PythonOperator is a well-known regret.
- 04 The standard 2026 production pattern: Airflow (or Dagster) handles the WHEN and HOW, and dbt handles the WHAT. The handoff is a single trigger task per run.
- 05 dbt tests cover schema-shaped drift. They do not cover volumetrics, freshness against external sources, or cross-source contracts. You still need Elementary, re_data, or a homegrown SLA layer.
- 06 Picking Dagster over either is a real option in 2026. It solves asset-graph ergonomics that both Airflow and dbt work around. It also adds a learning curve and a smaller operator ecosystem. Budget for both.
The orthogonality that fixes the question
dbt and Airflow do not compete because they answer different questions. dbt answers WHAT to compute. Airflow answers WHEN and HOW to run things. Once you internalize that, the rest of the decisions get simpler.
dbt is a compile-time SQL graph plus materializations plus tests plus lineage. It runs anywhere SQL runs. The unit of value is the model, which is just a SELECT statement that dbt wraps in CREATE VIEW, CREATE TABLE AS, or MERGE INTO depending on the materialization. The graph is built from ref() and source() calls at parse time. The scheduler is whatever invokes dbt build.
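To make the parse-time graph concrete, here is a minimal sketch of the idea, not dbt's actual parser. It assumes three hypothetical models held in a dict and uses a regular expression that only recognizes the simple `{{ ref('name') }}` form; real dbt renders Jinja properly and also tracks source() calls.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model files: model name -> SQL body.
MODELS = {
    "stg_orders": "select * from {{ source('shop', 'orders') }}",
    "stg_customers": "select * from {{ source('shop', 'customers') }}",
    "fct_orders": (
        "select o.*, c.region from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c using (customer_id)"
    ),
}

# Assumption: only the plain {{ ref('model_name') }} form is handled.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_graph(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model to the set of models it references via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def run_order(models: dict[str, str]) -> list[str]:
    """Return one valid execution order, parents before children."""
    return list(TopologicalSorter(build_graph(models)).static_order())


if __name__ == "__main__":
    print(build_graph(MODELS))
    print(run_order(MODELS))
```

Nothing in this sketch knows when to run. The graph only says what depends on what, which is why the scheduler can be anything that invokes the build.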
Airflow is an imperative DAG of arbitrary Python tasks with operators, sensors, retries, SLAs, scheduling, and a worker pool. The unit of value is the task. The graph is built from explicit dependencies between tasks at DAG definition time. There is no native notion of data, only of work to do.
The orthogonality is the punchline. dbt is the WHAT. Airflow is the WHEN and HOW. The two play together because dbt's WHAT becomes one task in Airflow's WHEN and HOW. If your interview answer does not separate those two layers, the interviewer already knows you have not run a real platform.
| Concept | dbt | Airflow |
|---|---|---|
| Primary artifact | Compiled SQL graph (manifest.json) | DAG of Python tasks |
| Where logic runs | Inside the warehouse, as SQL the warehouse executes | On Airflow workers, calling whatever you tell it to call |
| Compute cost driver | Warehouse credits | Worker pool plus warehouse credits for whatever it triggers |
| Native scheduling | Crude. dbt Cloud has a cron UI. dbt Core has nothing. | First-class. Cron, data intervals, sensors, SLA misses, retries. |
| Failure model | Model fails, downstream models are skipped, and run_results.json records each node's status | Task fails, retry policy fires, and downstream tasks are marked upstream_failed once retries are exhausted |
| Branching and conditional logic | Ref graph only. No conditional fan-out. | Native via BranchPythonOperator and TaskGroups |
| Sensor and external wait patterns | None native. Source freshness checks (dbt source freshness) are the closest workaround. | Native. S3KeySensor, ExternalTaskSensor, custom sensors. |
| Lineage | Auto generated from ref() and source(). dbt docs renders it. | Task level only. No column lineage. OpenLineage adds some. |
| Tests | Schema tests on data, generic and singular tests, freshness on sources | Workflow tests on DAGs. No data tests without extra tooling. |
| Multi-tool fan-in | One target warehouse per project. Multi-project orchestration is hard. | Trivial. Operators exist for nearly everything, including multi-cloud targets. |
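The failure-model row can be illustrated from the dbt side. After an invocation, dbt writes a run_results.json artifact whose results list carries a unique_id and a status per node. The sketch below, with a hypothetical helper name and made-up sample data, pulls out the failed and skipped nodes so a wrapping task could surface them instead of a bare non-zero exit code.

```python
import json

# Made-up sample shaped like the "results" list in run_results.json.
SAMPLE_RUN_RESULTS = json.dumps(
    {
        "results": [
            {"unique_id": "model.shop.stg_orders", "status": "success"},
            {"unique_id": "model.shop.fct_orders", "status": "error"},
            {"unique_id": "model.shop.rpt_daily", "status": "skipped"},
        ]
    }
)


def summarize_run(run_results_json: str) -> dict[str, list[str]]:
    """Group node ids by the two statuses that matter for a failed build."""
    summary: dict[str, list[str]] = {"error": [], "skipped": []}
    for result in json.loads(run_results_json)["results"]:
        if result["status"] in summary:
            summary[result["status"]].append(result["unique_id"])
    return summary


if __name__ == "__main__":
    print(summarize_run(SAMPLE_RUN_RESULTS))
```

In practice the JSON would be read from the project's target directory after the dbt command finishes; the string constant here only keeps the example self-contained.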
When you only need dbt
A surprising number of analytics platforms do not need an orchestrator. If you check most of these boxes, defer Airflow.
One missed signal is fine. Two should make you double-check. Three should make you wonder why Airflow is not on the roadmap.
- Single warehouse. Snowflake, BigQuery, Redshift, Databricks SQL, or Postgres. No fan out across non SQL compute.
- Ingestion is handled by a managed ELT vendor (Fivetran, Airbyte, Stitch). dbt only sees data that already landed.
- No Python pre processing in the pipeline. No ML training loop coupled to fresh data. No external API calls in the transformation path.
- A daily or hourly cron is enough. You do not need data-interval awareness, SLA dashboards, or per-task retry policies.
- Team size under 10 engineers. Coordination cost of running Airflow exceeds the cost of dbt Cloud or a cron entry.
- No multi tenant warehouse fan out. One project, one set of targets, one schedule.
When dbt alone falls apart
These are the signals that you have outgrown dbt's native scheduling. None of them is, in isolation, a reason to add Airflow. Two or three together usually are.
- You need to land data first. Pulling from APIs, decrypting files, normalizing PII, or staging from S3 with custom parsing. dbt does not do any of that.
- Fan out across non warehouse compute. Spark, Flink, EMR, Glue, Beam. dbt does not run there. Airflow operators do.
- Conditional branches. Run the heavy backfill only on Sundays. Skip the marketing models when GA4 export missed. dbt has no native conditional execution.
- Multi warehouse fan in. Snowflake plus Postgres plus BigQuery in the same pipeline. dbt projects are single target.
- ML pipelines. Feature freshness, training trigger, model registry coupling. dbt models cannot promote a model artifact.
- Cross tool sensor logic. Wait for the Salesforce export. Wait for the upstream dbt project run in another team. Wait for an SLA window to open.
- Complex SLAs. SLA miss alerting, on call paging, SLO budget burn alerts. Airflow has SLA semantics. dbt does not.
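The conditional-branch signal above is the easiest one to sketch. An Airflow branch callable returns the task_id, or list of task_ids, that should run next, so the decision logic can live in a plain function. The function name, the task ids, and the ga4_export_landed flag below are all hypothetical; how you detect a missed GA4 export is up to your ingestion layer.

```python
from datetime import date


def choose_branch(run_date: date, ga4_export_landed: bool) -> list[str]:
    """Return the task_ids to run, shaped like a branch callable's return value."""
    branches = ["dbt_build_core"]
    if ga4_export_landed:
        # Skip the marketing models when the GA4 export missed.
        branches.append("dbt_build_marketing")
    if run_date.weekday() == 6:  # Monday is 0, Sunday is 6
        # Run the heavy backfill only on Sundays.
        branches.append("heavy_backfill")
    return branches


if __name__ == "__main__":
    print(choose_branch(date(2026, 1, 4), ga4_export_landed=True))   # a Sunday
    print(choose_branch(date(2026, 1, 5), ga4_export_landed=False))  # a Monday
```

Keeping the decision in a pure function like this means it can be unit-tested without a running scheduler, then handed to whatever branching operator the orchestrator provides.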
“Airflow without dbt is engineers reinventing materialization. dbt without orchestration is analysts ignoring the rest of the stack. Pick the right pair, not the better tool.”
The five ways teams actually wire dbt and Airflow
Each pattern comes with a real trade off. Picking the right one depends on your model count, your team shape, and your tolerance for cold starts.
- 01 Airflow plus dbt operator (most common)
Airflow ingests data via custom operators or managed ELT, then triggers a single BashOperator running dbt build. Simple, fast to wire, easy to reason about. The trade off is that one Airflow task fans out into dozens of dbt models, so model level retries are not visible in the Airflow UI. A single failing model fails the whole task.
Use this when you want minimum moving parts and your team already debugs dbt logs natively. It is the default for shops that adopted dbt before Cosmos existed.
- 02 Airflow KubernetesPodOperator plus dbt
dbt runs in a fresh pod per task. Cleanest dependency isolation. The dbt image carries its own version of dbt core, profiles, and Python deps. Airflow workers do not need any dbt dependency installed. This pattern scales across teams that share an Airflow cluster but each own their dbt project.
The cost is cold start. Pod spin-up adds 10 to 40 seconds per task. For pipelines with 30-plus dbt invocations, that is 5 to 20 minutes of pure startup overhead per run when the pods run sequentially. Mitigate with image pinning, warm pools, or a batched dbt build per project rather than per model.
- 03 dbt Cloud plus Airflow webhook
You bought dbt Cloud for the IDE, the CI hooks, and the job UI. Airflow now has to coordinate dbt Cloud jobs alongside ingestion. Use the dbt Cloud trigger operator (or a thin webhook task) to start a dbt Cloud job, then poll for completion via the dbt Cloud sensor.
The catch. Two scheduler UIs, two billing models, two places to look when a run is late. It works, but only if someone owns the question of which scheduler is the source of truth.
- 04 Dagster instead of Airflow
Dagster treats every dbt model as a software defined asset. The dbt graph and the Dagster graph merge. You get asset level retries, asset level lineage, asset level freshness policies. The dbt run is no longer a black box inside one task.
The trade off is the operator ecosystem. Dagster has fewer connectors than Airflow. Migrating from Airflow to Dagster is a quarter of work, not a sprint. Greenfield analytics platforms in 2026 should evaluate Dagster before defaulting to Airflow.
- 05 Airflow plus Astronomer Cosmos
Cosmos parses the dbt manifest at DAG render time and emits one Airflow task per dbt model. You get model level UI in Airflow, model level retries, and the Airflow task tree mirrors the dbt graph. Best of both worlds for shops standardized on Airflow.
Watch out for two things. First, DAG render time grows with model count; Cosmos versions before 1.5 stalled at 800-plus models. Second, every dbt model becomes an Airflow task, which can inflate the Airflow metadata DB faster than expected. Plan retention.
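The difference between one black-box task and one task per model comes down to reading the dbt manifest. The sketch below shows the general idea only; it is not how Cosmos is implemented. It assumes a manifest dict shaped like manifest.json (a nodes mapping whose entries carry name, resource_type, and depends_on.nodes), and the helper name and sample data are hypothetical.

```python
# Made-up sample shaped like the "nodes" section of manifest.json.
SAMPLE_MANIFEST = {
    "nodes": {
        "model.shop.stg_orders": {
            "name": "stg_orders",
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.fct_orders": {
            "name": "fct_orders",
            "resource_type": "model",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "test.shop.not_null_fct_orders_id": {
            "name": "not_null_fct_orders_id",
            "resource_type": "test",
            "depends_on": {"nodes": ["model.shop.fct_orders"]},
        },
    }
}


def tasks_from_manifest(manifest: dict) -> dict[str, dict]:
    """Return {task_id: {"command": [...], "upstream": [...]}} with one entry per model."""
    models = {
        unique_id: node
        for unique_id, node in manifest["nodes"].items()
        if node["resource_type"] == "model"
    }
    tasks = {}
    for node in models.values():
        upstream = [
            models[parent]["name"]
            for parent in node["depends_on"]["nodes"]
            if parent in models  # ignore sources and other non-model parents
        ]
        tasks[node["name"]] = {
            "command": ["dbt", "build", "--select", node["name"]],
            "upstream": sorted(upstream),
        }
    return tasks


if __name__ == "__main__":
    for task_id, spec in tasks_from_manifest(SAMPLE_MANIFEST).items():
        print(task_id, spec)
```

Each entry would become one orchestrator task wired to its upstream list. This also shows where the scaling cost comes from: the task count tracks the model count one to one.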
The 2026 cost math, honestly
Approximate monthly cost for a mid size pipeline. About 50 dbt models, 30 ingestion DAGs, daily plus hourly runs. Warehouse cost excluded except where noted.
| Option | Mid size monthly cost | Ops burden | Scheduler latency | Where it breaks |
|---|---|---|---|---|
| dbt Cloud (Team) | ~$100 per developer per month, plus warehouse | Lowest. dbt Cloud handles UI, scheduling, IDE, CI. | Run starts within a minute of cron | When you outgrow its native scheduler. Branching, fan out, cross tool waits. |
| dbt core on Airflow EC2 | ~$300 to 600 in EC2 plus self managed RDS | Highest. You own scheduler, workers, metadata DB, upgrades. | Run starts at the next data interval boundary | First Airflow CVE. First worker pool exhaustion. First metadata DB lock. |
| Astronomer Astro (managed Airflow) | ~$1,200 to 2,500 for a 2 to 4 worker deployment plus warehouse | Medium. They run Airflow. You write DAGs. | Same as Airflow | Per task pricing creeps. Cosmos at high model counts blows DAG render budgets. |
| AWS MWAA | ~$700 to 1,400 for a small environment plus warehouse | Medium. AWS runs scheduler and workers. You manage requirements.txt. | Same as Airflow, with a measurable cold scheduler delay | Plugin compatibility. MWAA pins versions. Some Airflow providers lag. |
| Dagster Cloud (Pro) | ~$10 per asset materialization, mid size pipelines land $400 to 1,200 | Low. They run the orchestrator. You write assets. | Asset materialization triggers in seconds via sensors | Smaller operator ecosystem. Custom integrations more often DIY. |
Myth versus reality
Five claims that get repeated in Twitter threads and Reddit comments. Each one falls apart on contact with a real platform.
If your situation is X, pick Y
A skimmable table of the eight most common situations. Use it as a starting point, then push on the trade offs.
Where data modeling enters the picture
dbt is also where most teams encode their dimensional model. The choice between SCD Type 1, Type 2, and snapshots lives inside dbt and ripples through Airflow scheduling.
dbt snapshots are the canonical way to track slowly changing dimensions in dbt. The snapshot block writes dbt_valid_from and dbt_valid_to columns to a target table whenever a check or timestamp strategy detects a change. dbt handles the merge logic, the surrogate key (dbt_scd_id), and maintenance of the validity window.
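The bookkeeping a timestamp-strategy snapshot performs can be sketched in a few lines. This is an illustration of the Type 2 logic under simplifying assumptions, not dbt's generated SQL: rows are plain dicts, the key column is a hypothetical id, the validity columns are shortened to valid_from and valid_to, and deletes are ignored.

```python
from datetime import datetime


def apply_snapshot(history: list[dict], current: dict) -> list[dict]:
    """Close the open row for this key if updated_at moved, then append a new open row.

    Mutates and returns `history`. An open row is one whose valid_to is None.
    """
    open_row = next(
        (r for r in history if r["id"] == current["id"] and r["valid_to"] is None),
        None,
    )
    if open_row is not None and open_row["updated_at"] >= current["updated_at"]:
        return history  # no newer version of the record, nothing to do
    if open_row is not None:
        open_row["valid_to"] = current["updated_at"]  # close the old window
    history.append(
        {**current, "valid_from": current["updated_at"], "valid_to": None}
    )
    return history


if __name__ == "__main__":
    rows: list[dict] = []
    apply_snapshot(rows, {"id": 1, "plan": "free", "updated_at": datetime(2026, 1, 1)})
    apply_snapshot(rows, {"id": 1, "plan": "pro", "updated_at": datetime(2026, 2, 1)})
    for row in rows:
        print(row)
```

Running the same input twice adds nothing, which is the property that makes the snapshot safe to schedule more often than the source actually changes.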
The Airflow side is the schedule. Snapshots almost always run on a different cadence than transforms. Snapshots run every hour or every six hours to capture change. Transforms run nightly to roll up. Mixing the two on the same dbt invocation hides the cadence mismatch and makes failure modes harder to reason about.
The interview topic that exposes whether you have run this in production is volumetrics. How big does the snapshot table get? How do you prune it? How do you handle backfills? How do you reconcile a snapshot that started with bad data? None of those questions has a dbt-only answer. They all involve the orchestrator.
What interviewers actually grade on
They do not ask which is better. They ask scenarios. Five questions that come up in real loops, with what a strong answer looks like.
Walk me through your orchestration topology for an analytics platform serving 50 dashboards.
Your dbt model takes 4 hours and blocks downstream models. How do you fix it?
How would you migrate from Airflow plus custom SQL to Airflow plus dbt without freezing reporting?
What does a late arriving event do to your dbt incremental models? How do you reconcile?
When would you reach for Dagster instead of either?
Practice the orchestration patterns
Four real challenges that exercise the patterns this guide covers. Late arriving events, reconciliation across two sources, duplicate detection, and SCD Type 2.
Billions of clicks. One tiny code. Two very different clocks.
Two versions of the same truth.
Same email, different rows. Spot the repeats.
She moved. She upgraded. She became someone new. The record has to keep up.
Frequently asked questions
Are dbt and Airflow really not competitors?
Should I learn dbt or Airflow first?
Is dbt Core or dbt Cloud the right choice?
Should I migrate from Airflow to Dagster?
What is the difference between dbt and Spark SQL?
Can I use Airflow without dbt?
Can I use dbt without Airflow?
How important is Airflow knowledge for a data engineering interview?
Does Cosmos solve the dbt model visibility problem in Airflow?
What about Prefect?