Tooling Decision Guide

dbt vs Airflow, the practitioner's decision

dbt and Airflow are not competitors. They sit at different layers of the stack and the right pair is more important than the better tool. This guide pushes past the surface and into the integration patterns, cost math, and interview questions that actually matter.

The Short Answer

dbt is for SQL transformations inside the warehouse. Airflow is for orchestration across everything else. The 2026 default is Airflow or Dagster as the orchestrator with dbt as the warehouse modeling layer. Pick dbt alone only if your pipeline never leaves the warehouse. Pick Airflow alone only if you have nothing to model in SQL.

Updated May 2026 · By The DataDriven Team

What this guide actually says

  1. dbt is a transformation framework that schedules itself badly. Airflow is a scheduler that does not transform. Treat them as orthogonal, not competing.
  2. If you only ever transform data already in your warehouse, you might not need Airflow at all. dbt Cloud or a cron entry can run dbt build on a schedule.
  3. If you do not have dbt, you are probably writing Airflow tasks for things SQL should do. Jinja in PythonOperator is a well-known regret.
  4. The standard 2026 production pattern: Airflow (or Dagster) handles the WHEN and HOW, dbt handles the WHAT. The handoff is a single trigger task per run.
  5. dbt tests cover schema-shaped drift. They do not cover volumetrics, freshness against external sources, or cross source contracts. You still need elementary, re_data, or a homegrown SLA layer.
  6. Picking Dagster over either is a real option in 2026. It solves asset-graph ergonomics that both Airflow and dbt work around. It also adds a learning curve and a smaller operator ecosystem. Budget for both.
Mental model

The orthogonality that fixes the question

dbt and Airflow do not compete because they answer different questions. dbt answers WHAT to compute. Airflow answers WHEN and HOW to run things. Once you internalize that, the rest of the decisions get simpler.

dbt is a compile time SQL graph plus materializations plus tests plus lineage. It runs anywhere SQL runs. The unit of value is the model, which is just a SELECT statement that dbt wraps in CREATE TABLE AS or MERGE INTO depending on materialization. The graph is built from ref() and source() calls at parse time. The scheduler is whatever invokes dbt build.
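
To make the "graph built from ref() calls" idea concrete, here is a toy sketch. It is not dbt's parser (dbt renders Jinja; this uses a regular expression), and the three model names and their SQL are hypothetical. It only shows how dependency edges fall out of ref() calls and how a run order follows from them.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical three-model project; names and SQL are illustrative only.
MODELS = {
    "stg_orders": "select * from {{ source('shop', 'orders') }}",
    "stg_customers": "select * from {{ source('shop', 'customers') }}",
    "fct_orders": (
        "select o.*, c.region "
        "from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c using (customer_id)"
    ),
}

# Toy pattern: matches ref('name') or ref("name"). Real dbt renders Jinja instead.
REF_PATTERN = re.compile(r"ref\(\s*['\"]([^'\"]+)['\"]\s*\)")


def build_graph(models):
    """Map each model to the set of models it references via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def run_order(models):
    """Return one valid execution order, parents before children."""
    return list(TopologicalSorter(build_graph(models)).static_order())


graph = build_graph(MODELS)
order = run_order(MODELS)
print(order)
```

The two staging models have no ref() edges, so their relative order is arbitrary; the only guarantee is that fct_orders comes after both.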

Airflow is an imperative DAG of arbitrary Python tasks with operators, sensors, retries, SLAs, scheduling, and a worker pool. The unit of value is the task. The graph is built from explicit dependencies between tasks at DAG definition time. Its core model has no notion of data, only of work to do. The Datasets feature added in Airflow 2.4 layers data-aware scheduling on top, but the unit of value is still the task.

The orthogonality is the punchline. dbt is the WHAT. Airflow is the WHEN and HOW. The two play together because dbt's WHAT becomes one task in Airflow's WHEN and HOW. If your interview answer does not separate those two layers, the interviewer already knows you have not run a real platform.
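
A minimal sketch of that "one task" handoff, written as plain Python rather than as a specific operator. The function names, the project path, and the `tag:daily` selector are assumptions for illustration; the dbt flags used (`--project-dir`, `--target`, `--select`) are standard CLI flags. The `runner` argument is there only so the logic can be exercised without dbt installed.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def build_dbt_command(project_dir: str, target: str, select: Optional[str] = None) -> List[str]:
    """Assemble the argv for one `dbt build` invocation."""
    cmd = ["dbt", "build", "--project-dir", project_dir, "--target", target]
    if select:
        cmd += ["--select", select]
    return cmd


def run_dbt_task(cmd: Sequence[str], runner: Optional[Callable[[Sequence[str]], int]] = None) -> None:
    """Body of the single orchestrator task: run dbt and raise on a non-zero exit.

    Raising is what hands control back to the orchestrator's retry policy.
    """
    if runner is None:
        # Default runner shells out to the dbt CLI, which must be on PATH.
        runner = lambda argv: subprocess.run(list(argv)).returncode
    exit_code = runner(cmd)
    if exit_code != 0:
        raise RuntimeError(f"dbt exited with code {exit_code}: {' '.join(cmd)}")


# Hypothetical project path and selector.
cmd = build_dbt_command("/opt/analytics", "prod", select="tag:daily")
print(cmd)
```

The orchestrator sees only success or failure of the whole invocation, which is exactly the visibility trade-off the integration patterns later in this guide revolve around.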

| Concept | dbt | Airflow |
| --- | --- | --- |
| Primary artifact | Compiled SQL graph (manifest.json) | DAG of Python tasks |
| Where logic runs | Inside the warehouse, as SQL the warehouse executes | On Airflow workers, calling whatever you tell it to call |
| Compute cost driver | Warehouse credits | Worker pool plus warehouse credits for whatever it triggers |
| Native scheduling | Crude. dbt Cloud has a cron UI. dbt Core has nothing. | First-class. Cron, data intervals, sensors, SLA misses, retries. |
| Failure model | Model fails, downstream skips, run_results.json records each node's status | Task fails, retry policy fires, downstream is marked upstream_failed once retries are exhausted |
| Branching and conditional logic | Ref graph only. No conditional fan-out. | Native via BranchPythonOperator and TaskGroups |
| Sensor and external wait patterns | None native. Use source freshness checks as a hack. | Native. S3KeySensor, ExternalTaskSensor, custom sensors. |
| Lineage | Auto generated from ref() and source(). dbt docs renders it. | Task level only. No column lineage. OpenLineage adds some. |
| Tests | Schema tests on data, generic and singular tests, freshness on sources | Workflow tests on DAGs. No data tests without extra tooling. |
| Multi tool fan in | One target warehouse per project. Multi project orchestration is hard. | Trivial. Operators for everything, including multi cloud. |
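
The dbt side of the failure-model row can be sketched in a few lines: when one model errors, every descendant is skipped and unrelated branches still run. The lineage below is hypothetical, and the function is an illustration of the behaviour, not dbt's implementation.

```python
from collections import deque

# Hypothetical lineage, parent -> children. Names are illustrative.
CHILDREN = {
    "stg_orders": ["fct_orders"],
    "stg_customers": ["fct_orders", "dim_customers"],
    "fct_orders": ["mart_revenue"],
    "dim_customers": [],
    "mart_revenue": [],
}


def statuses_after_failure(children, failed):
    """Mark the failed node as error, every descendant as skipped, the rest as success."""
    status = {node: "success" for node in children}
    status[failed] = "error"
    queue = deque(children[failed])
    while queue:
        node = queue.popleft()
        if status[node] != "skipped":
            status[node] = "skipped"
            queue.extend(children[node])
    return status


result = statuses_after_failure(CHILDREN, "stg_orders")
print(result)
```

Note that dim_customers still succeeds here because it does not descend from the failed model, which is why a single failure rarely takes out the whole project.
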
When the simple answer is correct

When you only need dbt

A surprising number of analytics platforms do not need an orchestrator. If you check most of these boxes, defer Airflow.

Signals that dbt alone is enough

One missed signal is fine. Two should make you double check. Three should make you ask why Airflow is not already on the roadmap.

  • Single warehouse. Snowflake, BigQuery, Redshift, Databricks SQL, or Postgres. No fan out across non SQL compute.
  • Ingestion is handled by a managed ELT vendor (Fivetran, Airbyte, Stitch). dbt only sees data that already landed.
  • No Python pre processing in the pipeline. No ML training loop coupled to fresh data. No external API calls in the transformation path.
  • A daily or hourly cron is enough. You do not need data interval awareness, SLA dashboards, or a per task retry policy.
  • Team size under 10 engineers. Coordination cost of running Airflow exceeds the cost of dbt Cloud or a cron entry.
  • No multi tenant warehouse fan out. One project, one set of targets, one schedule.
Watch out
The path from dbt alone to dbt plus Airflow is rarely graceful. Teams that defer orchestration too long end up bolting on cron, GitHub Actions, and Lambda triggers until the pile collapses. Plan the migration before you need it, not during the incident.
When the simple answer breaks

When dbt alone falls apart

These are the signals that you have outgrown dbt's native scheduling. None of them is, in isolation, a reason to add Airflow. Two or three together usually are.

Signals you need orchestration
  • You need to land data first. Pulling from APIs, decrypting files, normalizing PII, or staging from S3 with custom parsing. dbt does not do any of that.
  • Fan out across non warehouse compute. Spark, Flink, EMR, Glue, Beam. dbt does not run there. Airflow operators do.
  • Conditional branches. Run the heavy backfill only on Sundays. Skip the marketing models when GA4 export missed. dbt has no native conditional execution.
  • Multi warehouse fan in. Snowflake plus Postgres plus BigQuery in the same pipeline. dbt projects are single target.
  • ML pipelines. Feature freshness, training trigger, model registry coupling. dbt models cannot promote a model artifact.
  • Cross tool sensor logic. Wait for the Salesforce export. Wait for the upstream dbt project run in another team. Wait for an SLA window to open.
  • Complex SLAs. SLA miss alerting, on call paging, SLO budget burn alerts. Airflow has SLA semantics. dbt does not.
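
The conditional-branch signal above is the easiest one to picture in code. The sketch below is the kind of pure function you might hand to a branching operator such as BranchPythonOperator, which follows whichever task ids its callable returns. The task ids and the `ga4_export_landed` flag are hypothetical.

```python
from datetime import date

# Hypothetical task ids for the downstream branches.
HEAVY_BACKFILL = "heavy_backfill"
CORE_MODELS = "dbt_build_core"
MARKETING_MODELS = "dbt_build_marketing"


def choose_tasks(run_date: date, ga4_export_landed: bool) -> list:
    """Return the downstream task ids to run for this schedule tick."""
    tasks = [CORE_MODELS]
    if ga4_export_landed:
        # Skip the marketing models entirely when the export is missing.
        tasks.append(MARKETING_MODELS)
    # date.weekday(): Monday is 0, Sunday is 6.
    if run_date.weekday() == 6:
        tasks.append(HEAVY_BACKFILL)
    return tasks


sunday = choose_tasks(date(2026, 5, 3), ga4_export_landed=True)
monday_missing_export = choose_tasks(date(2026, 5, 4), ga4_export_landed=False)
print(sunday, monday_missing_export)
```

Keeping the decision in a plain function makes it unit-testable without a scheduler, which is harder to achieve with conditions buried in SQL or shell.
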

"Airflow without dbt is engineers reinventing materialization. dbt without orchestration is analysts ignoring the rest of the stack. Pick the right pair, not the better tool." (The DataDriven Team)

Integration patterns

The five ways teams actually wire dbt and Airflow

Each pattern comes with a real trade off. Picking the right one depends on your model count, your team shape, and your tolerance for cold starts.

  1. Airflow plus dbt operator (most common)

    Airflow ingests data via custom operators or managed ELT, then triggers a single BashOperator running dbt build. Simple, fast to wire, easy to reason about. The trade off is that one Airflow task fans out into dozens of dbt models, so model level retries are not visible in the Airflow UI. A single failing model fails the whole task.

    Use this when you want minimum moving parts and your team already debugs dbt logs natively. It is the default for shops that adopted dbt before Cosmos existed.

  2. Airflow KubernetesPodOperator plus dbt

    dbt runs in a fresh pod per task. Cleanest dependency isolation. The dbt image carries its own version of dbt core, profiles, and Python deps. Airflow workers do not need any dbt dependency installed. This pattern scales across teams that share an Airflow cluster but each own their dbt project.

    Cost is cold start. Pod spin up adds 10 to 40 seconds per task. For pipelines with 30 plus dbt invocations, that is real. Mitigate with image pinning, warm pools, or batched dbt build per project rather than per model.

  3. dbt Cloud plus Airflow webhook

    You bought dbt Cloud for the IDE, the CI hooks, and the job UI. Airflow now has to coordinate dbt Cloud jobs alongside ingestion. Use the dbt Cloud provider's DbtCloudRunJobOperator (or a thin webhook task) to start a dbt Cloud job, then poll for completion with DbtCloudJobRunSensor.

    The catch. Two scheduler UIs, two billing models, two places to look when a run is late. It works, but only if someone owns the question of which scheduler is the source of truth.

  4. Dagster instead of Airflow

    Dagster treats every dbt model as a software defined asset. The dbt graph and the Dagster graph merge. You get asset level retries, asset level lineage, asset level freshness policies. The dbt run is no longer a black box inside one task.

    The trade off is the operator ecosystem. Dagster has fewer connectors than Airflow. Migrating from Airflow to Dagster is a quarter of work, not a sprint. Greenfield analytics platforms in 2026 should evaluate Dagster before defaulting to Airflow.

  5. Airflow plus Astronomer Cosmos

    Cosmos parses the dbt manifest at DAG render time and emits one Airflow task per dbt model. You get model level UI in Airflow, model level retries, and the Airflow task tree mirrors the dbt graph. Best of both worlds for shops standardized on Airflow.

    Watch out for two things. First, DAG render time grows with model count; Cosmos before 1.5 stalled at 800 plus models. Second, every dbt model becomes an Airflow task, which can inflate the Airflow metadata DB faster than expected. Plan retention.
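
Two of the numbers in the patterns above are easy to sanity check with back-of-envelope arithmetic: the pod cold start overhead in pattern 2 and the task-instance growth in pattern 5. The sketch below uses the 10 to 40 second range and the 30 invocation figure quoted above. The 50 model, hourly, 90 day retention scenario is an assumed example, and it treats each model as exactly one task, so a setup that also emits separate test tasks would land higher.

```python
def cold_start_overhead_seconds(invocations, low=10.0, high=40.0):
    """Summed pod spin-up time across invocations, as a (low, high) range in seconds."""
    return invocations * low, invocations * high


def task_instance_rows(models, runs_per_day, retention_days):
    """Task-instance rows retained when every dbt model is its own orchestrator task."""
    return models * runs_per_day * retention_days


low_s, high_s = cold_start_overhead_seconds(30)
print(f"cold start: {low_s / 60:.0f} to {high_s / 60:.0f} minutes of summed pod time")

per_model = task_instance_rows(models=50, runs_per_day=24, retention_days=90)
single_task = task_instance_rows(models=1, runs_per_day=24, retention_days=90)
print(per_model, single_task)
```

The cold start figure is summed pod time, so wall-clock impact is lower when pods run in parallel. The row counts show why retention planning matters: per-model tasks multiply metadata rows by the model count compared with a single trigger task.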

Cost and latency

The 2026 cost math, honestly

Approximate monthly cost for a mid size pipeline. About 50 dbt models, 30 ingestion DAGs, daily plus hourly runs. Warehouse cost excluded except where noted.

| Option | Mid size monthly cost | Ops burden | Scheduler latency | Where it breaks |
| --- | --- | --- | --- | --- |
| dbt Cloud (Team) | ~$100 per developer per month, plus warehouse | Lowest. dbt Cloud handles UI, scheduling, IDE, CI. | Run starts within a minute of cron | When you outgrow its native scheduler. Branching, fan out, cross tool waits. |
| dbt core on Airflow EC2 | ~$300 to 600 in EC2 plus self managed RDS | Highest. You own scheduler, workers, metadata DB, upgrades. | Run starts at the next data interval boundary | First Airflow CVE. First worker pool exhaustion. First metadata DB lock. |
| Astronomer Astro (managed Airflow) | ~$1,200 to 2,500 for a 2 to 4 worker deployment plus warehouse | Medium. They run Airflow. You write DAGs. | Same as Airflow | Per task pricing creeps. Cosmos at high model counts blows DAG render budgets. |
| AWS MWAA | ~$700 to 1,400 for a small environment plus warehouse | Medium. AWS runs scheduler and workers. You manage requirements.txt. | Same as Airflow, with a measurable cold scheduler delay | Plugin compatibility. MWAA pins versions. Some Airflow providers lag. |
| Dagster Cloud (Pro) | ~$10 per asset materialization, mid size pipelines land $400 to 1,200 | Low. They run the orchestrator. You write assets. | Asset materialization triggers in seconds via sensors | Smaller operator ecosystem. Custom integrations more often DIY. |
Cost gotcha
Astronomer per task pricing is the most common surprise in 2026. Cosmos rendered DAGs explode the task count because every dbt model becomes an Airflow task. Audit the bill the first month after enabling Cosmos.
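
For the first two rows of the cost table, the comparison reduces to seat count. The sketch below uses only the table's approximate figures (about $100 per developer for dbt Cloud, $300 to 600 of infrastructure for self-managed Airflow) and deliberately ignores engineer time, which the table rates as the highest ops burden for the self-managed option. The function names are illustrative.

```python
DBT_CLOUD_SEAT_USD = 100             # "~$100 per developer per month" from the table
SELF_MANAGED_RANGE_USD = (300, 600)  # "~$300 to 600 in EC2 plus self managed RDS"


def dbt_cloud_monthly(developers):
    """Seat cost only, warehouse excluded."""
    return developers * DBT_CLOUD_SEAT_USD


def seats_at_parity(infra_cost_usd):
    """Smallest developer count whose seat cost meets or exceeds the infra cost."""
    return -(-infra_cost_usd // DBT_CLOUD_SEAT_USD)  # ceiling division


parity_low = seats_at_parity(SELF_MANAGED_RANGE_USD[0])
parity_high = seats_at_parity(SELF_MANAGED_RANGE_USD[1])
print(parity_low, parity_high, dbt_cloud_monthly(8))
```

On infrastructure cost alone, seats overtake the self-managed range somewhere between three and six developers. Whether that matters depends on how you price the on-call and upgrade work that the self-managed row implies.
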
Folklore audit

Myth versus reality

Five claims that get repeated in Twitter threads and Reddit comments. Each one falls apart on contact with a real platform.

The Myth
dbt replaces Airflow.
The Reality
dbt Cloud schedules itself like cron, and dbt Core does not schedule itself at all. The moment your pipeline needs to wait, fan out, branch, page on call, or coordinate non SQL work, you need an orchestrator. dbt is a fantastic transformation tool that does not orchestrate. Stop trying to make it.
The Myth
Airflow can do dbt's job with PythonOperator and Jinja.
The Reality
Yes, and you will regret it within a quarter. You will rebuild ref(), incremental materialization, snapshots, and tests by hand. You will lose lineage. Your analytics engineers will quit because the merge conflict surface is someone else's Python code.
The Myth
dbt tests equal data quality covered.
The Reality
dbt tests catch null, unique, accepted values, and relationships. They do not catch volumetric drift, semantic regressions, freshness against external systems, or cross source contract breaks. You still need elementary, re_data, Monte Carlo, or a homegrown freshness layer.
The Myth
The Airflow scheduler is the bottleneck at scale.
The Reality
At most shops the bottleneck is the worker pool config or SLA misconfigurations. The 2.x scheduler is fast. If your DAG runs are queueing, the answer is almost always more worker slots, parallelism tuning, or splitting the metadata DB, not a different orchestrator.
The Myth
Just use Dagster.
The Reality
Dagster solves real ergonomic problems around assets and dbt integration. It also adds a learning curve your team must absorb. Connector breadth is smaller than Airflow. Hiring familiarity is smaller. Do not migrate without budgeting for both.
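
The third reality above, that dbt tests do not catch volumetric drift, is the one most teams end up covering with a small homegrown check. The sketch below is one such check under simple assumptions: compare today's row count with the trailing median and alert past a relative tolerance. It is not the API of elementary, re_data, or Monte Carlo, and the 50 percent tolerance is an arbitrary illustrative default.

```python
from statistics import median


def volume_alert(history, today, tolerance=0.5):
    """True when today's row count deviates from the trailing median by more than `tolerance`."""
    if not history:
        return False  # nothing to compare against yet
    baseline = median(history)
    if baseline == 0:
        return today != 0
    return abs(today - baseline) / baseline > tolerance


# Hypothetical daily row counts for one table.
trailing_counts = [1000, 1020, 980, 1010, 990]
print(volume_alert(trailing_counts, today=400))
print(volume_alert(trailing_counts, today=950))
```

A median baseline is used rather than a mean so that one earlier bad day does not shift the threshold.
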
Decision matrix

If your situation is X, pick Y

A skimmable table of the eight most common situations. Use it as a starting point, then push on the trade offs.

| If your situation is | Pick | Why |
| --- | --- | --- |
| Single Snowflake warehouse, only SQL transformations, under 10 engineers | dbt alone (Core plus cron, or dbt Cloud) | Cron or dbt Cloud is enough. Airflow is overkill. |
| Multiple data sources, Python ingestion, complex retries | Airflow plus dbt | Airflow handles fan in and retries. dbt owns the warehouse layer. |
| ML training pipelines coupled to feature freshness | Airflow | Sensors, conditional branches, and model registry hooks live in Airflow. |
| Want a single tool for ingestion plus transformation plus assets | Dagster | Asset based mental model unifies dbt, Python, and Spark in one graph. |
| Already on Airflow, looking at adding dbt | Add dbt with Cosmos or BashOperator | Lowest migration cost. Cosmos when you want model level UI. |
| Greenfield analytics stack at a startup | dbt plus dbt Cloud, defer Airflow | Postpone orchestration complexity until you have non SQL work. |
| Multi cloud, Kafka in the loop, 50 plus DAGs | Self managed Kafka or MSK, plus Airflow plus dbt | Operator breadth and multi cloud are Airflow strengths. |
| Need column level lineage across the stack | Dagster or dbt plus OpenLineage on Airflow | Airflow alone is task lineage. Column lineage requires either Dagster assets or OpenLineage. |
Adjacency

Where data modeling enters the picture

dbt is also where most teams encode their dimensional model. The choice between SCD Type 1, Type 2, and snapshots lives inside dbt and ripples through Airflow scheduling.

dbt snapshots are the canonical way to track slowly changing dimensions in dbt. The snapshot block writes dbt_valid_from and dbt_valid_to columns to a target table whenever a check or timestamp strategy detects a change. dbt handles the merge logic, the surrogate key, and the valid window maintenance.
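
A rough sketch of what that valid-window maintenance amounts to, in plain Python rather than dbt's generated SQL. It mimics a check-style strategy: compare tracked columns, close the open row on change, append a new version. The `customer_id` and `plan` fields are hypothetical, the window columns follow dbt's dbt_valid_from and dbt_valid_to naming, and deleted source rows are not handled.

```python
from datetime import datetime


def apply_snapshot(history, current_rows, key, tracked, now):
    """Close changed open rows and append new versions. Mutates and returns `history`."""
    open_rows = {row[key]: row for row in history if row["dbt_valid_to"] is None}
    for src in current_rows:
        existing = open_rows.get(src[key])
        if existing is not None and all(existing[c] == src[c] for c in tracked):
            continue  # unchanged, keep the open row as is
        if existing is not None:
            existing["dbt_valid_to"] = now  # close the previous version
        history.append({**src, "dbt_valid_from": now, "dbt_valid_to": None})
    return history


t1, t2, t3 = datetime(2026, 5, 1), datetime(2026, 5, 2), datetime(2026, 5, 3)
history = []
apply_snapshot(history, [{"customer_id": 1, "plan": "free"}], "customer_id", ["plan"], t1)
apply_snapshot(history, [{"customer_id": 1, "plan": "free"}], "customer_id", ["plan"], t2)
apply_snapshot(history, [{"customer_id": 1, "plan": "pro"}], "customer_id", ["plan"], t3)
print(len(history))
```

Three snapshot runs produce two history rows because the second run saw no change, which is also why snapshot table growth tracks change rate rather than run frequency.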

The Airflow side is the schedule. Snapshots almost always run on a different cadence than transforms. Snapshots run every hour or every six hours to capture change. Transforms run nightly to roll up. Mixing the two on the same dbt invocation hides the cadence mismatch and makes failure modes harder to reason about.

The interview question that exposes whether you have run this in production is volumetrics. How big does the snapshot table get? How do you prune it? How do you handle backfills? How do you reconcile a snapshot that started with bad data? None of those questions has a dbt only answer. They all involve the orchestrator.

Interview signal

What interviewers actually grade on

They do not ask which is better. They ask scenarios. Five questions that come up in real loops, with what a strong answer looks like.

Q01

Walk me through your orchestration topology for an analytics platform serving 50 dashboards.

Strong answers name the layers explicitly. ELT vendor lands raw. Airflow or Dagster orchestrates. dbt models stage, intermediate, and mart layers. BI tool reads from marts. Then they explain who owns what. Where freshness alerts live. How a failed source flows downstream. Where retry logic sits. Bonus points for naming the SLA story per dashboard tier.
Q02

Your dbt model takes 4 hours and blocks downstream models. How do you fix it?

First instinct should not be to throw warehouse compute at it. Read the explain plan. Check for full table scans on the source. Convert to incremental with a high cardinality unique_key. Look for cross joins from accidental fanout. Partition or cluster the source. Last resort, materialize an intermediate stage. Naming the diagnostic order is what interviewers grade.
Q03

How would you migrate from Airflow plus custom SQL to Airflow plus dbt without freezing reporting?

Inventory the SQL. Identify the marts that drive the top 20 dashboards. Build dbt models in parallel as materialized=view first, validating against the existing tables row for row. Cut over by repointing the dashboards, not by deleting the old tables. Keep the old DAG running in shadow for two weeks. Decommission only after a clean backfill comparison.
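
One way to do the row for row validation step is a two-way set difference between the legacy table and the dbt-built one. The sketch below uses SQLite from the standard library so it runs anywhere; the table names and data are hypothetical, and on a real warehouse you would run the equivalent EXCEPT (or MINUS) queries there. Because EXCEPT has set semantics, duplicate rows are not counted, so pair it with a row count check.

```python
import sqlite3


def diff_counts(conn, legacy, candidate):
    """Return (rows only in legacy, rows only in candidate).

    Table names are interpolated directly, so pass trusted identifiers only.
    """
    only_legacy = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT * FROM {legacy} EXCEPT SELECT * FROM {candidate})"
    ).fetchone()[0]
    only_candidate = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT * FROM {candidate} EXCEPT SELECT * FROM {legacy})"
    ).fetchone()[0]
    return only_legacy, only_candidate


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE legacy_revenue (day TEXT, revenue INTEGER);
    CREATE TABLE dbt_revenue (day TEXT, revenue INTEGER);
    INSERT INTO legacy_revenue VALUES ('2026-05-01', 100), ('2026-05-02', 120);
    INSERT INTO dbt_revenue VALUES ('2026-05-01', 100), ('2026-05-02', 125);
    """
)
mismatch = diff_counts(conn, "legacy_revenue", "dbt_revenue")
print(mismatch)
```

A result of (0, 0) on every mart for the full shadow period is the evidence you want before repointing dashboards.
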
Q04

What does a late arriving event do to your dbt incremental models? How do you reconcile?

An is_incremental() filter that only selects rows with event_ts later than the current maximum in the target misses the late row. The fix depends on tolerance. For low volume late events, run a daily reconciliation model that re processes a 7 day rolling window with a merge strategy. For high volume, route late events to a separate table and union at query time. Discuss the trade off between completeness and cost.
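
The failure and the rolling-window fix can be shown side by side in plain Python. This is a sketch of the logic only, not dbt SQL: `high_water_mark_filter` stands in for a naive incremental filter on event_ts, and `lookback_merge` stands in for a reconciliation model that reprocesses a 7 day window and upserts on a key. The `event_id` key and the sample events are hypothetical.

```python
from datetime import datetime, timedelta


def high_water_mark_filter(batch, target):
    """Naive incremental filter: keep only rows newer than the newest row already loaded."""
    max_ts = max((r["event_ts"] for r in target), default=datetime.min)
    return [r for r in batch if r["event_ts"] > max_ts]


def lookback_merge(batch, target, now, lookback_days=7):
    """Reprocess a rolling window and upsert by event_id."""
    cutoff = now - timedelta(days=lookback_days)
    merged = {r["event_id"]: r for r in target}
    for r in batch:
        if r["event_ts"] >= cutoff:
            merged[r["event_id"]] = r
    return sorted(merged.values(), key=lambda r: r["event_ts"])


target = [
    {"event_id": "a", "event_ts": datetime(2026, 5, 1)},
    {"event_id": "c", "event_ts": datetime(2026, 5, 3)},
]
batch = [
    {"event_id": "b", "event_ts": datetime(2026, 5, 2)},  # late: older than the high-water mark
    {"event_id": "d", "event_ts": datetime(2026, 5, 4)},
]

naive = high_water_mark_filter(batch, target)
reconciled = lookback_merge(batch, target, now=datetime(2026, 5, 5))
print([r["event_id"] for r in naive], [r["event_id"] for r in reconciled])
```

The naive filter silently drops event b, while the window merge recovers it at the cost of rescanning seven days of data on every run. That rescan is the completeness versus cost trade off the answer should name.
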
Q05

When would you reach for Dagster instead of either?

When the asset graph is the natural unit of work. When dbt models, Spark jobs, and Python feature pipelines all need to live in one lineage view. When you want freshness policies on assets, not schedules on DAGs. When you are starting fresh and your team can absorb the learning curve. Honest answer includes the trade off, not just the sales pitch.

Frequently asked questions

Are dbt and Airflow really not competitors?
Correct. dbt is a SQL transformation framework that compiles to your warehouse. Airflow is a general orchestrator. They sit at different layers. The vs framing is a search query artifact, not a real architectural choice. Most production teams run both.
Should I learn dbt or Airflow first?
If you are targeting analytics engineering, dbt first. The IDE, the ref graph, the test framework, and the documentation flow are exactly what the role centers on. If you are targeting platform or data engineering, Airflow first. Orchestration semantics, retry logic, and operator ecosystems are what those interviews test.
Is dbt Core or dbt Cloud the right choice?
dbt Core is the open source CLI. It is sufficient for most teams that already have Airflow or another orchestrator. dbt Cloud adds a hosted UI, a scheduler, an IDE, and CI integrations. The trade off is per developer pricing. Worth it for teams that do not want to operate Airflow and dbt themselves.
Should I migrate from Airflow to Dagster?
If you are starting fresh and your team can absorb a new framework, evaluate Dagster. Asset based modeling fits dbt cleanly. If you have an existing Airflow deployment with hundreds of DAGs, the migration cost rarely justifies the move. Both are production grade. Dagster ergonomics around assets are real but not free.
What is the difference between dbt and Spark SQL?
Spark SQL is a SQL execution engine inside Spark. dbt is a workflow framework that compiles SQL and ships it to a target. dbt can target Spark via dbt-spark, but more commonly targets Snowflake, BigQuery, Redshift, Databricks SQL, or Postgres. They sit at different layers.
Can I use Airflow without dbt?
Yes. Many production deployments do. Airflow with PythonOperator, SparkSubmitOperator, and connector operators handles full orchestration. dbt is added when SQL modeling becomes a meaningful slice of the workload and analytics engineers need a native authoring environment.
Can I use dbt without Airflow?
Yes. dbt Cloud has a built in scheduler. Cron works. GitHub Actions works. Airflow becomes valuable when you need to coordinate dbt with non dbt tasks like ingestion, ML, or external APIs.
How important is Airflow knowledge for a data engineering interview?
Important. Airflow appears in roughly 65 percent of data engineer interview loops in our dataset. You should be able to design a DAG, discuss operators, explain retry semantics, and reason about idempotency in scheduled tasks. Even if your target shop uses Dagster or Prefect, the concepts overlap.
Does Cosmos solve the dbt model visibility problem in Airflow?
Mostly. Cosmos renders each dbt model as a separate Airflow task, so the Airflow UI shows model level status and retries. The trade off is DAG render time and metadata DB inflation. At 800 plus models, Cosmos can stall DAG parsing. Plan accordingly.
What about Prefect?
Prefect is a credible alternative to Airflow with a hybrid execution model and dynamic flow definitions. It has a smaller user base than Airflow and a smaller asset story than Dagster. Best fit for teams that need workflow shape to change per run. Less common in interview questions, but mentioned in the Airflow vs Dagster vs Prefect tradeoff.
