Tooling Decision Guide

dbt vs Airflow

dbt and Airflow are two of the most-used tools in modern data engineering, and the question "dbt vs Airflow" often confuses early-career engineers because the tools aren't actually competitors. They serve different parts of the stack: dbt is a SQL-first transformation framework that runs in your warehouse; Airflow is a general-purpose orchestrator that runs anywhere. Most modern data teams use both: Airflow orchestrates dbt runs alongside other tasks. This guide breaks down where each fits, how they integrate, and what interviewers test about each. Pair it with our data engineer interview prep hub.

The Short Answer
The short answer: dbt is for SQL-based modeling and transformation in your warehouse. Airflow is for orchestrating multi-step data workflows including ingestion, transformation, ML jobs, and external triggers. They're complementary, not competitors. The right architecture in 2026 is typically: Airflow (or modern alternatives like Dagster or Prefect) orchestrates dbt runs, Spark jobs, and external API calls; dbt handles the warehouse modeling layer. Pick dbt if you have a warehouse and need a modeling layer. Pick Airflow if you need orchestration. Use both for production data platforms.
Updated April 2026 · By The DataDriven Team

Side-by-Side: dbt vs Airflow

The two tools serve different layers of the stack and complement each other in production.

Dimension | dbt | Airflow
Primary purpose | SQL transformation in warehouse | General-purpose orchestration
Language | SQL + Jinja templating | Python (DAG definition + tasks)
Where logic runs | In the warehouse (Snowflake, BigQuery, Redshift) | Where you point it (Spark, Bash, Python, dbt CLI)
Best for | Transformation, modeling, testing | Multi-step workflows with external dependencies
Lineage | Native (auto-generated from refs) | Task-level only
Testing | Built-in tests on data | Limited; tests are workflow-level
Documentation | Native (dbt docs) | Limited
Scheduling | External (uses Airflow or dbt Cloud) | Native scheduler
Operators / connectors | Warehouse-specific | Hundreds of operators for everything
Learning curve | Moderate (SQL-first, easy entry) | Steep (Python framework + ops)
Cost | Free (dbt Core) or paid (dbt Cloud) | Free (Apache project) or managed (MWAA, Astronomer)
Most-likely user | Analytics engineer | Data engineer + analytics engineer

What dbt Does Well

dbt is purpose-built for SQL-based transformation in a warehouse. Its strengths are: declarative model definition (CREATE TABLE AS SELECT pattern), automatic lineage from ref() and source() functions, built-in tests on data (unique, not_null, accepted_values, relationships, custom tests), native documentation with column-level descriptions, and a workflow that feels natural to anyone who already writes SQL.

dbt's sweet spot is the modeling layer of a modern data stack: source data lands in the warehouse (via Fivetran, Airbyte, or custom ingestion), dbt transforms it into clean staging, intermediate, and mart layers. The output is queryable by BI tools or downstream services.

What dbt doesn't do: ingestion (it operates on data already in the warehouse), Python transformations (dbt Python models exist but are limited), arbitrary workflow orchestration (it can chain models but not external systems), real-time streaming (dbt is batch-first).

What Airflow Does Well

Airflow is a general-purpose workflow orchestrator. Its strengths are: a rich operator ecosystem (hundreds of operators for AWS, GCP, databases, APIs, custom code), Python-based DAG definition (programmable, version-controlled, testable), retry and failure handling at the task level, scheduling with cron-like expressions, and sensor patterns for waiting on external events.

Airflow's sweet spot is multi-step pipelines with external dependencies: pull data from an API, land it in S3, trigger a Spark job, run dbt models on the resulting tables, push the output to a downstream consumer. Each step might use a different tool or run in a different environment.

What Airflow doesn't do: SQL transformation logic itself (Airflow runs the dbt CLI; the SQL lives in dbt), real-time streaming (Airflow is batch-oriented; for streaming use Flink, Spark Structured Streaming, or Kafka Streams), ad-hoc query execution.

How They Work Together: The Standard Pattern

Most production data teams in 2026 use both: Airflow as the orchestrator, dbt as the transformation framework inside Airflow. The pattern: Airflow DAG defines the end-to-end pipeline. Tasks include ingestion (custom or via Airflow operators), then a dbt run task that executes the relevant dbt models, then downstream tasks (export, notification, ML training).
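That dependency structure is easy to see in miniature. Below is a sketch in plain Python (no Airflow required) of a hypothetical pipeline's task graph, using the standard library's graphlib to compute a valid execution order. The task names are illustrative, and in a real DAG the dbt tasks would shell out to the dbt CLI:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "ingest_orders": set(),        # e.g. a Fivetran/Airbyte sync or custom loader
    "ingest_payments": set(),
    "dbt_run": {"ingest_orders", "ingest_payments"},  # would run `dbt run`
    "dbt_test": {"dbt_run"},                          # would run `dbt test`
    "export_to_bi": {"dbt_test"},
    "notify_slack": {"export_to_bi"},
}

# A valid execution order that respects every dependency edge.
execution_order = list(TopologicalSorter(pipeline).static_order())
```

An Airflow DAG encodes exactly this graph with `>>` operators between tasks; the scheduler then runs independent tasks (like the two ingests) in parallel.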

The Airflow-dbt integration is mature in 2026. Astronomer's open-source Cosmos library renders each dbt model as a separate Airflow task, giving you task-level visibility, parallelization, and failure isolation. The alternative (a single BashOperator running "dbt run") is simpler but loses model-level granularity.

In 2026, modern alternatives like Dagster offer dbt integration as a first-class feature. Dagster + dbt is increasingly common at companies starting fresh because Dagster's asset-based model fits dbt more naturally than Airflow's task-based model. Both work; Airflow has more ecosystem maturity.

Decision Framework: Which Tool for Which Need

The honest decision rule: pick based on what you need to build, not based on which tool is more famous.

Need | Tool | Reason
Define warehouse models with SQL | dbt | Native fit
Test data quality on warehouse tables | dbt | Built-in tests
Auto-generate lineage from SQL | dbt | ref() and source() build the graph
Schedule a daily Spark job | Airflow | SparkSubmitOperator
Pull data from a REST API | Airflow | HTTP operators
Coordinate dbt run with upstream ingestion | Airflow + dbt | Airflow orchestrates; dbt models
Trigger ML training after data arrives | Airflow | Multi-step workflow
Stream Kafka events to a database | Neither (use Flink or Spark Streaming) | Both are batch-oriented
Run a backfill across 90 days | Airflow + dbt | Airflow handles scheduling; dbt handles models
Document column-level meaning | dbt | Native dbt docs
Cron-like scheduling | Airflow | Native scheduler
Send Slack notification on pipeline failure | Airflow | SlackOperator
Convert raw events to gold tables | dbt | Modeling layer
Multi-cloud orchestration | Airflow | Cloud-agnostic

Six Real Interview Questions About dbt and Airflow

When would you choose dbt over a custom Python transformation framework? (Level: L4)

When the transformations are SQL-shaped: aggregations, joins, modeling layers, dimensional design. dbt handles SQL with version control, lineage, testing, and documentation built in. Custom Python is right when transformations require procedural logic (file parsing, custom algorithms, ML feature engineering) that doesn't fit cleanly in SQL. Most modern stacks use dbt for SQL transformations and Python (often via Airflow) for non-SQL work.
Design an Airflow DAG that runs dbt models after S3 data lands. (Level: L4)

A sensor task waits for the S3 file to arrive (S3KeySensor). Once detected, a BashOperator or KubernetesPodOperator runs "dbt run --models marts.daily_revenue", followed by a data quality check ("dbt test --models marts.daily_revenue"). On failure, send a Slack notification via SlackOperator. Discuss the alternative: the Cosmos library rendering each dbt model as a separate Airflow task for finer visibility.
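The sensor-then-run shape can be sketched in plain Python with the S3 and dbt specifics stubbed out. Here wait_for_key is a hypothetical stand-in for S3KeySensor's poke loop, and the commands list just records what the downstream tasks would execute:

```python
import time

def wait_for_key(check, timeout_s=3600, poke_interval_s=30):
    # Stand-in for S3KeySensor: poll `check` until it returns True
    # or the timeout elapses.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(poke_interval_s)
    raise TimeoutError("upstream file never arrived")

# Stubbed check -- in Airflow this would be a head_object call against S3.
arrivals = iter([False, False, True])
ok = wait_for_key(lambda: next(arrivals), poke_interval_s=0)

# Downstream tasks (BashOperator / KubernetesPodOperator) would then run:
commands = [
    "dbt run --models marts.daily_revenue",
    "dbt test --models marts.daily_revenue",
]
```

In a real DAG the sensor's timeout and poke interval become operator arguments, and the failure path routes to the Slack notification task.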
Implement SCD Type 2 in dbt vs in a custom Spark job. (Level: L5)

dbt: use snapshots with the check or timestamp strategy; dbt handles the merge logic, surrogate key generation, and valid_from / valid_to maintenance. Custom Spark: write the merge logic by hand using MERGE INTO (Delta / Iceberg), or diff source against target and apply the changes. dbt is the better choice when the data lives in a warehouse; custom Spark is the better choice when working in Iceberg or Delta on a data lake.
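The valid_from / valid_to bookkeeping that dbt snapshots (or a hand-written MERGE) perform can be illustrated with a small pure-Python sketch. scd2_apply and its row shape are hypothetical illustrations, not dbt's API:

```python
from datetime import date

def scd2_apply(current, incoming, today):
    """Sketch of SCD Type 2 merge logic. `current` rows are dicts with
    key, attrs, valid_from, valid_to (None = currently open version).
    `incoming` is a {key: attrs} snapshot of the source."""
    out = [dict(row) for row in current]
    open_rows = {r["key"]: r for r in current if r["valid_to"] is None}
    for key, attrs in incoming.items():
        live = open_rows.get(key)
        if live is not None and live["attrs"] == attrs:
            continue                     # unchanged: keep the open row as-is
        if live is not None:             # changed: close out the old version
            for row in out:
                if row["key"] == key and row["valid_to"] is None:
                    row["valid_to"] = today
        out.append({"key": key, "attrs": attrs,
                    "valid_from": today, "valid_to": None})
    return out

history = scd2_apply(
    current=[{"key": 1, "attrs": {"city": "Oslo"},
              "valid_from": date(2025, 1, 1), "valid_to": None}],
    incoming={1: {"city": "Bergen"}, 2: {"city": "Tromso"}},
    today=date(2026, 4, 1),
)
```

The changed key gets its old row closed and a new open row appended; the brand-new key simply gets an open row. dbt snapshots and Delta/Iceberg MERGE both implement this same close-and-insert pattern.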
How would you set up incremental dbt models for a 5B-row fact table? (Level: L5)

Materialization: incremental. unique_key on surrogate key. on_schema_change set to append_new_columns. is_incremental() block filters to event_ts > (SELECT max(event_ts) FROM {{ this }}). Late-arriving data older than 24 hours routed to a separate backfill model. Discuss: merge strategy for SCDs vs append + dedup for event logs.
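The watermark-filter-plus-dedup logic behind the append + dedup strategy can be mimicked in plain Python. incremental_append is a hypothetical helper; in dbt the same filter lives inside the is_incremental() block:

```python
def incremental_append(target, new_batch, unique_key="event_id", ts_col="event_ts"):
    """Sketch of dbt's incremental pattern: keep only source rows newer
    than the current high-water mark, then dedupe on the unique key."""
    watermark = max((r[ts_col] for r in target), default=None)
    fresh = [r for r in new_batch if watermark is None or r[ts_col] > watermark]
    seen = {r[unique_key] for r in target}
    return target + [r for r in fresh if r[unique_key] not in seen]

table = [{"event_id": 1, "event_ts": 100}]
table = incremental_append(table, [
    {"event_id": 1, "event_ts": 100},  # already loaded: filtered by watermark
    {"event_id": 2, "event_ts": 150},  # new: appended
    {"event_id": 3, "event_ts": 90},   # late-arriving, older than watermark:
])                                     # silently dropped -> needs a backfill path
```

Note how the late-arriving row is lost by a naive watermark filter; that is exactly why the answer above routes data older than the cutoff to a separate backfill model.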
How do you handle a failed Airflow DAG mid-run? (Level: L5)

Airflow's built-in retry policy at the task level. For idempotent tasks: simple exponential backoff retry. For non-idempotent tasks: check whether the partial work succeeded before retrying; use external state (a status table) to coordinate. For DAG-level recovery: Airflow's manual rerun from a specific task. For systematic failures: alert via PagerDuty, route to dead-letter queue, document the runbook.
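A sketch of those retry semantics in plain Python, assuming a hypothetical already_done check against external state for idempotency (names are illustrative; Airflow implements this via task-level retries and retry_exponential_backoff):

```python
import time

def run_with_retries(task, already_done, max_retries=3, base_delay_s=1.0):
    """Retry a task with exponential backoff, but first consult external
    state (e.g. a status table or output partition) so a rerun does not
    redo work a previous attempt already committed."""
    if already_done():
        return "skipped"
    for attempt in range(max_retries + 1):
        try:
            task()
            return "succeeded"
        except Exception:
            if attempt == max_retries:
                raise  # give up: alert / dead-letter queue in a real setup
            time.sleep(base_delay_s * 2 ** attempt)

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")

result = run_with_retries(flaky, already_done=lambda: False, base_delay_s=0)
```

The already_done guard is what makes retrying non-idempotent tasks safe; without it, a retry after a partial success can double-write.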
Design the data platform architecture: Airflow vs Dagster vs Prefect. (Level: L6)

Airflow: most mature ecosystem, largest community, largest operator library. Best when you need battle-tested production operators. Dagster: asset-based mental model that fits dbt more naturally, better for asset-driven data platforms. Best when starting fresh in a data-first context. Prefect: hybrid execution model, dynamic flows, good for workflows with significant branching. Best when workflow shape changes per run. Discuss trade-offs honestly: Airflow has the most community knowledge but the steepest learning curve; Dagster has cleaner abstractions but smaller ecosystem; Prefect has dynamic workflows but smaller user base.

How This Decision Connects to the Rest of the Cluster

dbt fluency is essential for analytics engineer roles (see our analytics engineer interview question prep) and helpful for data engineer roles. Airflow fluency is essential for any data engineer role that touches orchestration. Both appear in our system design framework for data engineers as default tooling choices.

For other tooling decisions, see Snowflake vs Databricks (warehouse vs lakehouse) and Kafka vs Kinesis (streaming platform and message broker decision).

Data Engineer Interview Prep FAQ

Are dbt and Airflow really not competitors?
Correct. dbt is a SQL transformation framework; Airflow is a general-purpose orchestrator. They serve different layers of the stack. Most modern data teams use both. The 'vs' framing is a common confusion among early-career engineers.
Should I learn dbt or Airflow first?
Depends on the role. For analytics engineer: dbt first, Airflow optional. For data engineer: both, but the order varies by team. If your team is SQL-heavy, dbt first; if your team is Python-heavy with diverse data sources, Airflow first.
Is dbt Core or dbt Cloud the right choice?
dbt Core is the open-source CLI; sufficient for most teams. dbt Cloud adds hosted UI, scheduling, IDE, and observability features; useful for teams that don't want to operate Airflow + dbt themselves. Cost trade-off: dbt Cloud is per-developer-seat pricing.
Should I migrate from Airflow to Dagster?
If you're starting fresh: consider Dagster for the asset-based model. If you have an existing Airflow deployment with hundreds of DAGs: migration cost rarely justifies the move. Both are production-grade.
What's the difference between dbt and Spark SQL?
Spark SQL is a SQL execution engine within Spark. dbt is a workflow framework that compiles SQL to your warehouse. They operate at different layers. dbt can target Spark SQL (via dbt-spark) but more commonly targets Snowflake, BigQuery, Redshift, Postgres.
Can I use Airflow without dbt?
Yes, very commonly. Airflow with custom Python tasks, SparkSubmitOperator, and various API operators handles the full orchestration story without dbt. dbt is added when SQL modeling becomes a significant portion of the workload.
Can I use dbt without Airflow?
Yes. dbt Cloud has a built-in scheduler. Or use cron, GitHub Actions, or any other scheduler. Airflow becomes valuable when you need to coordinate dbt with non-dbt tasks (ingestion, ML, external APIs).
How important is Airflow knowledge for a data engineer interview?
Important. Airflow appears in 65% of data engineer interview loops in our dataset. You should be able to design a DAG, discuss operators, explain retry semantics, and reason about idempotency in scheduled tasks.

Drill the Tools That Matter for Your Role

Once you know which tools your target role uses, drill the patterns that show up in interviews.


More Data Engineer Interview Prep Guides

Continue your prep with the full Data Engineer Interview Prep guide.

50+ guides covering every round, company, role, and technology in the data engineer interview loop. Grounded in 2,817 verified interview reports across 929 companies, collected from real candidates.
