dbt and Airflow are two of the most-used tools in modern data engineering, and the question "dbt vs Airflow" often confuses early-career engineers because the tools aren't actually competitors. They serve different parts of the stack: dbt is a SQL-first transformation framework that runs in your warehouse; Airflow is a general-purpose orchestrator that runs anywhere. Most modern data teams use both: Airflow orchestrates dbt runs alongside other tasks. This guide breaks down where each fits, how they integrate, and what interviewers test about each. Pair it with our data engineer interview prep hub.
The two tools serve different layers of the stack and complement each other in production.
| Dimension | dbt | Airflow |
|---|---|---|
| Primary purpose | SQL transformation in warehouse | General-purpose orchestration |
| Language | SQL + Jinja templating | Python (DAG definition + tasks) |
| Where logic runs | In the warehouse (Snowflake, BigQuery, Redshift) | Where you point it (Spark, Bash, Python, dbt CLI) |
| Best for | Transformation, modeling, testing | Multi-step workflows with external dependencies |
| Lineage | Native (auto-generated from refs) | Task-level only |
| Testing | Built-in tests on data | Limited; tests are workflow-level |
| Documentation | Native (dbt docs) | Limited |
| Scheduling | External (uses Airflow or dbt Cloud) | Native scheduler |
| Operators / connectors | Warehouse-specific | Hundreds of operators for everything |
| Learning curve | Moderate (SQL-first, easy entry) | Steep (Python framework + ops) |
| Cost | Free (dbt Core) or paid (dbt Cloud) | Free (Apache project) or managed (MWAA, Astronomer) |
| Most-likely user | Analytics engineer | Data engineer + analytics engineer |
dbt is purpose-built for SQL-based transformation in a warehouse. Its strengths are: declarative model definition (CREATE TABLE AS SELECT pattern), automatic lineage from ref() and source() functions, built-in tests on data (unique, not_null, accepted_values, relationships, custom tests), native documentation with column-level descriptions, and a workflow that feels natural to anyone who already writes SQL.
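dbt's built-in tests compile down to SQL assertions against the warehouse: a test passes when its query returns zero rows. As a rough illustration of the semantics (not dbt's actual generated SQL, and using a toy sqlite3 table in place of a warehouse model):

```python
import sqlite3

# Toy table standing in for a warehouse model (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10), (2, 11), (3, None)])

# A dbt `unique` test passes when no value appears more than once.
dup_rows = conn.execute(
    "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1"
).fetchall()

# A dbt `not_null` test passes when no row has a NULL in the column.
null_rows = conn.execute(
    "SELECT * FROM orders WHERE customer_id IS NULL"
).fetchall()

print(len(dup_rows))   # 0  -> the unique test on order_id would pass
print(len(null_rows))  # 1  -> the not_null test on customer_id would fail
```

In a real project you declare these tests in a schema YAML file next to the model, and `dbt test` generates and runs the queries for you.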
dbt's sweet spot is the modeling layer of a modern data stack: source data lands in the warehouse (via Fivetran, Airbyte, or custom ingestion), dbt transforms it into clean staging, intermediate, and mart layers. The output is queryable by BI tools or downstream services.
What dbt doesn't do: ingestion (it operates on data already in the warehouse), Python transformations (dbt Python models exist but are limited), arbitrary workflow orchestration (it can chain models but not external systems), real-time streaming (dbt is batch-first).
Airflow is a general-purpose workflow orchestrator. Its strengths are: rich operator ecosystem (hundreds of operators for AWS, GCP, databases, APIs, custom code), Python-based DAG definition (programmable, version-controlled, testable), retry and failure handling at the task level, scheduling with cron-like expressions, sensor patterns for waiting on external events.
Airflow's sweet spot is multi-step pipelines with external dependencies: pull data from an API, land it in S3, trigger a Spark job, run dbt models on the resulting tables, push the output to a downstream consumer. Each step might use a different tool or run in a different environment.
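The dependency ordering and per-task retries described above are exactly what Airflow formalizes declaratively. As a toy stdlib-only sketch of that logic (step names are hypothetical, not real Airflow operators):

```python
import time

def run_with_retries(task, retries=2, delay=0.1):
    """Retry a failing step, the way Airflow retries a failed task instance.
    (Toy sketch; real Airflow configures retries declaratively per task.)"""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(delay)

# Hypothetical pipeline steps; each would be a separate Airflow task.
log = []
steps = [
    lambda: log.append("fetch_api"),   # e.g. an HTTP operator
    lambda: log.append("load_to_s3"),  # e.g. an S3 transfer operator
    lambda: log.append("dbt_run"),     # dbt CLI via Bash or Cosmos
    lambda: log.append("notify"),      # e.g. a Slack operator
]
for step in steps:  # Airflow expresses this ordering as task dependencies
    run_with_retries(step)

print(log)  # ['fetch_api', 'load_to_s3', 'dbt_run', 'notify']
```

In a real DAG, each lambda becomes an operator or `@task`-decorated function, and the `for` loop becomes explicit dependency arrows between tasks.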
What Airflow doesn't do: SQL transformation logic itself (Airflow runs the dbt CLI; the SQL lives in dbt), real-time streaming (Airflow is batch-oriented; for streaming use Flink, Spark Structured Streaming, or Kafka Streams), ad-hoc query execution.
Most production data teams in 2026 use both: Airflow as the orchestrator, dbt as the transformation framework inside Airflow. The pattern: Airflow DAG defines the end-to-end pipeline. Tasks include ingestion (custom or via Airflow operators), then a dbt run task that executes the relevant dbt models, then downstream tasks (export, notification, ML training).
The Airflow-dbt integration is mature in 2026. Astronomer's open-source Cosmos library renders each dbt model as a separate Airflow task, giving you task-level visibility, parallelization, and failure isolation. The alternative (a single BashOperator running "dbt run") is simpler but loses model-level granularity.
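The single-task approach just shells out to the dbt CLI. A minimal sketch of the command a BashOperator-style task would execute (the selector and target names here are hypothetical; adjust to your project):

```python
import shlex
import subprocess

def dbt_run_command(select=None, target="prod"):
    """Build the dbt CLI invocation a single Airflow task would run.
    (Illustrative wrapper, not part of dbt or Airflow.)"""
    cmd = ["dbt", "run", "--target", target]
    if select:
        # dbt node selection syntax: "model+" also runs downstream models
        cmd += ["--select", select]
    return cmd

cmd = dbt_run_command(select="staging.orders+")  # hypothetical selector
print(shlex.join(cmd))  # dbt run --target prod --select staging.orders+

# In a real task you would shell out, letting a nonzero exit fail the task:
# subprocess.run(cmd, check=True)
```

Cosmos goes further: it parses the dbt project's manifest to emit one Airflow task per model, so a single failed model can be retried without re-running the whole project.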
In 2026, modern alternatives like Dagster offer dbt integration as a first-class feature. Dagster + dbt is increasingly common at companies starting fresh because Dagster's asset-based model fits dbt more naturally than Airflow's task-based model. Both work; Airflow has more ecosystem maturity.
The honest decision rule: pick based on what you need to build, not based on which tool is more famous.
| Need | Tool | Reason |
|---|---|---|
| Define warehouse models with SQL | dbt | Native fit |
| Test data quality on warehouse tables | dbt | Built-in tests |
| Auto-generate lineage from SQL | dbt | ref() and source() build the graph |
| Schedule a daily Spark job | Airflow | SparkSubmitOperator |
| Pull data from a REST API | Airflow | HTTP operators |
| Coordinate dbt run with upstream ingestion | Airflow + dbt | Airflow orchestrates; dbt models |
| Trigger ML training after data arrives | Airflow | Multi-step workflow |
| Stream Kafka events to a database | Neither (use Flink or Spark Streaming) | Both are batch-oriented |
| Run a backfill across 90 days | Airflow + dbt | Airflow handles scheduling; dbt handles models |
| Document column-level meaning | dbt | Native dbt docs |
| Cron-like scheduling | Airflow | Native scheduler |
| Send Slack notification on pipeline failure | Airflow | SlackOperator |
| Convert raw events to gold tables | dbt | Modeling layer |
| Multi-cloud orchestration | Airflow | Cloud-agnostic |
dbt fluency is essential for analytics engineer roles and helpful for data engineer roles. Airflow fluency is essential for any data engineer role that touches orchestration. Both appear in the system design framework for data engineers as default tooling choices.
For other tooling decisions, see Snowflake vs Databricks (warehouse vs lakehouse) and Kafka vs Kinesis (message broker decision).
Once you know which tools your target role uses, drill the patterns that show up in interviews.
Data Engineer vs AE roles, daily work, comp, skills, and which to target.
Data Engineer vs MLE roles, where the boundary lives, comp differences, and how to switch.
Data Engineer vs backend roles, daily work, comp, interview differences, and crossover paths.
When SQL wins, when Python wins, and how Data Engineer roles use both.
Snowflake vs Databricks, interview differences, role differences, and how to choose.
Kafka vs Kinesis, throughput, cost, ops burden, and the Data Engineer interview implications.
Continue your prep
50+ guides covering every round, company, role, and technology in the data engineer interview loop. Grounded in 2,817 verified interview reports across 929 companies, collected from real candidates.