Pipeline Architecture

ETL vs ELT: When to Use Each

ETL and ELT are the two main approaches to getting data from source systems into a warehouse: ETL transforms before loading, while ELT loads raw data first and transforms inside the warehouse. System design is a small slice of most DE interview loops, but pipeline-architecture questions within the data modeling segment frequently test this trade-off.

Know the trade-offs, not just the acronyms. SQL is the most heavily tested skill in DE interviews, and your ELT answer should make clear whether transformations happen in SQL inside the warehouse or in an external engine.

ETL

Extract → Transform → Load

Data is pulled from sources, reshaped and cleaned on external servers, then loaded into the warehouse in its final form. The warehouse only stores processed data.
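The ETL flow can be sketched in a few lines. This is a minimal illustration, not a production pattern: the function names, the row schema, and the plain list standing in for the warehouse table are all assumptions for the example.

```python
# Minimal ETL sketch: transform happens BEFORE load, so the
# destination (a plain list standing in for a warehouse table)
# only ever sees processed, final-form rows. Names are illustrative.

def extract():
    # Stand-in for pulling rows from a source system.
    return [
        {"user_id": 1, "amount": "19.99", "country": "us"},
        {"user_id": 2, "amount": "5.00", "country": "de"},
    ]

def transform(rows):
    # Clean and reshape on the "ETL server", outside the warehouse:
    # cast strings to numbers, normalize country codes.
    return [
        {"user_id": r["user_id"],
         "amount_usd": float(r["amount"]),
         "country": r["country"].upper()}
        for r in rows
    ]

def load(rows, warehouse):
    # Only transformed rows reach the destination; raw data is
    # never stored there.
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse[0])  # final-form row, e.g. country already 'US'
```

Note the consequence: if the transform logic changes, there is no raw data in the warehouse to re-process, so the whole pipeline must run again from extraction.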

ELT

Extract → Load → Transform

Data is pulled from sources and loaded raw into the warehouse. Transformations happen inside the warehouse using SQL. Raw data is preserved for flexibility.
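The ELT flow inverts the last two steps. Below is a hedged sketch using sqlite3 as a stand-in warehouse; the table names and schema are invented for the example, but the shape is the same in Snowflake or BigQuery: raw data lands first, then SQL inside the warehouse builds the analytics table.

```python
# Minimal ELT sketch with sqlite3 standing in for a cloud warehouse:
# raw data is loaded as-is, then SQL inside the "warehouse" produces
# the final table. The raw rows stay queryable afterwards.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (user_id INT, amount TEXT, country TEXT)")

# Load: raw data lands untouched, no cleanup yet.
raw = [(1, "19.50", "us"), (2, "5.00", "de"), (1, "3.50", "us")]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw)

# Transform: SQL inside the warehouse builds the analytics table.
conn.execute("""
    CREATE TABLE orders_by_country AS
    SELECT UPPER(country) AS country,
           SUM(CAST(amount AS REAL)) AS total_usd
    FROM raw_orders
    GROUP BY country
""")

rows = conn.execute(
    "SELECT * FROM orders_by_country ORDER BY country").fetchall()
print(rows)  # [('DE', 5.0), ('US', 23.0)]
```

Because `raw_orders` is preserved, the transform can be rewritten and re-run at any time without touching the source system.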

Detailed Comparison

Transformation Timing

ETL

Transform before loading. Data is cleaned, joined, and reshaped in a staging area, then loaded into the warehouse in its final form.

ELT

Load first, transform later. Raw data lands in the warehouse, then SQL or dbt models reshape it into analytics-ready tables.

Storage Requirements

ETL

Lower warehouse storage. Only transformed data lives in the destination. Raw data stays in the source or is discarded.

ELT

Higher warehouse storage. You keep raw data alongside transformed tables. Storage is cheap in modern cloud warehouses, so this is usually acceptable.

Flexibility

ETL

Less flexible. If a transformation changes, you must re-run the pipeline, and often re-extract from the source, because the raw data was not kept. Adding a new column means updating the ETL job and reprocessing history.

ELT

More flexible. Raw data is preserved, so you can re-transform it any time without re-extracting from the source. New questions can be answered from existing raw data.
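This flexibility is concrete: a new question is just a new query over raw data already in the warehouse. A small sketch, with an invented event schema and sqlite3 as the stand-in warehouse:

```python
# Sketch of ELT flexibility: because raw events were preserved in
# the warehouse, a new question is answered with new SQL alone --
# no re-extraction from the source system. Schema is illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (user_id INT, event TEXT, ts TEXT)")
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", [
    (1, "signup",   "2024-01-01"),
    (1, "purchase", "2024-01-03"),
    (2, "signup",   "2024-01-02"),
])

# The original model only counted signups...
signups = conn.execute(
    "SELECT COUNT(*) FROM raw_events WHERE event = 'signup'"
).fetchone()[0]

# ...but a new question (purchases per user) needs only new SQL,
# because the raw events are still sitting in the warehouse.
purchases = conn.execute("""
    SELECT user_id, COUNT(*) FROM raw_events
    WHERE event = 'purchase'
    GROUP BY user_id
""").fetchall()
print(signups, purchases)  # 2 [(1, 1)]
```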

Tooling

ETL

Traditional tools: Informatica, Talend, SSIS, custom Python scripts. The transformation engine is outside the warehouse.

ELT

Modern stack: dbt, SQL-based transforms inside the warehouse. The warehouse itself is the transformation engine.

Cost Model

ETL

Compute cost is in the ETL layer (your servers or a managed service). Warehouse costs are lower because less data is stored.

ELT

Compute cost is in the warehouse (you pay for query processing). Warehouse costs are higher, but you avoid maintaining a separate transformation layer.

When to Use

ETL

When the warehouse has limited compute power. When data privacy requires filtering sensitive fields before they reach the warehouse. When you have well-defined, stable transformations.
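The privacy case deserves a concrete shape: in ETL, sensitive fields are dropped or pseudonymized before the data ever reaches the warehouse. A hedged sketch with invented field names:

```python
# Illustrative pre-load PII handling (the ETL privacy case):
# the free-text address is dropped entirely and the email is
# pseudonymized with a hash, so neither raw value ever lands
# in the warehouse. Field names are assumptions for the example.
import hashlib

def strip_pii(row):
    out = {k: v for k, v in row.items() if k != "address"}
    out["email"] = hashlib.sha256(row["email"].encode()).hexdigest()
    return out

row = {"user_id": 7, "email": "a@example.com", "address": "1 Main St"}
clean = strip_pii(row)
print("address" in clean)  # False -- the field never reaches the warehouse
```

In an ELT pipeline, by contrast, the raw email and address would land in the warehouse first, which is exactly what some compliance regimes forbid.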

ELT

When using a cloud warehouse with scalable compute. When transformation logic changes frequently. When analysts need access to raw data for ad-hoc analysis. This is the default for most modern data teams.

Why ETL Came First (and Why ELT Took Over)

The shift from ETL to ELT mirrors the shift from on-premise to cloud data warehouses. Understanding the history helps you explain trade-offs in interviews.

1990s-2000s

ETL was the only option

Warehouses had limited compute and expensive storage. It made sense to transform data on cheaper application servers before loading only the clean, final result. Tools like Informatica and SSIS were built for this model.

2010s

Cloud warehouses changed the math

Cloud warehouses offered massive parallel compute and cheap storage. Suddenly, the warehouse itself could handle transformations faster than external servers. The bottleneck shifted from warehouse capacity to pipeline complexity.

2018+

ELT became the default

dbt made SQL-based transformation a first-class workflow. Teams started loading raw data into the warehouse and writing version-controlled SQL models to transform it. This gave analysts more flexibility and made pipelines easier to debug.

ETL vs ELT in Data Engineering Interviews

Pipeline trade-offs surface frequently in the data modeling segment of DE interviews. Interviewers want to hear trade-off reasoning, not textbook definitions.

When would you choose ETL over ELT?

When data must be filtered for privacy before reaching the warehouse (PII stripping, GDPR compliance). When the warehouse is on-premise with limited compute. When transformations are stable and well-defined.

When would you choose ELT over ETL?

When using a cloud warehouse with elastic compute. When the analytics team needs flexibility to re-transform data. When you want version-controlled SQL transformations (dbt). This is the more common answer in modern data engineering interviews.

Can you use both in the same pipeline?

Yes. A common pattern: light ETL for data cleansing and PII removal, then ELT for business logic transformations inside the warehouse. Mention this hybrid approach to show practical experience.
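The hybrid pattern described above can be sketched end to end. This is a toy illustration, with an invented SSN field and sqlite3 as the stand-in warehouse: a light ETL step strips PII before load, then business logic runs as SQL inside the warehouse.

```python
# Hybrid sketch: light ETL (PII removal before load) followed by
# ELT (business logic as SQL inside the warehouse). sqlite3 stands
# in for the warehouse; the payment schema is illustrative.
import sqlite3

def light_etl(rows):
    # ETL half: drop the sensitive field before anything is loaded.
    return [(r["user_id"], r["amount"]) for r in rows]

source = [
    {"user_id": 1, "amount": 10.0, "ssn": "123-45-6789"},
    {"user_id": 1, "amount": 5.0,  "ssn": "123-45-6789"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_payments (user_id INT, amount REAL)")
conn.executemany("INSERT INTO raw_payments VALUES (?, ?)", light_etl(source))

# ELT half: business logic as SQL inside the warehouse.
totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM raw_payments GROUP BY user_id"
).fetchall()
print(totals)  # [(1, 15.0)]
```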

Frequently Asked Questions

What does ETL stand for?
Extract, Transform, Load. Data is extracted from source systems, transformed (cleaned, joined, aggregated) in a staging area, then loaded into the destination warehouse in its final form.

What does ELT stand for?
Extract, Load, Transform. Data is extracted from source systems, loaded into the warehouse as raw data, then transformed inside the warehouse using SQL or tools like dbt. The order of the last two steps is reversed compared to ETL.

Is ELT always better than ETL?
No. ELT is the better default for most modern cloud-based data warehouses because compute is cheap and flexible. But ETL is still the right choice when you need to filter sensitive data before it reaches the warehouse, or when your destination has limited processing power.

Do data engineering interviews ask about ETL vs ELT?
Yes. Pipeline architecture trade-offs surface within the data modeling segment of DE interviews (roughly a third of loops). Interviewers want to hear you explain the trade-offs, not just the definitions. Be ready to say when you would pick each pattern and why.

Practice the SQL Behind ETL and ELT

DataDriven covers SQL, Python, and data modeling with hands-on challenges. Build the skills interviewers actually test.