Question 1

What is the difference between ETL and ELT?

Accepted Answer

ETL transforms data in flight (in a Talend, Informatica, or custom-code pipeline) before loading the cleaned data into the warehouse. ELT lands raw data in the warehouse first (a bronze layer) and then transforms the bronze tables inside the warehouse with dbt or Spark. ELT dominates in 2026 because columnar warehouse compute is cheap, and the raw layer enables replay and backfill without re-running ingestion.
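A minimal sketch of the difference, using an in-memory SQLite database as a stand-in for the warehouse. The table names (orders_clean, bronze_orders, silver_orders), the sample rows, and the crude numeric filter are all hypothetical. The ETL path cleans rows in Python before loading, so a rejected row is gone for good. The ELT path lands every raw row in bronze and cleans it with SQL in the warehouse, so the transform can be re-run later.

```python
import sqlite3

# Hypothetical source extract: amounts arrive as strings and one row is malformed.
SOURCE_ROWS = [
    ("o1", "alice", "19.99"),
    ("o2", "bob", "5"),
    ("o3", "alice", "not-a-number"),
]

def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in the pipeline, then load only the cleaned rows."""
    conn.execute("CREATE TABLE orders_clean (order_id TEXT, customer TEXT, amount REAL)")
    clean = []
    for order_id, customer, amount in SOURCE_ROWS:
        try:
            clean.append((order_id, customer, float(amount)))
        except ValueError:
            continue  # dropped in flight: the raw row never reaches the warehouse
    with conn:
        conn.executemany("INSERT INTO orders_clean VALUES (?, ?, ?)", clean)

def elt(conn: sqlite3.Connection) -> None:
    """ELT: land raw rows in bronze, then transform with SQL inside the warehouse."""
    conn.execute("CREATE TABLE IF NOT EXISTS bronze_orders (order_id TEXT, customer TEXT, amount TEXT)")
    with conn:
        conn.executemany("INSERT INTO bronze_orders VALUES (?, ?, ?)", SOURCE_ROWS)
    transform_bronze(conn)

def transform_bronze(conn: sqlite3.Connection) -> None:
    """Rebuild silver from bronze; re-runnable for replay or backfill without re-ingesting."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS silver_orders")
        conn.execute(
            """CREATE TABLE silver_orders AS
               SELECT order_id, customer, CAST(amount AS REAL) AS amount
               FROM bronze_orders
               -- crude numeric check: starts with a digit and has only digits or dots
               WHERE amount GLOB '[0-9]*' AND amount NOT GLOB '*[^0-9.]*'"""
        )
```

If the filter in transform_bronze turns out to be wrong, it can be fixed and re-run against bronze_orders. The ETL path has no equivalent because the malformed row was never stored.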

Question 2

What is the difference between OLAP and OLTP?

Accepted Answer

OLAP (online analytical processing) workloads are wide read queries over historical data. They run on columnar storage with denormalized star schemas or one big table (OBT) and are optimized for aggregation. OLTP (online transactional processing) workloads are narrow point lookups and updates over current state. They run on row storage with normalized 3NF schemas and are optimized for transaction throughput. A data engineer's warehouse is OLAP; the source systems feeding it are OLTP.
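A toy illustration of why the two workloads favor different layouts, using plain Python containers rather than a real storage engine. The order data is hypothetical. A row layout keeps each record together, so an OLTP-style update touches one record. A column layout keeps each field together, so an OLAP-style aggregate reads only the columns it needs.

```python
# Row-oriented layout (OLTP-style): all fields of a record are stored together,
# so a point lookup or update touches exactly one record.
ORDERS_BY_ROW = {
    101: {"customer": "alice", "region": "EU", "amount": 30.0},
    102: {"customer": "bob", "region": "US", "amount": 12.5},
    103: {"customer": "carol", "region": "EU", "amount": 7.5},
}

# Column-oriented layout (OLAP-style): each field is its own array,
# so an aggregate scans only the columns it needs.
ORDERS_BY_COLUMN = {
    "order_id": [101, 102, 103],
    "customer": ["alice", "bob", "carol"],
    "region": ["EU", "US", "EU"],
    "amount": [30.0, 12.5, 7.5],
}

def update_amount(rows: dict, order_id: int, amount: float) -> None:
    """OLTP-style write: locate one record by key and change it in place."""
    rows[order_id]["amount"] = amount

def revenue_by_region(columns: dict) -> dict:
    """OLAP-style read: scan two columns and aggregate; 'customer' is never read."""
    totals: dict = {}
    for region, amount in zip(columns["region"], columns["amount"]):
        totals[region] = totals.get(region, 0.0) + amount
    return totals
```

Here `revenue_by_region(ORDERS_BY_COLUMN)` returns `{'EU': 37.5, 'US': 12.5}`. A real columnar engine adds compression and vectorized execution on top of this basic layout advantage.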

Question 3

What is the difference between Kimball, Inmon, and Data Vault?

Accepted Answer

Kimball: denormalized star schemas optimized for query simplicity, designed bottom-up from business processes. Inmon: a normalized (3NF) integrated enterprise data warehouse designed top-down from the enterprise architecture, with downstream Kimball marts for querying. Data Vault: a hub-link-satellite model designed for auditability and parallel ingestion, with a downstream business vault or Kimball marts for querying. Most startups and tech companies in 2026 use Kimball; large financial services, healthcare, and traditional enterprise IT organizations often use Inmon or Data Vault.
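A minimal Data Vault sketch in SQLite, assuming the Data Vault 2.0 convention of hash keys derived from normalized business keys. The table and column names are hypothetical, and details a real vault would need, such as hashdiffs on satellites, are omitted. Because each key is computed from the business key rather than looked up, hubs, links, and satellites can be loaded in parallel. Insert-only tables keep the full audit trail.

```python
import hashlib
import sqlite3

DDL = """
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT, load_ts TEXT, record_source TEXT);
CREATE TABLE hub_order (order_hk TEXT PRIMARY KEY, order_id TEXT, load_ts TEXT, record_source TEXT);
CREATE TABLE link_customer_order (
    link_hk TEXT PRIMARY KEY, customer_hk TEXT, order_hk TEXT, load_ts TEXT, record_source TEXT);
CREATE TABLE sat_customer (
    customer_hk TEXT, load_ts TEXT, name TEXT, email TEXT, record_source TEXT,
    PRIMARY KEY (customer_hk, load_ts));
"""

def hash_key(*business_keys: str) -> str:
    """Deterministic key from normalized business key(s); no lookup needed, so loads can run in parallel."""
    normalized = "|".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def load_order(conn, customer_id, order_id, name, email, load_ts, source):
    c_hk, o_hk = hash_key(customer_id), hash_key(order_id)
    l_hk = hash_key(customer_id, order_id)
    with conn:
        # Hubs and links are insert-only; INSERT OR IGNORE makes replays idempotent.
        conn.execute("INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)",
                     (c_hk, customer_id, load_ts, source))
        conn.execute("INSERT OR IGNORE INTO hub_order VALUES (?, ?, ?, ?)",
                     (o_hk, order_id, load_ts, source))
        conn.execute("INSERT OR IGNORE INTO link_customer_order VALUES (?, ?, ?, ?, ?)",
                     (l_hk, c_hk, o_hk, load_ts, source))
        # Satellites hold descriptive attributes as history: one row per (key, load_ts).
        conn.execute("INSERT OR IGNORE INTO sat_customer VALUES (?, ?, ?, ?, ?)",
                     (c_hk, load_ts, name, email, source))
```

Querying requires joining hubs, links, and satellites back together. That is why the answer pairs Data Vault with a downstream business vault or Kimball marts.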

Question 4

What are conformed dimensions and why do senior data engineer rubrics weight them?

Accepted Answer

Conformed dimensions are dimension tables with one schema and one set of surrogate keys, shared by multiple fact tables or data marts. For example, a single dim_customer is used by the sales, support, and marketing marts. Without conformed dimensions, answering "customers who placed orders and opened support tickets" requires custom mapping between each mart's customer identifiers. The upfront design cost pays back on every cross-mart query.
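A small SQLite sketch of the cross-mart question from the answer. The table names and rows are hypothetical. Because fact_orders (sales mart) and fact_support_tickets (support mart) both carry the same customer_key from one dim_customer, the question reduces to two plain joins with no identifier mapping.

```python
import sqlite3

def build_marts() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- One conformed customer dimension with one surrogate key space.
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
        -- Sales and support facts both reference dim_customer.customer_key.
        CREATE TABLE fact_orders (
            order_id TEXT, customer_key INTEGER REFERENCES dim_customer, amount REAL);
        CREATE TABLE fact_support_tickets (
            ticket_id TEXT, customer_key INTEGER REFERENCES dim_customer, opened_on TEXT);
        INSERT INTO dim_customer VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO fact_orders VALUES ('o1', 1, 20.0), ('o2', 2, 35.0);
        INSERT INTO fact_support_tickets VALUES ('t1', 1, '2026-03-01'), ('t2', 3, '2026-03-02');
    """)
    return conn

CROSS_MART_QUERY = """
    SELECT DISTINCT d.customer_name
    FROM dim_customer AS d
    JOIN fact_orders AS o ON o.customer_key = d.customer_key
    JOIN fact_support_tickets AS t ON t.customer_key = d.customer_key
    ORDER BY d.customer_name
"""

def customers_with_orders_and_tickets(conn: sqlite3.Connection) -> list:
    return [name for (name,) in conn.execute(CROSS_MART_QUERY)]
```

With this sample data, only alice both placed an order and opened a ticket. If the support mart had its own customer table with different keys, the same query would first need a mapping table between the two key spaces.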

Question 5

How does a data engineer handle slowly-changing fact corrections?

Accepted Answer

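Two patterns. Append-only with a version column: every correction is a new fact row with version + 1, and queries filter to max(version) per natural key. In-place update with an audit log: UPDATE the fact row and INSERT the change into a separate audit table for compliance. The trade-off is query simplicity vs audit ease vs storage cost. Append-only keeps full history in the fact table itself, but every query must pick the latest version and storage grows with each correction. In-place keeps queries simple, but history lives only in the audit table. Financial warehouses usually need append-only with audit; product analytics warehouses often accept in-place updates with a lightweight audit.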
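A sketch of both patterns in SQLite. The payment tables and column names are hypothetical. The append-only path computes the next version inside the INSERT and reads the latest version with a correlated MAX subquery. The in-place path overwrites the row and records the before and after values in an audit table, in the same transaction.

```python
import sqlite3

SCHEMA = """
CREATE TABLE fact_payment_versions (
    payment_id TEXT, version INTEGER, amount REAL, PRIMARY KEY (payment_id, version));
CREATE TABLE fact_payment (payment_id TEXT PRIMARY KEY, amount REAL);
CREATE TABLE fact_payment_audit (payment_id TEXT, old_amount REAL, new_amount REAL, changed_at TEXT);
"""

def record_append_only(conn: sqlite3.Connection, payment_id: str, amount: float) -> None:
    """Pattern 1: every load or correction is a new row with version = previous max + 1."""
    with conn:
        conn.execute(
            """INSERT INTO fact_payment_versions (payment_id, version, amount)
               SELECT ?, COALESCE(MAX(version), 0) + 1, ?
               FROM fact_payment_versions WHERE payment_id = ?""",
            (payment_id, amount, payment_id),
        )

# Readers must always select the latest version per natural key.
LATEST_VERSION_QUERY = """
    SELECT v.payment_id, v.amount
    FROM fact_payment_versions AS v
    WHERE v.version = (SELECT MAX(version) FROM fact_payment_versions
                       WHERE payment_id = v.payment_id)
    ORDER BY v.payment_id
"""

def record_in_place(conn: sqlite3.Connection, payment_id: str, amount: float, changed_at: str) -> None:
    """Pattern 2: overwrite the fact row and write before/after values to an audit table."""
    with conn:  # the update and its audit row commit together or not at all
        row = conn.execute(
            "SELECT amount FROM fact_payment WHERE payment_id = ?", (payment_id,)
        ).fetchone()
        old_amount = row[0] if row else None
        conn.execute("INSERT OR REPLACE INTO fact_payment VALUES (?, ?)", (payment_id, amount))
        conn.execute(
            "INSERT INTO fact_payment_audit VALUES (?, ?, ?, ?)",
            (payment_id, old_amount, amount, changed_at),
        )
```

The append-only table grows by one row per correction, and every reader pays for LATEST_VERSION_QUERY. The in-place table stays at one row per payment, and history is recovered from fact_payment_audit.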

Question 6

What is the medallion architecture in a modern warehouse?

Accepted Answer

Three layers. Bronze: raw, append-only, schema-on-read, ideally still in source format. Silver: cleaned, typed, and deduplicated, with conformed dimensions ready for modeling. Gold: business-ready and modeled, usually as star schemas. Each layer has its own ownership and quality contract. Because bronze keeps the raw data, a bug in silver is fixed by rebuilding silver from bronze rather than re-ingesting from the source. Medallion is the default pattern on lakehouses built on Databricks Delta Lake or Apache Iceberg.
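A toy medallion pipeline in plain Python, assuming bronze is a list of raw JSON lines exactly as received. The record fields and cleaning rules are hypothetical. Silver and gold are pure functions of the layer below, so fixing a silver bug means re-running build_silver over the same bronze data. Nothing is fetched from the source again.

```python
import json

# Bronze: raw, append-only, still in source format (including a redelivery and junk).
BRONZE = [
    '{"order_id": 1, "customer": " Alice ", "amount": "10.5"}',
    '{"order_id": 2, "customer": "BOB", "amount": "4"}',
    '{"order_id": 1, "customer": "alice", "amount": "10.5"}',
    'not json',
]

def build_silver(bronze_lines: list) -> list:
    """Parse, type, normalize, and deduplicate on order_id; malformed records are skipped."""
    latest = {}
    for line in bronze_lines:
        try:
            rec = json.loads(line)
            order = (str(rec["order_id"]), rec["customer"].strip().lower(), float(rec["amount"]))
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # a real pipeline would route these to a quarantine table
        latest[order[0]] = order  # last delivery of an order_id wins
    return sorted(latest.values())

def build_gold(silver: list) -> dict:
    """Business-ready aggregate: revenue per customer."""
    totals = {}
    for _, customer, amount in silver:
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals
```

`build_gold(build_silver(BRONZE))` yields `{'alice': 10.5, 'bob': 4.0}`. The duplicate delivery and the malformed line stay in bronze for audit, but neither reaches gold.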

Question 7

How should a data engineer design a multi-region warehouse?

Accepted Answer

Active-active across regions, with asynchronous CDC replication for cross-region facts. Conflict resolution uses last-writer-wins for data with a reliable ordering (e.g., timestamps or versions), or CRDTs for counters. SLA tiers: real-time within a region, eventually consistent across regions. Storage cost at least doubles. Multi-region is rarely the right answer in a 45-minute interview, but it is the expected stretch follow-up question at L6+. Most companies do not need it.
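A minimal sketch of the two conflict-resolution strategies named above, written as in-memory Python objects rather than any specific replication product. The region names and integer timestamps are hypothetical; in practice the ordering would come from something like a hybrid logical clock. The last-writer-wins register keeps the value with the highest (timestamp, region) pair. The grow-only counter (G-counter) CRDT keeps one count per region and merges by per-region max, so concurrent increments are never lost.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass(frozen=True)
class LWWRegister:
    value: str
    timestamp: int  # assumed to be a comparable clock value shared across regions
    region: str     # tie-breaker so every replica picks the same winner

    def merge(self, other: LWWRegister) -> LWWRegister:
        return max(self, other, key=lambda r: (r.timestamp, r.region))

@dataclass
class GCounter:
    counts: dict[str, int] = field(default_factory=dict)  # region -> increments seen there

    def increment(self, region: str, n: int = 1) -> None:
        self.counts[region] = self.counts.get(region, 0) + n

    def merge(self, other: GCounter) -> GCounter:
        # Per-region max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repeats.
        merged = dict(self.counts)
        for region, count in other.counts.items():
            merged[region] = max(merged.get(region, 0), count)
        return GCounter(merged)

    def value(self) -> int:
        return sum(self.counts.values())
```

LWW suits a counter poorly. If two regions each increment a value of 0 to 1 concurrently, LWW keeps only one write and ends at 1. The G-counter merge ends at 2.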

Question 8

What is the difference between a data warehouse and a data lake?

Accepted Answer

Data warehouse: structured tables (rows and columns) on columnar storage, optimized for analytical queries; examples are Snowflake, BigQuery, and Redshift. Data lake: object storage (S3, GCS, ADLS) holding files in formats such as Parquet, ORC, Avro, and JSON, with a separate query engine (Athena, Presto, Spark) on top. A lakehouse merges the two: object storage plus a table format (Iceberg, Delta, Hudi) that provides ACID transactions and schema evolution on top of the files. Databricks and Snowflake are converging on the lakehouse.
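A toy illustration of what a table format adds on top of plain files. It uses a local directory in place of object storage and a single hypothetical `_snapshot.json` file in place of real metadata. This is neither Delta's transaction log nor Iceberg's metadata and manifest files, and it assumes a single writer. The idea it shares with them is that readers see only the data files listed in the latest committed snapshot. A half-written batch stays invisible until the commit swaps the snapshot atomically.

```python
import json
import os
import tempfile

SNAPSHOT = "_snapshot.json"  # hypothetical metadata file name

def write_data_file(table_dir: str, name: str, rows: list) -> str:
    """Write a data file into the table directory; not yet visible to readers."""
    with open(os.path.join(table_dir, name), "w") as f:
        json.dump(rows, f)
    return name

def current_snapshot(table_dir: str) -> dict:
    path = os.path.join(table_dir, SNAPSHOT)
    if not os.path.exists(path):
        return {"version": 0, "files": []}
    with open(path) as f:
        return json.load(f)

def commit(table_dir: str, added_files: list) -> None:
    """Publish a new snapshot listing every visible file (single-writer assumption)."""
    current = current_snapshot(table_dir)
    snapshot = {"version": current["version"] + 1, "files": current["files"] + added_files}
    tmp = os.path.join(table_dir, SNAPSHOT + ".tmp")
    with open(tmp, "w") as f:
        json.dump(snapshot, f)
    os.replace(tmp, os.path.join(table_dir, SNAPSHOT))  # atomic rename on POSIX filesystems

def read_table(table_dir: str) -> list:
    """A reader only trusts files named in the committed snapshot."""
    rows = []
    for name in current_snapshot(table_dir)["files"]:
        with open(os.path.join(table_dir, name)) as f:
            rows.extend(json.load(f))
    return rows

if __name__ == "__main__":
    table = tempfile.mkdtemp()
    part = write_data_file(table, "part-0001.json", [{"id": 1}])
    print(read_table(table))  # [] : the file exists but is not committed
    commit(table, [part])
    print(read_table(table))  # [{'id': 1}]
```

A plain data lake has no such snapshot, so a query engine listing the directory could read a partially written batch. Real table formats also handle concurrent writers, schema evolution, and time travel, all of which this sketch leaves out.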

Data Warehouse Interview Questions