Loading section...

The Transformation Layer

Concepts: paEltVsEtl, paMedallion, paPartitioning, paSparkExecutionModel, paColumnarVsRow

Justifying ELT over ETL and describing medallion tiers is expected. To score 'strong hire,' the interviewer expects you to articulate when ELT breaks down, how to handle transforms that span multiple data sources, and the cost implications of your compute choices at scale. When ELT Breaks Down The conventional wisdom is ELT for everything. In practice, there are cases where transform-before-load is correct. PII scrubbing: you don't want raw PII landing in the warehouse, even in bronze, if your warehouse doesn't have column-level encryption. Data reduction: if you're ingesting 10TB/day of raw logs but only need 100GB of aggregated metrics, transforming before load saves significant storage and compute. Compliance: GDPR right-to-deletion is easier when PII never enters the analytical system.