Refactor: From ETL to Idempotent

Concepts covered: Idempotency

The patterns are clearer when applied to a real refactor. The pipeline below is a realistic daily ETL job: it ingests payments from a Stripe-like API, joins them to customer accounts, and writes a daily payments fact table. The original version was written in a hurry and contains every common idempotency bug at once. The refactored version applies the three patterns above: partition keys, MERGE on a business key, and explicit time bounds. The diff between the two versions is the worked example.

Refactors of this shape are common because most production pipelines were written under deadline pressure by engineers who had not yet been burned by the failure modes in this lesson. The bugs are not malicious or careless; they are the natural state of a pipeline that has never had idempotency added explicitly.

Before: The Or
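The three patterns can be illustrated with a minimal sketch, assuming Python's standard-library `sqlite3` in place of the production warehouse. Everything in it is hypothetical and not taken from the lesson's pipeline: the payment and customer data, the table names (`fact_payments`, `fact_payments_naive`), and the helpers (`fetch_payments`, `day_rows`, `load_day`, `load_day_naive`). SQLite has no `MERGE` statement, so the sketch uses `INSERT ... ON CONFLICT DO UPDATE` instead. That statement gives the same upsert-on-business-key behavior for this case. The naive loader shows the "before" shape (blind append). The other loader keys each run to one logical date, reads a half-open `[start, end)` window, and upserts on `payment_id`.

```python
import sqlite3
from datetime import date, datetime, timedelta, timezone

# Hypothetical stand-ins for the Stripe-like payments API and the customer
# accounts table; names and fields are illustrative only.
SOURCE_PAYMENTS = [
    {"payment_id": "pay_1", "customer_id": "cus_a", "amount_cents": 1200,
     "created_at": "2024-03-01T09:15:00+00:00"},
    {"payment_id": "pay_2", "customer_id": "cus_b", "amount_cents": 500,
     "created_at": "2024-03-01T23:59:59+00:00"},
    {"payment_id": "pay_3", "customer_id": "cus_a", "amount_cents": 700,
     "created_at": "2024-03-02T00:00:00+00:00"},
]
CUSTOMERS = {"cus_a": "EU", "cus_b": "US"}  # customer_id -> region (the "join")


def fetch_payments(start: datetime, end: datetime) -> list:
    # Explicit time bounds: a half-open window [start, end) supplied by the
    # caller, so the result never depends on when the job happens to run.
    return [p for p in SOURCE_PAYMENTS
            if start <= datetime.fromisoformat(p["created_at"]) < end]


def day_rows(run_date: date) -> list:
    # Partition key: one logical date maps to exactly one UTC day window.
    start = datetime(run_date.year, run_date.month, run_date.day,
                     tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return [(p["payment_id"], run_date.isoformat(), p["customer_id"],
             CUSTOMERS[p["customer_id"]], p["amount_cents"])
            for p in fetch_payments(start, end)]


def load_day_naive(conn: sqlite3.Connection, run_date: date) -> None:
    # "Before" shape: blind append, so every rerun duplicates the day.
    with conn:
        conn.executemany(
            "INSERT INTO fact_payments_naive VALUES (?, ?, ?, ?, ?)",
            day_rows(run_date))


def load_day(conn: sqlite3.Connection, run_date: date) -> None:
    # "After" shape: upsert on the business key payment_id. SQLite has no
    # MERGE, so INSERT ... ON CONFLICT DO UPDATE plays that role here.
    with conn:  # one transaction per partition: all rows land or none do
        conn.executemany(
            """
            INSERT INTO fact_payments
                (payment_id, payment_date, customer_id, region, amount_cents)
            VALUES (?, ?, ?, ?, ?)
            ON CONFLICT(payment_id) DO UPDATE SET
                payment_date = excluded.payment_date,
                customer_id  = excluded.customer_id,
                region       = excluded.region,
                amount_cents = excluded.amount_cents
            """,
            day_rows(run_date))


def make_conn() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    cols = ("payment_id TEXT, payment_date TEXT, customer_id TEXT, "
            "region TEXT, amount_cents INTEGER")
    conn.execute(f"CREATE TABLE fact_payments_naive ({cols})")
    # The primary key on payment_id is what ON CONFLICT(payment_id) targets.
    conn.execute(f"CREATE TABLE fact_payments ({cols}, PRIMARY KEY (payment_id))")
    return conn


if __name__ == "__main__":
    conn = make_conn()
    for _ in range(2):  # simulate a retry of the same day
        load_day_naive(conn, date(2024, 3, 1))
        load_day(conn, date(2024, 3, 1))
    naive = conn.execute("SELECT COUNT(*) FROM fact_payments_naive").fetchone()[0]
    fixed = conn.execute("SELECT COUNT(*) FROM fact_payments").fetchone()[0]
    print(naive, fixed)  # 4 2
```

Running the same day twice doubles the naive table, while the upserted table still holds one row per payment. A rerun after a source correction would also update the changed amount in place instead of adding a second row. In a warehouse that supports it (PostgreSQL 15 and later, for example), the upsert step would typically be written as a `MERGE` on `payment_id`.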

About This Interactive Section

This section is part of the Idempotency and Backfill: Intermediate lesson on DataDriven, a free data engineering interview prep platform. Each section includes explanations, worked examples, and hands-on code challenges that execute in real time. SQL queries run against a live PostgreSQL database. Python runs in a sandboxed Docker container. Data modeling problems validate against interactive schema canvases. All content is framed around what data engineering interviewers actually test at companies like Meta, Google, Amazon, Netflix, Stripe, and Databricks.
