
Reads Can Be Non-Idempotent Too

Concepts covered: paExplicitTimeBounds, paEventVsProcessingTime

Most discussions of idempotency focus on writes. The hidden second half is reads. A pipeline that reads non-deterministic input cannot be idempotent in any useful sense, because the same logical run produces different output on different days. The most common offenders are SELECT NOW(), CURRENT_DATE, and any function that resolves to wall-clock time inside a transform. Other offenders include random number generators without seeds, environment variables that drift between runs, and any external service whose response varies over time without a corresponding input change. The fix is not subtle and it is not optional. Pipelines should accept explicit time bounds as parameters, not derive them from the current moment, and any other source of run-time variability needs the same treatment: pin
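A minimal sketch of the explicit-time-bounds idea, in Python. The event list, the function names `select_window` and `sample_ids`, and the half-open window convention are all illustrative assumptions, not part of the lesson's own code. The point is only that the window and the seed arrive as parameters, so a rerun on a different day sees the same inputs.

```python
from datetime import datetime, timezone
import random

# Hypothetical in-memory events; a real pipeline would read these from a table.
EVENTS = [
    {"id": 1, "event_time": datetime(2024, 1, 1, 23, 59, tzinfo=timezone.utc)},
    {"id": 2, "event_time": datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)},
    {"id": 3, "event_time": datetime(2024, 1, 2, 12, 30, tzinfo=timezone.utc)},
    {"id": 4, "event_time": datetime(2024, 1, 3, 0, 0, tzinfo=timezone.utc)},
]


def select_window(events, start, end):
    """Return ids of events with start <= event_time < end.

    The bounds are passed in by the caller. Nothing in here calls
    datetime.now(), so the result depends only on the arguments.
    """
    return [e["id"] for e in events if start <= e["event_time"] < end]


def sample_ids(ids, k, seed):
    """Draw k ids using a generator seeded by the caller (assumed convention)."""
    rng = random.Random(seed)  # local generator, independent of global state
    return sorted(rng.sample(ids, k))


# The orchestrator (or a backfill script) supplies the window explicitly.
start = datetime(2024, 1, 2, tzinfo=timezone.utc)
end = datetime(2024, 1, 3, tzinfo=timezone.utc)

first_run = select_window(EVENTS, start, end)
second_run = select_window(EVENTS, start, end)  # e.g. a rerun weeks later
```

With these inputs both runs select the events with ids 2 and 3, whatever the wall clock says when the code executes. A version that computed `start` from `datetime.now()` would return a different set each day, which is the failure mode described above.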

About This Interactive Section

This section is part of the Idempotency and Backfill: Intermediate lesson on DataDriven, a free data engineering interview prep platform. Each section includes explanations, worked examples, and hands-on code challenges that execute in real time. SQL queries run against a live PostgreSQL database. Python runs in a sandboxed Docker container. Data modeling problems validate against interactive schema canvases. All content is framed around what data engineering interviewers actually test at companies like Meta, Google, Amazon, Netflix, Stripe, and Databricks.
