DataDriven

© 2026 DataDriven

Idempotency

Run it twice, get the same result: the single most important property of a production pipeline

Category: Pipeline Architecture
Difficulty: advanced
Duration: 25 minutes
Challenges: 0 hands-on challenges

Topics covered: "What Happens If Your Pipeline Runs Twice?", DELETE-INSERT and MERGE Patterns, Idempotent Reads: Offset Management, Idempotency in Streaming vs Batch, Proving Idempotency to the Interviewer

Lesson Sections

  1. "What Happens If Your Pipeline Runs Twice?"

    What They're Really Testing
    The Unlock
    Idempotency is not a feature you add. It is a property that emerges from how you write data. The mental model: every write operation should be a function where f(x) = f(f(x)). If you INSERT, running twice creates duplicates. If you DELETE-then-INSERT the same partition, running twice produces the same result. The write pattern determines idempotency.
    The 60-Second Framework
    Saying this unprompted in the first 60 seconds of a pipeline design question puts yo
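The contrast between a plain INSERT and a DELETE-then-INSERT of one partition can be sketched as follows. This is a minimal illustration, not the lesson's own code: it assumes an in-memory SQLite database, and the table names `sales_append` and `sales_overwrite`, the `day` partition column, and both helper functions are hypothetical.

```python
import sqlite3

def append_load(conn, day, rows):
    """Plain INSERT: every re-run appends the same rows again."""
    conn.executemany(
        "INSERT INTO sales_append (day, order_id, amount) VALUES (?, ?, ?)",
        [(day, oid, amt) for oid, amt in rows],
    )
    conn.commit()

def overwrite_load(conn, day, rows):
    """DELETE-then-INSERT for one partition, inside a single transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("DELETE FROM sales_overwrite WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO sales_overwrite (day, order_id, amount) VALUES (?, ?, ?)",
            [(day, oid, amt) for oid, amt in rows],
        )

def count(conn, table):
    # Table names here are fixed literals from this sketch, not user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_append (day TEXT, order_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE sales_overwrite (day TEXT, order_id INTEGER, amount REAL)")

batch = [(1, 10.0), (2, 25.5)]  # hypothetical (order_id, amount) pairs
for _ in range(2):  # simulate the pipeline running twice for the same day
    append_load(conn, "2024-01-01", batch)
    overwrite_load(conn, "2024-01-01", batch)

append_count = count(conn, "sales_append")        # rows doubled by the re-run
overwrite_count = count(conn, "sales_overwrite")  # unchanged by the re-run
print(append_count, overwrite_count)
```

Wrapping the DELETE and the INSERT in one transaction is a design choice in this sketch: it keeps a failed re-run from leaving the partition empty.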

  2. DELETE-INSERT and MERGE Patterns

    There are exactly three idempotent write patterns. Know all three, know when each applies, and know which one to reach for first. The interviewer is testing whether you have a default pattern, not whether you can invent one on the spot.
    Pattern 1: Partition Overwrite (DELETE-INSERT)
    Pattern 2: MERGE (Upsert)
    MERGE is idempotent when the match key is stable and the update is deterministic. Running it twice: first run inserts new rows and updates changed rows. Second run finds all rows already mat
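A rough sketch of the upsert behaviour described above. SQLite has no MERGE statement, so this example substitutes `INSERT ... ON CONFLICT DO UPDATE` (available in SQLite 3.24 and later) as a stand-in; the `customers` table, its columns, and the helper names are assumptions made for illustration.

```python
import sqlite3

# Upsert keyed on a stable identifier; the update copies values from the
# incoming row, so it is deterministic for a given input batch.
UPSERT = """
INSERT INTO customers (customer_id, email, tier)
VALUES (?, ?, ?)
ON CONFLICT(customer_id) DO UPDATE SET
    email = excluded.email,
    tier = excluded.tier
"""

def upsert_batch(conn, rows):
    """Apply the whole batch in one transaction."""
    with conn:
        conn.executemany(UPSERT, rows)

def snapshot(conn):
    """Return the full table in a stable order so two runs can be compared."""
    return conn.execute(
        "SELECT customer_id, email, tier FROM customers ORDER BY customer_id"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT, tier TEXT)"
)
conn.execute("INSERT INTO customers VALUES (1, 'old@example.com', 'free')")
conn.commit()

batch = [(1, "new@example.com", "pro"), (2, "b@example.com", "free")]

upsert_batch(conn, batch)   # first run: updates customer 1, inserts customer 2
after_first = snapshot(conn)

upsert_batch(conn, batch)   # second run: every key already matches
after_second = snapshot(conn)

print(after_first == after_second)
```

If the update clause wrote something that differs per run, such as a column set from the current timestamp, the second run would change the table and the pattern would no longer be idempotent.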

  3. Idempotent Reads: Offset Management

    Most candidates think about idempotent writes but forget about idempotent reads. If your input query changes between runs, your output will change too, even if your write pattern is idempotent. This is the follow-up trap interviewers use to separate hire from strong hire.
    The Follow-Up Trap
    Idempotent Read Patterns
    The strongest signal: 'I would never use NOW() in a pipeline query. Every temporal boundary is a parameter passed by the orchestrator. This makes the pipeline deterministic, testable,
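One way to picture a read whose temporal boundary is a parameter rather than the wall clock. This is a sketch under stated assumptions: the `events` table, the `extract_day` function, and the ISO-8601 text timestamps are all hypothetical, and in practice `run_date` would come from the orchestrator's logical date for the run.

```python
import sqlite3
from datetime import date, timedelta

def extract_day(conn, run_date):
    """Read the closed-open window [run_date, run_date + 1 day).

    run_date is supplied by the caller; nothing here reads the current time,
    so the same argument always selects the same window.
    """
    start = run_date.isoformat()
    end = (run_date + timedelta(days=1)).isoformat()
    return conn.execute(
        "SELECT event_id, event_ts FROM events "
        "WHERE event_ts >= ? AND event_ts < ? ORDER BY event_id",
        (start, end),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER, event_ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2024-01-01T09:00:00"),
        (2, "2024-01-01T23:59:59"),
        (3, "2024-01-02T00:00:00"),  # belongs to the next day's window
    ],
)

first = extract_day(conn, date(2024, 1, 1))

# Data for a later day arrives before the re-run; the fixed window ignores it.
conn.execute("INSERT INTO events VALUES (4, '2024-01-03T08:00:00')")

second = extract_day(conn, date(2024, 1, 1))
print(first == second)
```

A fixed window only guarantees that the query itself is stable. Rows that arrive late with timestamps inside the window would still change the result and need separate handling.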

  4. Idempotency in Streaming vs Batch

    Idempotency works differently in batch and streaming, and interviewers test whether you understand why. Batch idempotency is about partition replacement. Streaming idempotency is about exactly-once processing across a distributed system with continuous data flow.
    Batch Idempotency: Replace the Partition
    In batch, idempotency is straightforward. You process a bounded chunk (one day, one hour) and overwrite the output partition. The partition is the unit of idempotency. Re-running the same partiti
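For file-based batch output, replacing the partition might look like the sketch below. The directory layout (`day=<date>/part-0000.json`), the `write_partition` helper, and the sample records are assumptions for illustration; the write goes to a temporary file that is then swapped into place with `os.replace`.

```python
import json
import os
import tempfile
from pathlib import Path

def write_partition(root, day, records):
    """Replace the output for one day; a re-run overwrites rather than appends."""
    part_dir = Path(root) / f"day={day}"
    part_dir.mkdir(parents=True, exist_ok=True)
    target = part_dir / "part-0000.json"

    # Write to a temp file in the same directory, then swap it into place so
    # readers never observe a half-written partition file.
    fd, tmp_name = tempfile.mkstemp(dir=part_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        # Sorting makes the file contents independent of input order.
        json.dump(sorted(records, key=lambda r: r["id"]), tmp)
    os.replace(tmp_name, target)
    return target

root = tempfile.mkdtemp()
records = [{"id": 2, "v": "b"}, {"id": 1, "v": "a"}]  # hypothetical rows

path = write_partition(root, "2024-01-01", records)
first_bytes = path.read_bytes()

write_partition(root, "2024-01-01", records)  # re-run the same partition
second_bytes = path.read_bytes()

files = sorted(p.name for p in path.parent.iterdir())
print(first_bytes == second_bytes, files)
```

The fixed file name is what makes the re-run a replacement. A writer that generates a new file name per run would leave two copies of the data in the partition.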

  5. Proving Idempotency to the Interviewer

    Claiming idempotency is not enough. The interviewer will probe each component of your pipeline to find the one that breaks under re-run. This section gives you the mental walkthrough that proves your design survives.
    The Idempotency Proof Walkthrough
    The Follow-Up Trap
    Vocabulary That Signals Seniority
    The Bridge Move
    Red Flag Phrases
