DataDriven
© 2026 DataDriven

How Data Moves

Handle the depth probes: idempotency, backpressure, and cost

Category: Pipeline Architecture
Difficulty: beginner
Duration: 30 minutes
Challenges: 0 hands-on challenges

Topics covered: Idempotent Pipelines, Backpressure, Late-Arriving Data, Dead Letter Queues, Cost of Freshness

Lesson Sections

  1. Idempotent Pipelines (concepts: paBatchProcessing)

    What They Want to Hear: 'An idempotent pipeline produces the same result whether it runs once or five times on the same input. I achieve this with MERGE statements that upsert on a primary key, or by replacing entire partitions on each run. This means every retry, every backfill, and every re-run is safe.'

    This is the answer that shows production experience. Candidates who say 'just make it transactional' are missing the point: a transaction makes a single run atomic, but it does not stop a second run from writing the same rows again.

    Three Idempotent Patterns
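The two patterns named in the answer above can be sketched in a few lines. This is a minimal illustration, not a production implementation: it uses SQLite's `INSERT ... ON CONFLICT DO UPDATE` as a stand-in for a warehouse `MERGE` statement, and a delete-then-insert inside one transaction as a stand-in for a partition overwrite. The table names (`orders`, `daily_totals`), columns, and sample rows are hypothetical.

```python
import sqlite3


def upsert_orders(conn, rows):
    """Upsert on the primary key: re-running with the same rows changes nothing."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            """
            INSERT INTO orders (order_id, day, amount)
            VALUES (?, ?, ?)
            ON CONFLICT(order_id) DO UPDATE SET
                day = excluded.day,
                amount = excluded.amount
            """,
            rows,
        )


def replace_partition(conn, day, rows):
    """Delete and reload one day's partition inside a single transaction."""
    with conn:
        conn.execute("DELETE FROM daily_totals WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO daily_totals (day, region, total) VALUES (?, ?, ?)",
            [(day, region, total) for region, total in rows],
        )


def snapshot(conn):
    """Return the full contents of both tables for comparison."""
    orders = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
    totals = conn.execute(
        "SELECT * FROM daily_totals ORDER BY day, region"
    ).fetchall()
    return orders, totals


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, day TEXT, amount REAL)"
)
conn.execute("CREATE TABLE daily_totals (day TEXT, region TEXT, total REAL)")

# Hypothetical input for one daily run.
batch = [(1, "2024-01-01", 10.0), (2, "2024-01-01", 25.5)]
totals = [("eu", 10.0), ("us", 25.5)]


def run_pipeline():
    upsert_orders(conn, batch)
    replace_partition(conn, "2024-01-01", totals)


run_pipeline()
after_one_run = snapshot(conn)
for _ in range(4):  # simulate four retries of the same run
    run_pipeline()
after_five_runs = snapshot(conn)
```

Comparing `after_one_run` with `after_five_runs` is the idempotency check: a plain `INSERT` into `daily_totals` without the preceding `DELETE` would leave ten rows after five runs instead of two.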

  2. Backpressure (concepts: paStreamProcessing)

    What They Want to Hear: 'Backpressure happens when a downstream stage cannot process data as fast as the upstream stage produces it. Without handling it, you either drop data, run out of memory, or let queues grow without bound. My approach: buffer short spikes, throttle the producer under sustained overload, and auto-scale the consumer if the infrastructure supports it.'

    Then name the four strategies.
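Two of the ideas in the answer above, buffering short spikes and throttling the producer, can be shown with a bounded in-memory queue. This is a sketch under simplifying assumptions: one producer thread, one consumer thread, and Python's standard `queue.Queue` standing in for a real message broker. The buffer size, event count, and sleep time are arbitrary illustrative values, and auto-scaling is not modeled.

```python
import queue
import threading
import time

BUFFER_SIZE = 5     # capacity that absorbs short spikes
TOTAL_EVENTS = 50
SENTINEL = None     # tells the consumer there is no more work

buffer = queue.Queue(maxsize=BUFFER_SIZE)
processed = []
max_depth_seen = 0


def producer():
    global max_depth_seen
    for event_id in range(TOTAL_EVENTS):
        # put() blocks while the queue is full, which slows the producer
        # down to the consumer's pace instead of growing memory.
        buffer.put(event_id)
        max_depth_seen = max(max_depth_seen, buffer.qsize())
    buffer.put(SENTINEL)


def consumer():
    while True:
        event = buffer.get()
        if event is SENTINEL:
            break
        time.sleep(0.001)  # simulate a slow downstream stage
        processed.append(event)


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the queue is bounded, its depth never exceeds `BUFFER_SIZE` no matter how fast the producer runs, and no events are dropped. An unbounded queue would accept every event immediately and push the problem into memory.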

  3. Late-Arriving Data (concepts: paStreamProcessing)

    What They Want to Hear: 'Late data is normal, not exceptional. A mobile device loses connectivity, reconnects, and sends a burst of events from 30 minutes ago. I handle this with watermarks: a watermark is the system's estimate of the event time up to which all data has arrived. Events with timestamps older than the current watermark are late and go to a side output. A daily batch reconciliation job picks up anything the streaming layer dropped.'
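The watermark-and-side-output idea above can be sketched without a streaming framework. The example assumes one common heuristic, which the lesson does not specify: the watermark trails the largest event time seen so far by a fixed allowed lateness. The class name `WatermarkRouter`, the 60-second lateness, and the sample events are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WatermarkRouter:
    """Routes events to on-time or late output using a trailing watermark."""

    allowed_lateness_s: int
    max_event_time: int = 0
    on_time: list = field(default_factory=list)
    late: list = field(default_factory=list)  # the "side output"

    @property
    def watermark(self):
        # Assumption: the watermark lags the newest event time seen
        # by a fixed number of seconds.
        return self.max_event_time - self.allowed_lateness_s

    def process(self, event_time, payload):
        if event_time < self.watermark:
            # Older than the watermark: keep it for batch reconciliation.
            self.late.append((event_time, payload))
            return
        self.on_time.append((event_time, payload))
        self.max_event_time = max(self.max_event_time, event_time)


router = WatermarkRouter(allowed_lateness_s=60)

# (event_time_in_seconds, payload), listed in arrival order.
events = [
    (1000, "a"),
    (1030, "b"),
    (990, "c"),   # out of order, but within the allowed lateness
    (1200, "d"),  # advances the watermark to 1140
    (1100, "e"),  # older than the watermark: late
    (1150, "f"),
]
for event_time, payload in events:
    router.process(event_time, payload)
```

Event `c` arrives out of order but is still accepted, while `e` falls behind the watermark and lands in `router.late`. In this sketch, a reconciliation job would read from that list rather than losing the event.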

  4. Dead Letter Queues (concepts: paApiIngestion)

    What They Want to Hear: 'A DLQ is where events go when they cannot be processed: malformed JSON, schema violations, unhandled exceptions. Instead of crashing the pipeline or silently dropping the event, I route it to a separate queue with the original payload plus the error metadata. Someone reviews the DLQ and either fixes the root cause or discards the events.'
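The routing described above can be sketched as a try/except around the per-event processing step. This is an illustration only: the dead letter queue is a plain Python list rather than a real queue or topic, and the required fields (`user_id`, `event_type`) are a hypothetical schema invented for the example.

```python
import json
from datetime import datetime, timezone

REQUIRED_FIELDS = {"user_id", "event_type"}  # hypothetical schema


def parse_event(raw):
    """Parse and validate one raw event, raising on any problem."""
    event = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(event, dict):
        raise ValueError("event must be a JSON object")
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return event


def run(raw_events):
    """Process every event; failures go to the DLQ instead of stopping the run."""
    processed, dead_letters = [], []
    for raw in raw_events:
        try:
            processed.append(parse_event(raw))
        except Exception as exc:
            # Keep the original payload untouched, plus enough metadata
            # for a reviewer to diagnose the failure later.
            dead_letters.append(
                {
                    "payload": raw,
                    "error_type": type(exc).__name__,
                    "error_message": str(exc),
                    "failed_at": datetime.now(timezone.utc).isoformat(),
                }
            )
    return processed, dead_letters


raw_events = [
    '{"user_id": 1, "event_type": "click"}',  # valid
    '{"user_id": 2',                          # malformed JSON
    '{"user_id": 3}',                         # schema violation
]
processed, dead_letters = run(raw_events)
```

Storing the raw payload rather than a partially parsed version matters here: a reviewer can inspect exactly what arrived, and the events remain available to reprocess if the root cause is fixed.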

  5. Cost of Freshness (concepts: paBatchVsStreaming)

    What They Want to Hear: 'As a rule of thumb, real-time costs roughly 3-5x more than batch for the same throughput. The cost is not just infrastructure: it is operational complexity, debugging difficulty, and on-call burden. A streaming pipeline that fails at 3 AM requires someone to fix it at 3 AM. A batch pipeline that fails at 3 AM can wait until morning.'

    This answer shows you think about total cost of ownership, not just compute bills.
