Partial Failure in a Batch

Concepts covered: Partial Failure, Quarantine

A batch job processes ten thousand rows. One row fails. The question is what happens to the other 9,999. The two extreme answers are both common and both wrong. Failing the entire batch loses progress on every good row. Silently dropping the bad row hides a problem that might be a symptom of a larger issue. The right answer lies somewhere in the middle, and choosing the right point on that spectrum is one of the most consequential decisions a pipeline designer makes for a given workload.

Three Strategies

When All-or-Nothing Is the Right Answer

A financial reconciliation batch that produces a daily ledger should be all-or-nothing. The ledger needs to balance. A subset of the rows would produce a ledger that does not balance, which is worse than no ledger at all. The all-or-nothing strategy demands that any single failure abort the batch rather than commit a partial result.
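The middle ground between failing everything and dropping bad rows silently can be sketched as a quarantine pattern with a failure-rate threshold: bad rows are set aside for later inspection instead of being discarded, but the batch still aborts when too many rows fail, since a high failure rate usually signals a systemic problem rather than a few bad records. Here is a minimal Python sketch of the idea; the function name, the `BatchResult` container, and the `max_failure_rate` threshold are illustrative choices, not part of any particular library:

```python
from dataclasses import dataclass, field

@dataclass
class BatchResult:
    processed: list = field(default_factory=list)    # successfully transformed rows
    quarantined: list = field(default_factory=list)  # failed rows kept for inspection

def process_batch(rows, transform, max_failure_rate=0.01):
    """Apply transform to each row, quarantining failures.

    Falls back to all-or-nothing behavior (raises, committing nothing)
    when the failure rate exceeds max_failure_rate.
    """
    result = BatchResult()
    for i, row in enumerate(rows):
        try:
            result.processed.append(transform(row))
        except Exception as exc:
            # Keep enough context to diagnose and replay the row later.
            result.quarantined.append({"index": i, "row": row, "error": str(exc)})
    if len(result.quarantined) > max_failure_rate * len(rows):
        raise RuntimeError(
            f"{len(result.quarantined)} of {len(rows)} rows failed; "
            "aborting batch to avoid masking a systemic problem"
        )
    return result

# One bad row out of four: below a 50% threshold, the batch succeeds
# and the bad row lands in quarantine instead of being dropped.
res = process_batch([1, 2, "three", 4], lambda r: r + 1, max_failure_rate=0.5)
```

Tuning `max_failure_rate` is where the all-or-nothing ledger case fits in: setting it to zero recovers strict all-or-nothing semantics, while a small nonzero value tolerates isolated bad records without hiding a batch-wide failure.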

About This Interactive Section

This section is part of the Failure Modes and Error Handling: Intermediate lesson on DataDriven, a free data engineering interview prep platform. Each section includes explanations, worked examples, and hands-on code challenges that execute in real time. SQL queries run against a live PostgreSQL database. Python runs in a sandboxed Docker container. Data modeling problems validate against interactive schema canvases. All content is framed around what data engineering interviewers actually test at companies like Meta, Google, Amazon, Netflix, Stripe, and Databricks.

How DataDriven Lessons Work

DataDriven combines four interview rounds (SQL, Python, Data Modeling, Pipeline Architecture) with adaptive difficulty and spaced repetition. Easy problems get harder as you improve. Weak concepts resurface until you master them. Your readiness score tracks progress across every topic interviewers test. Every lesson section ends with problems you solve by writing and running real code, not by picking multiple-choice answers.