DataDriven
LearnPracticeInterviewDiscussDaily
HelpContactPrivacyTermsSecurityiOS App

© 2026 DataDriven

Loading lesson...

  1. Home
  2. Learn
  3. CDC and Dual Writes

CDC and Dual Writes

Capturing changes from source databases without breaking them is the core data engineering skill

Capturing changes from source databases without breaking them is the core data engineering skill

Category
Pipeline Architecture
Difficulty
advanced
Duration
25 minutes
Challenges
0 hands-on challenges

Topics covered: "How Do You Get Data Out of the Source Database?", Log-Based CDC: Debezium and the WAL, Query-Based CDC and Its Limitations, The Dual-Write Problem, CDC Pipeline End-to-End

Lesson Sections

  1. "How Do You Get Data Out of the Source Database?"

    What They're Really Testing The Three Extraction Methods The Unlock Why Companies Care At Uber, query-based CDC on the ride table missed soft deletes (canceled rides) because the application set a status flag instead of updating the updated_at column. At Netflix, full-load extraction of the content metadata table took 4 hours and blocked the source database's connection pool during peak streaming hours. At Stripe, log-based CDC (Debezium) captures every payment state change with sub-second laten

  2. Log-Based CDC: Debezium and the WAL

    Debezium is the de facto standard for log-based CDC. It connects to the database's replication slot (PostgreSQL) or binlog (MySQL), reads committed changes, and publishes them as structured events to Kafka topics. One topic per table. Each event contains the before and after state of the row, plus metadata. Debezium Change Event Structure Why Log-Based CDC Is Non-Invasive

  3. Query-Based CDC and Its Limitations

    Query-based CDC is the naive approach that most candidates propose first. It works by querying for rows where updated_at > last_run_time. It is simple to implement but has fundamental limitations that the interviewer will probe. The Three Failures of Query-Based CDC When Query-Based CDC Is Acceptable

  4. The Dual-Write Problem

    A dual write is when the application writes to two systems: the database and a message queue (or search index, or cache). It sounds reasonable but it creates a consistency problem that has no simple fix. This is the trap the interviewer is waiting for. The Dual-Write Failure Modes The Fix: Outbox Pattern When Dual Writes Are Acceptable

  5. CDC Pipeline End-to-End

    The interview closer: design a complete CDC pipeline from source database to warehouse, addressing every component. This is the 'narrate the data journey' answer that interviewers describe as the strongest possible signal. The Architecture The Follow-Up Traps Vocabulary That Signals Seniority

Related

  • All Lessons
  • Practice Problems
  • Mock Interview Practice
  • Daily Challenges