# A Kafka consumer rebalance reprocesses the last batch every deploy, the Stripe webhook retries until

Canonical URL: <https://datadriven.io/problems/a-kafka-consumer-rebalance-reprocesses-the-last-batch-every-42ff35cd>

Domain: Pipeline Design · Difficulty: medium

## Problem

A Kafka consumer rebalance reprocesses the last batch on every deploy, the Stripe webhook retries until it gets a 2xx, and the SFTP partner re-uploads files whenever their cron retries. The current pipeline writes everything that arrives, producing duplicate rows that take weeks to find.

The pattern for this section: assume at-least-once delivery everywhere, and deduplicate at the sink on a stable key drawn from the message itself (Kafka producer UUID, Stripe evt_id, file natural key plus source-file id, source PK plus bookmark window). To add the dedupe key, replace the sink with one whose name states both the dedupe key and the uniqueness mechanism (UNIQUE constraint, MERGE on key, or partition overwrite by key window).
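
As a rough illustration of the UNIQUE-constraint variant, the sketch below replays the same Stripe-style batch twice against a sink keyed on the event id. It uses Python's built-in `sqlite3` so it runs anywhere; the table name `stripe_events_unique_on_evt_id`, the column names, the `write_batch` helper, and the sample event ids are all hypothetical, chosen only so the sink's name states the dedupe key and the mechanism. On Postgres the equivalent statement would likely be `INSERT ... ON CONFLICT (evt_id) DO NOTHING`.

```python
import sqlite3

# Hypothetical sink: the name states the dedupe key (evt_id) and the
# uniqueness mechanism (a UNIQUE/PRIMARY KEY constraint).
SINK = "stripe_events_unique_on_evt_id"


def make_sink(conn: sqlite3.Connection) -> None:
    """Create the sink table with the dedupe key as its primary key."""
    conn.execute(
        f"""
        CREATE TABLE IF NOT EXISTS {SINK} (
            evt_id  TEXT PRIMARY KEY,   -- stable key taken from the message itself
            payload TEXT NOT NULL
        )
        """
    )


def row_count(conn: sqlite3.Connection) -> int:
    """Return the number of rows currently in the sink."""
    return conn.execute(f"SELECT COUNT(*) FROM {SINK}").fetchone()[0]


def write_batch(conn: sqlite3.Connection, batch: list[tuple[str, str]]) -> int:
    """Write a batch of (evt_id, payload) rows; redelivered ids are skipped.

    Returns how many rows were actually new.
    """
    before = row_count(conn)
    with conn:  # commit on success, roll back on error
        conn.executemany(
            f"INSERT OR IGNORE INTO {SINK} (evt_id, payload) VALUES (?, ?)",
            batch,
        )
    return row_count(conn) - before


# Simulate at-least-once delivery: the same batch arrives twice.
conn = sqlite3.connect(":memory:")
make_sink(conn)
batch = [("evt_001", '{"amount": 500}'), ("evt_002", '{"amount": 1200}')]

first_delivery = write_batch(conn, batch)   # both rows are new
second_delivery = write_batch(conn, batch)  # retry: nothing is inserted
print(first_delivery, second_delivery, row_count(conn))
```

Because the constraint lives in the sink rather than in consumer logic, the write is idempotent regardless of which upstream retried. The same idea would apply to the other sources by swapping in their own stable key.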

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/a-kafka-consumer-rebalance-reprocesses-the-last-batch-every-42ff35cd)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.