When producers outrun consumers, you need backpressure or you need a bigger consumer
Topics covered: The "What If Volume Doubles?" Question, Backpressure Mechanisms, Horizontal Scaling: Partitions and Parallelism, Autoscaling and Cost Tradeoffs, End-to-End Throughput Analysis
What They're Really Testing

The Unlock: Every pipeline has a throughput chain: source rate, ingestion layer, processing layer, sink write speed. The slowest link determines system throughput. Scaling the wrong link wastes money and changes nothing. The first job is to find the binding constraint, not to throw hardware at the problem.

The 60-Second Framework: Step 1 is the strong-hire signal. Converting "a lot of data" into events/second shows you think quantitatively. Most candidates never do this.
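A minimal sketch of Step 1: turning a stated daily volume into a per-second rate. The daily volume and the peak-to-average ratio below are illustrative assumptions, not figures from a specific system.

```python
# Back-of-envelope: convert "1 billion events/day" into events/second.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def events_per_second(events_per_day: float) -> float:
    """Average sustained rate implied by a daily event volume."""
    return events_per_day / SECONDS_PER_DAY

avg_rate = events_per_second(1_000_000_000)   # ~11,574 events/s
peak_rate = avg_rate * 3                      # assumed 3x peak-to-average ratio
print(f"average: {avg_rate:,.0f} events/s, assumed peak: {peak_rate:,.0f} events/s")
```

Saying the peak number out loud matters: systems are sized for peak load, not the daily average.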
Backpressure is the flow control mechanism that prevents a fast producer from overwhelming a slow consumer. Without it, queues grow unbounded, memory fills, and the system crashes. The interview tests whether you know multiple backpressure strategies and when each applies.

Four Backpressure Strategies

Kafka Consumer Lag: The Health Metric

Consumer lag is the difference between the latest offset produced and the latest offset committed by the consumer. Growing lag means the consumer is falling behind.
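The simplest backpressure strategy is a bounded buffer that blocks the producer when full. A sketch using Python's standard library (the buffer size and event count are illustrative assumptions):

```python
import queue
import threading
import time

# Bounded in-process buffer: when the consumer falls behind, put() blocks,
# throttling the producer instead of letting memory grow unbounded.
BUFFER = queue.Queue(maxsize=100)  # capacity chosen for illustration

def producer(n_events: int) -> None:
    for i in range(n_events):
        BUFFER.put(i)          # blocks while the buffer is full
    BUFFER.put(None)           # sentinel: no more events

def consumer(processed: list) -> None:
    while True:
        event = BUFFER.get()
        if event is None:
            break
        time.sleep(0.0001)     # simulate slow per-event processing
        processed.append(event)

processed: list = []
t_prod = threading.Thread(target=producer, args=(1000,))
t_cons = threading.Thread(target=consumer, args=(processed,))
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
print(len(processed))  # 1000: nothing dropped; the producer was slowed instead
```

Blocking is only one option; the same buffer could instead drop, sample, or spill to disk when full, which is where the tradeoff discussion in an interview usually goes.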
Horizontal scaling in Kafka means adding partitions and consumers together. Each partition is consumed by exactly one consumer in a consumer group. Adding consumers beyond the partition count wastes resources. Adding partitions without consumers wastes Kafka storage. They must scale in lockstep.

The Partition-Consumer Relationship

The strong-hire move: "I would set the initial partition count to 2-3x the current consumer count so we can scale consumers without repartitioning. Increasing partition count later remaps keys to different partitions, which disrupts per-key ordering."
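The partition-consumer relationship can be sketched with a toy round-robin assignment (Kafka's real assignors differ in detail; this is an illustration of the constraint, not Kafka's algorithm). It shows both why over-provisioning partitions gives scaling headroom and why consumers beyond the partition count sit idle:

```python
# Toy partition assignment: each partition goes to exactly one consumer.
def assign(partitions: int, consumers: int) -> dict[int, list[int]]:
    assignment: dict[int, list[int]] = {c: [] for c in range(consumers)}
    for p in range(partitions):
        assignment[p % consumers].append(p)
    return assignment

# 6 partitions, 2 consumers: headroom to grow to 6 consumers with no repartition.
print(assign(6, 2))   # {0: [0, 2, 4], 1: [1, 3, 5]}

# 6 partitions, 8 consumers: two consumers receive nothing and sit idle.
idle = [c for c, parts in assign(6, 8).items() if not parts]
print(idle)           # [6, 7]
```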
Autoscaling sounds like the obvious answer, but it introduces complexity that interviewers probe. When do you scale up? How fast? What about scale-down? What does over-provisioning cost? These questions test whether you think about the business impact of technical decisions.

Autoscaling Triggers

The Cost Conversation

The L6 signal: "I would tag all autoscaled resources with the pipeline name and team owner so we can attribute cost to the specific pipeline. If a pipeline's cost grows 5x month-over-month, we know exactly which team to ask why."
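A hedged sketch of a lag-based scaling decision, answering the interviewer's probes directly: scale up on sustained lag growth, scale down conservatively, and use a cooldown to avoid flapping. The thresholds and cooldown are illustrative assumptions, not recommendations.

```python
# Illustrative thresholds; real values depend on SLOs and per-event cost.
SCALE_UP_LAG = 100_000    # records of consumer lag before adding a consumer
SCALE_DOWN_LAG = 5_000    # lag must be low AND not growing before removing one
COOLDOWN_S = 300          # ignore triggers for 5 minutes after any change

def desired_replicas(current: int, lag: int, lag_trend: float,
                     seconds_since_last_change: float,
                     max_replicas: int, min_replicas: int = 1) -> int:
    if seconds_since_last_change < COOLDOWN_S:
        return current                            # avoid flapping
    if lag > SCALE_UP_LAG and lag_trend > 0:
        return min(current + 1, max_replicas)     # scale up promptly
    if lag < SCALE_DOWN_LAG and lag_trend <= 0:
        return max(current - 1, min_replicas)     # scale down slowly
    return current

print(desired_replicas(4, 250_000, +1.0, 600, max_replicas=6))  # 5: lag growing
print(desired_replicas(4, 1_000, -0.5, 600, max_replicas=6))    # 3: lag drained
print(desired_replicas(4, 250_000, +1.0, 60, max_replicas=6))   # 4: in cooldown
```

Note the asymmetry: scale-up is eager because lag hurts latency SLOs, while scale-down is slow because a premature removal just triggers another scale-up. `max_replicas` is also where the cost conversation becomes concrete: it caps what the pipeline is allowed to spend.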
The strongest interview answer is a throughput calculation. Not hand-waving about "big data," but specific numbers: events per second, bytes per record, processing time per record, sink write latency, and which component is the binding constraint.

The Throughput Chain

Example Calculation: 1 billion events/day at an average record size of 500 bytes works out to roughly 11,574 events/second sustained.

Vocabulary That Signals Seniority

The Bridge Move
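The example calculation can be carried through the whole chain. The per-stage capacities below are illustrative assumptions; the point is the method of comparing load against each link and naming the slowest one.

```python
# Worked example: 1 billion events/day at 500 bytes/record.
events_per_day = 1_000_000_000
record_bytes = 500
rate = events_per_day / 86_400               # ~11,574 events/s average
throughput_mb_s = rate * record_bytes / 1e6  # ~5.8 MB/s

# Capacity of each link in events/second (assumed numbers for illustration).
chain = {
    "source":     50_000,
    "ingestion":  40_000,   # e.g. Kafka brokers
    "processing": 15_000,   # e.g. consumer workers
    "sink":        9_000,   # e.g. warehouse batch writes
}
bottleneck = min(chain, key=chain.get)
print(f"average load: {rate:,.0f} events/s ({throughput_mb_s:.1f} MB/s)")
print(f"binding constraint: {bottleneck} at {chain[bottleneck]:,} events/s")
```

With these assumed numbers the sink (9,000 events/s) is below the 11,574 events/s average load, so lag grows continuously; scaling the processing layer would change nothing. That comparison, stated out loud, is the bridge move.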