
Streaming Systems

Master offset management, consumer groups, and streaming failure modes

Category: Pipeline Architecture
Difficulty: Intermediate
Duration: 25 minutes
Challenges: 0 hands-on challenges

Topics covered: Consumer Groups and Offsets, Event Sourcing Patterns, Windowing and Watermarks, DLQ Patterns, Spark Streaming vs Flink

Lesson Sections

  1. Consumer Groups and Offsets (concepts: paEventPlatforms)

    What They Want to Hear: 'Each consumer in a group tracks its position in the partition using offsets. After processing a batch of events, the consumer commits its offset. If the consumer crashes, it restarts from the last committed offset. This gives at-least-once delivery by default. For exactly-once, I use the transactional pattern: commit the offset and write the output in the same transaction, so either both happen or neither does.' This is the answer that shows you understand the offset commit cycle and the delivery guarantees that follow from it.
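The commit-after-batch behavior above can be sketched in plain Python, without a real broker. This is a minimal single-partition simulation (the names `consume`, `committed`, and `crash_after` are illustrative, not Kafka API): the offset is committed only after a whole batch succeeds, so a crash mid-batch causes the batch to be replayed on restart, which is exactly the at-least-once guarantee.

```python
def consume(events, committed, batch_size, crash_after=None):
    """Process events from the last committed offset in batches.

    The offset is committed only after a whole batch is processed, so a
    crash mid-batch replays that batch on restart: at-least-once delivery.
    Returns (processed_events, last_committed_offset).
    """
    processed = []
    pos = committed
    steps = 0
    while pos < len(events):
        batch = events[pos:pos + batch_size]
        for e in batch:
            if crash_after is not None and steps == crash_after:
                return processed, committed  # crashed before the commit
            processed.append(e)
            steps += 1
        pos += len(batch)
        committed = pos  # commit only after the batch succeeds
    return processed, committed

events = list(range(6))
# First run crashes mid-batch after processing 3 events.
out1, committed = consume(events, 0, batch_size=2, crash_after=3)
# Restart resumes from the last committed offset (2), so event 2 is
# processed a second time: a duplicate, not a loss.
out2, committed = consume(events, committed, batch_size=2)
```

Moving the commit *before* processing would flip the failure mode to at-most-once: a crash then loses the in-flight batch instead of duplicating it.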

  2. Event Sourcing Patterns (concepts: paEventDriven)

    What They Want to Hear: 'Event sourcing stores every state change as an immutable event. The current state is derived by replaying events from the beginning. I pair it with CQRS: Command Query Responsibility Segregation, where writes go to the event log and reads come from a materialized view that is built by processing the event stream. This separates the write model from the read model, allowing each to be optimized independently.' This is the answer that shows you understand event sourcing as an architectural pattern rather than just a definition.
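The replay idea is easy to show concretely. A hedged sketch with a hypothetical bank-account domain (the `Event` type and `deposited`/`withdrawn` kinds are invented for illustration): the log is append-only, and the current state is just a left fold of a pure `apply` function over that log.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str     # "deposited" or "withdrawn" in this toy domain
    amount: int

def apply(balance, event):
    # Pure fold step: how a single immutable event changes derived state.
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")

def replay(events):
    # Current state = fold over the event log from the beginning.
    balance = 0
    for e in events:
        balance = apply(balance, e)
    return balance

log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]
balance = replay(log)
```

In the CQRS half of the pattern, a read-side consumer would run the same fold incrementally to keep a materialized view up to date, instead of replaying from scratch on every query.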

  3. Windowing and Watermarks (concepts: paLateData)

    What They Want to Hear: 'I choose the window type based on the use case. Tumbling windows are fixed, non-overlapping intervals: count clicks per hour. Sliding windows overlap: count clicks in the last hour, updated every 5 minutes. Session windows are activity-based: group events that are close together in time, with a gap timeout. I set the watermark based on observed lateness: if 99th percentile lateness is 5 minutes, I set the watermark to 10 minutes to catch stragglers with margin.' This is the answer that shows you choose windows and watermarks from observed data rather than defaults.
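Tumbling-window assignment and a simple watermark can be simulated in a few lines. This is a sketch under simplifying assumptions (integer event times, watermark defined as the maximum event time seen minus an allowed lateness, late events dropped rather than side-outputted); real engines like Flink expose this as configuration rather than hand-written logic.

```python
def windowed_counts(events, size, allowed_lateness):
    """Count events per tumbling window of `size` time units.

    The watermark trails the max event time seen by `allowed_lateness`;
    an event older than the watermark is considered too late and dropped.
    Returns (counts_by_window_start, dropped_timestamps).
    """
    counts, max_ts, dropped = {}, 0, []
    for ts in events:
        max_ts = max(max_ts, ts)
        watermark = max_ts - allowed_lateness
        if ts < watermark:
            dropped.append(ts)  # watermark already passed this timestamp
            continue
        start = (ts // size) * size  # tumbling: fixed, non-overlapping
        counts[start] = counts.get(start, 0) + 1
    return counts, dropped

# Out-of-order stream: 2 and 4 arrive after the watermark has moved on.
counts, dropped = windowed_counts([1, 3, 12, 2, 30, 4],
                                  size=10, allowed_lateness=5)
```

Raising `allowed_lateness` trades result freshness for completeness, which is exactly the p99-plus-margin reasoning in the answer above.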

  4. DLQ Patterns (concepts: paDeadLetterQueue)

    What They Want to Hear: 'I structure DLQ events with metadata: the original event, the error message, the retry count, and the timestamp. For reprocessing, I classify errors first. Transient errors (timeout, rate limit) go back to the main topic after a delay. Permanent errors (schema mismatch, invalid data) require a code fix before replay. I never blindly replay the entire DLQ: that just reproduces the same errors and wastes compute.' This is the answer that shows you have actually dealt with a DLQ in production.
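The classify-then-route step can be sketched as a small routing function. The entry shape, error codes, and the `retry`/`park`/`drop` outcomes are all illustrative assumptions, not a standard API; the point is that the routing decision is driven by the error class and retry metadata the DLQ entry carries.

```python
# Illustrative error classes; a real system would maintain these lists
# based on the errors it has actually observed.
TRANSIENT = {"timeout", "rate_limit"}
PERMANENT = {"schema_mismatch", "invalid_data"}

def route_dlq(entry, max_retries=3):
    """Decide what to do with one DLQ entry.

    'retry': send back to the main topic after a delay (transient error).
    'park':  hold for a code fix before replay (permanent error).
    'drop':  retry budget exhausted; escalate for manual review.
    """
    if entry["error"] in PERMANENT:
        return "park"
    if entry["retry_count"] >= max_retries:
        return "drop"
    return "retry"

entry = {
    "event": {"id": 1},              # the original event, untouched
    "error": "timeout",              # classified error message
    "retry_count": 1,
    "timestamp": "2024-01-01T00:00:00Z",
}
decision = route_dlq(entry)
```

Replaying the whole DLQ is the degenerate case where every entry is routed 'retry' regardless of class, which is what the answer warns against.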

  5. Spark Streaming vs Flink (concepts: paMicroBatchVsTrue)

    What They Want to Hear: 'Spark Structured Streaming is micro-batch: it collects events for a trigger interval (e.g., 10 seconds), processes them as a batch, then starts the next interval. Flink processes each event as it arrives with continuous operator pipelines. The practical differences: Spark has a roughly 100ms latency floor and simpler state management. Flink has sub-10ms latency, more powerful windowing (session windows, event-time processing), and built-in savepoints for state migration. I choose based on the latency requirement and operational fit: Spark when seconds of latency are acceptable and the team already runs Spark, Flink when I need low latency or complex event-time state.' This is the answer that frames the choice in terms of requirements rather than engine loyalty.
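The latency difference between the two models can be made concrete with a toy simulation (all names and numbers here are illustrative; neither function is Spark or Flink code). In the micro-batch model an event waits until its trigger interval closes, so per-event latency is bounded by the trigger; in the per-event model it is handled on arrival.

```python
def micro_batch(events, trigger):
    """Micro-batch model: each (arrival_time, value) event is emitted when
    its trigger interval closes, so latency is up to `trigger` time units."""
    out = []
    for t, v in events:
        batch_close = ((t // trigger) + 1) * trigger
        out.append((v, batch_close - t))  # (value, observed latency)
    return out

def per_event(events):
    """Continuous model: each event is processed on arrival.

    Latency is idealized to 0 here; in practice it is small but nonzero."""
    return [(v, 0) for _, v in events]

events = [(0, "a"), (3, "b"), (12, "c")]
mb = micro_batch(events, trigger=10)
pe = per_event(events)
```

Shrinking the trigger interval narrows the gap but raises per-batch scheduling overhead, which is why micro-batch engines have a practical latency floor.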