Batch vs Streaming: Advanced
Lambda, Kappa, and unified engines: architectures live or die on freshness tier discipline
- Category: Pipeline Architecture
- Difficulty: advanced
- Duration: 35 minutes
- Challenges: 0 hands-on challenges
Topics covered: Lambda Architecture; Kappa: Stream Only, Batch Replay; Unified Engines: Where Lines Blur; Per-Node Freshness Tier Analysis; Lambda to Kappa Worked Example
Lesson Sections
- Lambda Architecture (concepts: paLambdaArch)
Lambda architecture was the first widely adopted attempt to combine batch and streaming in one system. Nathan Marz proposed it around 2011, drawing on his experience at BackType and Twitter, and later developed it at length in his book Big Data. The motivation was specific to the era: batch frameworks (Hadoop MapReduce) were correct but slow; stream frameworks (Storm) were fast but produced approximate results. Lambda combined the two, using batch for the durable, correct view and streaming for the live, approximate view. Both laye
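A minimal sketch of the batch-plus-streaming split described above, assuming a toy in-memory event list rather than Hadoop or Storm. The names `batch_view`, `speed_view`, `serve`, and the `cutoff` timestamp are hypothetical and chosen for illustration only. The batch function counts everything up to the last batch run, the speed function covers what has arrived since, and a query adds the two.

```python
from collections import Counter

def batch_view(events, cutoff):
    """Batch layer (illustrative): counts over all events up to the last batch cutoff."""
    return Counter(e["key"] for e in events if e["ts"] <= cutoff)

def speed_view(events, cutoff):
    """Speed layer (illustrative): counts for events the batch layer has not absorbed yet."""
    return Counter(e["key"] for e in events if e["ts"] > cutoff)

def serve(key, batch, speed):
    """Serving layer: answer a query by merging the batch view and the live view."""
    return batch[key] + speed[key]

# Hypothetical event log; "ts" is an abstract timestamp.
events = [
    {"key": "video-a", "ts": 1},
    {"key": "video-a", "ts": 2},
    {"key": "video-b", "ts": 3},
    {"key": "video-a", "ts": 5},
]
cutoff = 3  # assumed time of the most recent batch run

batch = batch_view(events, cutoff)
speed = speed_view(events, cutoff)
print(serve("video-a", batch, speed))
```

Note that the counting logic exists twice, once per layer. In this toy the two copies are trivially identical; in a real deployment they would live in different frameworks and have to be kept in agreement by hand.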
- Kappa: Stream Only, Batch Replay (concepts: paKappaArch)
Kappa architecture, proposed by Jay Kreps in 2014, is the answer to Lambda's two-codebase problem. The idea is simple: keep only the streaming layer. The event log is the source of truth, the streaming pipeline produces the canonical view, and batch becomes a special case (replaying the event log through the same streaming pipeline) rather than a separate codebase. One implementation of the logic, one operational profile, one set of failure modes. The simplification is real, and Kappa has become
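The replay idea can be sketched with a plain Python list standing in for the event log. This is an illustration under stated assumptions, not Kafka or any specific engine: `process`, `run`, and the event fields are hypothetical names. The point is that the live path and the reprocessing path call the same function, and reprocessing is just reading the retained log again from an earlier offset.

```python
def process(view, event):
    """The single implementation of the logic: fold one event into a view."""
    view[event["key"]] = view.get(event["key"], 0) + event["weight"]
    return view

def run(log, from_offset=0):
    """Build a view by reading the log from a given offset with the same logic."""
    view = {}
    for event in log[from_offset:]:
        process(view, event)
    return view

# Hypothetical retained event log (the source of truth).
log = [
    {"key": "a", "weight": 1},
    {"key": "b", "weight": 2},
    {"key": "a", "weight": 3},
]

# Live path: events are folded in one at a time as they arrive.
live_view = {}
for event in log:
    process(live_view, event)

# "Batch" path: replay the whole log from offset 0 through the same code.
replayed_view = run(log, from_offset=0)

print(live_view == replayed_view)
```

In practice a replay would typically write to a new output table and swap it in once caught up; that step is omitted here.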
- Unified Engines: Where Lines Blur (concepts: paUnifiedEngines)
The cleanest version of Kappa requires an engine that runs the same code in batch and streaming modes. Modern engines have moved toward this ideal. Spark Structured Streaming exposes a unified DataFrame API in which the same transformation logic can run as a batch job or a micro-batch streaming job by switching the read and write entry points (read/write versus readStream/writeStream); an experimental continuous mode, selected through the trigger, supports a restricted set of operations. Apache Flink runs streaming as the default and batch as a special case (a bounded stream). Apache Beam abstracts both into a single programming model. The con
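The "batch is a bounded stream" idea can be illustrated without any engine at all. The sketch below is a toy analogy in plain Python, not the Spark, Flink, or Beam API; `transform`, `unbounded`, and the field names are hypothetical. The transformation is written against an iterator, so it runs unchanged whether the input ends (a list) or never does (an infinite generator).

```python
import itertools

def transform(events):
    """One piece of logic, indifferent to whether its input is bounded."""
    for e in events:
        if e["seconds_watched"] >= 30:  # drop very short views
            yield {"user": e["user"], "minutes": e["seconds_watched"] / 60}

# Bounded input: the "batch" case.
bounded = [
    {"user": "u1", "seconds_watched": 90},
    {"user": "u2", "seconds_watched": 10},
    {"user": "u3", "seconds_watched": 120},
]

def unbounded():
    """Unbounded input: the "streaming" case, an endless synthetic source."""
    for i in itertools.count():
        yield {"user": f"u{i}", "seconds_watched": 60 * (i % 3)}

batch_result = list(transform(bounded))
# A stream never finishes, so only the first two outputs are taken here.
stream_head = list(itertools.islice(transform(unbounded()), 2))

print(batch_result)
print(stream_head)
```

Real engines add what this toy leaves out: state, windowing, checkpointing, and delivery guarantees, which is where batch and streaming executions of the same query can still differ.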
- Per-Node Freshness Tier Analysis (concepts: paFreshnessTierAnalysis)
A single pipeline rarely needs one freshness tier across every node. The source might produce events continuously. The raw landing layer might lag the source by seconds. The curated layer might rebuild hourly. The serving layer might refresh on a per-consumer schedule. Treating the entire pipeline as one tier (the strictest one) overbuilds most nodes; treating it as the loosest tier underbuilds the consumer-facing edge. Senior engineers tier each node explicitly and label it on the architecture
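One way to make per-node tiers explicit is to record a freshness target next to each node and check observed lag against that node's own target. The sketch below assumes hypothetical node names, targets, and lag figures; none of the numbers come from the lesson. It also shows what happens when every node is held to the strictest tier: a node that is comfortably within its own target gets flagged anyway.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    max_staleness_s: int  # freshness target for this node, in seconds

# Illustrative tiers: seconds for landing, hourly for curated, daily for a report.
PIPELINE = [
    Node("raw_landing", 10),
    Node("curated", 3600),
    Node("serving_daily_report", 86400),
]

def violations(nodes, observed_lag_s):
    """Return the nodes whose observed lag exceeds their own freshness target."""
    return [n.name for n in nodes if observed_lag_s[n.name] > n.max_staleness_s]

# Hypothetical lag measurements, in seconds.
observed = {"raw_landing": 4, "curated": 4200, "serving_daily_report": 7200}

per_node = violations(PIPELINE, observed)

# Same pipeline, but every node held to the strictest tier.
strictest = min(n.max_staleness_s for n in PIPELINE)
uniform = [Node(n.name, strictest) for n in PIPELINE]
one_tier = violations(uniform, observed)

print(per_node)
print(one_tier)
```

With per-node targets only the curated layer is out of tier; with a single strict tier the daily report is flagged too, which is the overbuilding pressure the paragraph describes.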
- Lambda to Kappa Worked Example (concepts: paLambdaToKappaMigration)
The synthesis exercise walks through a realistic migration: a workload originally designed as Lambda and redesigned as Kappa, with explicit notes on what changes in code, in storage, and in operations. The example is a streaming media company's content engagement pipeline. The exercise shows that the migration is not a rewrite; it is a careful retirement of the batch layer and a tightening of the streaming layer, with the immutable event log surviving as the architectural anchor. The Lambda Start