DataDriven

© 2026 DataDriven


Architecture at Scale

Staff-level: TCO modeling, distributed idempotency, and architectural philosophy

Challenges
0 hands-on challenges

Lesson Sections

  1. Lambda vs Kappa Philosophy (concepts: paLambdaArch)

    What They Want to Hear: 'The Lambda vs Kappa decision is not a technical debate; it is an organizational maturity question. Lambda is safer: the batch layer is the safety net that corrects streaming errors. Kappa is simpler: one codebase, one truth, one debugging surface. But Kappa demands higher operational maturity: 24/7 streaming monitoring, incident response for pipeline stalls, and replay infrastructure for reprocessing. My recommendation: start with batch, add a speed layer (Lambda) for use …'

  2. Streaming-Only Reprocessing (concepts: paKappaArch)

    What They Want to Hear: 'Reprocessing in Kappa means replaying the event log through a corrected version of the streaming job. For long-duration replays (months of data), I use tiered replay: recent data replays from Kafka at full speed, older data replays from S3 archives at reduced throughput. I write replay output to a shadow table, validate it against the original output for known-correct windows, and swap on validation pass. The critical constraint: the replay job must consume the same event schema …'
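
    The shadow-table validation and swap described above can be sketched as follows. This is a minimal illustration, not a real replay framework: the tables are modeled as plain dicts keyed by window, and `validate_and_swap` is a hypothetical helper name.

    ```python
    # Sketch of shadow-table validation for a Kappa-style replay.
    # A "table" here is just a dict mapping window -> aggregate value.

    def validate_and_swap(original, shadow, known_correct_windows, tolerance=0.0):
        """Compare shadow output to the original on windows known to be correct.
        Only if every checked window matches does the shadow become the
        serving table; otherwise the swap is aborted."""
        for window in known_correct_windows:
            a, b = original[window], shadow[window]
            if abs(a - b) > tolerance:
                raise ValueError(f"mismatch in window {window}: {a} vs {b}")
        return shadow  # validation passed: swap to the replayed output

    original = {"2024-01": 100.0, "2024-02": 250.0, "2024-03": 90.0}   # 03 was wrong
    shadow   = {"2024-01": 100.0, "2024-02": 250.0, "2024-03": 120.0}  # corrected replay

    # Validate only on windows the original is known to have computed correctly.
    serving = validate_and_swap(original, shadow, ["2024-01", "2024-02"])
    ```

    The design point is that validation never compares the window being corrected; it compares only known-good windows, so a match there gives confidence that the corrected job did not regress elsewhere.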

  3. Federated Query Pushdown (concepts: paPredicatePushdown)

    What They Want to Hear: 'Federated query engines like Trino and Presto push predicates to each data source independently. When I join an S3 Parquet table with a PostgreSQL table, the engine pushes the WHERE clause to both connectors: Parquet partition pruning on S3, and a SQL WHERE clause to PostgreSQL. But pushdown capability varies by connector: the Hive connector supports partition pruning and column pushdown, while a generic JDBC connector may only push simple equality predicates. Knowing what …'
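
    The per-connector split can be illustrated with a toy model. The capability sets below are assumptions for illustration, not Trino's actual connector SPI; the point is only the mechanic of dividing a WHERE clause into pushed vs. residual predicates.

    ```python
    # Toy model: split predicates into those pushed to a connector and those
    # the federated engine must evaluate itself after fetching rows.
    # Capability sets are illustrative, not any real engine's model.

    CONNECTOR_CAPS = {
        "hive": {"eq", "range", "in"},  # partition pruning, column pushdown
        "jdbc": {"eq"},                 # generic JDBC: simple equality only
    }

    def split_predicates(connector, predicates):
        """Return (pushed, residual) predicate lists for one connector."""
        caps = CONNECTOR_CAPS[connector]
        pushed = [p for p in predicates if p["kind"] in caps]
        residual = [p for p in predicates if p["kind"] not in caps]
        return pushed, residual

    preds = [
        {"kind": "eq", "sql": "region = 'EU'"},
        {"kind": "range", "sql": "event_date >= DATE '2024-01-01'"},
    ]
    hive_pushed, hive_residual = split_predicates("hive", preds)
    jdbc_pushed, jdbc_residual = split_predicates("jdbc", preds)
    ```

    In this sketch the Hive-like connector takes both predicates, while the JDBC-like connector takes only the equality, leaving the range filter as residual work (and extra data transfer) for the engine.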

  4. TCO Modeling (concepts: paCostOptimization)

    What They Want to Hear: 'TCO includes more than compute and storage. I model five cost categories: infrastructure (compute, storage, network), operational (on-call hours, incident response, monitoring tools), development (engineering hours to build and maintain), opportunity (what the team cannot build while maintaining this system), and risk (cost of data quality incidents, SLA violations). Most architecture decisions look different when you include operational and opportunity costs. A $50K/year …'
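
    The five-category model is easy to make concrete. The dollar figures below are made-up annual estimates purely for illustration; the structure, requiring all five categories before summing, is the point.

    ```python
    # Minimal TCO sketch covering the five categories named above.
    # All dollar figures are invented annual estimates for illustration.

    def total_cost_of_ownership(costs):
        """Sum annual costs across the five categories; fail loudly if a
        category is missing rather than silently undercounting."""
        categories = {"infrastructure", "operational", "development",
                      "opportunity", "risk"}
        missing = categories - costs.keys()
        if missing:
            raise ValueError(f"missing categories: {sorted(missing)}")
        return sum(costs[c] for c in categories)

    self_managed = {
        "infrastructure": 50_000,  # the line item naive models stop at
        "operational":    80_000,  # on-call, incident response, monitoring
        "development":    60_000,  # engineering time to build and maintain
        "opportunity":    40_000,  # features the team cannot ship meanwhile
        "risk":           30_000,  # expected cost of quality/SLA incidents
    }

    tco = total_cost_of_ownership(self_managed)
    ```

    Here the infrastructure bill is under a fifth of the true annual cost, which is exactly the shift in perspective the lesson argues for.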

  5. Distributed Idempotency (concepts: paIdempotency)

    What They Want to Hear: 'Distributed idempotency requires idempotency tokens: a unique identifier for each operation that every downstream system uses to deduplicate. When the pipeline produces output, it tags each write with a deterministic token derived from the input data (e.g., a hash of partition date + pipeline version + input checksum). Each downstream system checks: have I already processed this token? If yes, skip. If no, process and record the token. This works across S3, Snowflake, and Kafka …'
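
    The token recipe and the check-then-record step can be sketched as follows. The class and function names are hypothetical, and the downstream token store is a plain in-memory set standing in for whatever durable store a real system would use.

    ```python
    # Sketch of deterministic idempotency tokens and downstream dedup.
    # Token derivation (partition date + pipeline version + input checksum)
    # follows the recipe in the text; names and stores are illustrative.
    import hashlib

    def idempotency_token(partition_date, pipeline_version, input_checksum):
        """Deterministic token: the same inputs always yield the same token,
        so retries and replays produce the same identifier."""
        raw = f"{partition_date}|{pipeline_version}|{input_checksum}"
        return hashlib.sha256(raw.encode()).hexdigest()

    class DownstreamSystem:
        """A downstream sink that records tokens it has already applied."""
        def __init__(self):
            self.seen = set()
            self.applied = 0

        def write(self, token, payload):
            if token in self.seen:   # already processed: skip the duplicate
                return False
            self.seen.add(token)     # record the token, then apply the write
            self.applied += 1
            return True

    tok = idempotency_token("2024-06-01", "v7", "abc123")
    sink = DownstreamSystem()
    first = sink.write(tok, {"rows": 42})   # applied
    second = sink.write(tok, {"rows": 42})  # duplicate delivery: skipped
    ```

    Because the token is derived from the inputs rather than generated randomly per attempt, a retried or replayed write carries the same token, which is what lets independent downstream systems deduplicate without coordinating with each other.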

Related

  • All Lessons
  • Practice Problems
  • Mock Interview Practice
  • Daily Challenges