Defend your architecture decisions with cost modeling and production patterns
What They Want to Hear: 'In practice, Lambda means running Spark batch alongside Spark Structured Streaming, both writing to the same serving layer. The speed layer writes to a real-time view (e.g., a Kafka-backed materialized view or a hot table). The batch layer overwrites the same data with corrected results daily. I use view unioning: the serving query reads from the batch table first, then overlays any newer data from the speed table. Consistency is eventual: the batch layer corrects any drift on its next run.'
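The view-unioning pattern above can be sketched in plain Python (the key `order_id` and the row shapes are hypothetical; in practice this would be a Spark SQL view over the two tables):

```python
def serve(batch_rows, speed_rows):
    """Serving-layer union: start from the batch view, then overlay the
    speed view. Speed-layer rows win for any key present in both, since
    they are newer than the last batch run."""
    merged = {row["order_id"]: row for row in batch_rows}        # batch layer first
    merged.update({row["order_id"]: row for row in speed_rows})  # speed overlay
    return list(merged.values())

batch = [{"order_id": 1, "total": 100}, {"order_id": 2, "total": 50}]
speed = [{"order_id": 2, "total": 55}, {"order_id": 3, "total": 20}]
result = {r["order_id"]: r["total"] for r in serve(batch, speed)}
# order 2 is served from the speed layer until the next batch run overwrites it
```

The eventual-consistency guarantee falls out of the update order: once the nightly batch job rewrites the batch table, the overlay for corrected keys becomes a no-op.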
What They Want to Hear: 'Kappa works when three conditions are met: the event log retains enough history for full reprocessing, the streaming logic can handle both real-time and replay workloads, and the team has the operational maturity to run a streaming platform 24/7. For reprocessing, I deploy a second instance of the streaming job, point it at the beginning of the log, and write to a new output table. When the replay catches up to real-time, I swap the consumer to the new table. The old table and job can then be retired.'
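A minimal sketch of the replay-and-swap step, assuming a hypothetical keyed event log and two versions of the aggregation logic (the second deployed to fix a bug in the first):

```python
def replay(log, aggregate):
    """Reprocess the full retained event log into a fresh output table."""
    table = {}
    for key, value in log:                       # read from the start of the log
        table[key] = aggregate(table.get(key, 0), value)
    return table

log = [("user1", 3), ("user2", 5), ("user1", 4)]

tables = {"v1": replay(log, lambda acc, v: acc + 1)}  # old logic: count of events
serving = "v1"

# Fixed logic needs a full history replay -- deploy a second job instance,
# write to a new table, then swap the consumer once the replay catches up.
tables["v2"] = replay(log, lambda acc, v: acc + v)    # new logic: sum of values
serving = "v2"                                        # swap; v1 can now be retired
```

The swap is atomic from the consumer's point of view: readers only ever see a fully reprocessed table, never a half-replayed one.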
What They Want to Hear: 'Pushdown behavior depends on the storage format. Parquet supports column pruning and row-group statistics filtering natively. Delta Lake adds data skipping with file-level min/max statistics and Z-ordering for multi-column predicates. Iceberg goes further with partition-level statistics and hidden partitioning that decouples the physical partition scheme from the query predicate. Dynamic partition pruning optimizes joins: when one side of a join filters partitions, the engine prunes the matching partitions on the other side at runtime.'
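File-level min/max data skipping reduces to a simple interval check. A toy illustration (file names and the date column are hypothetical; Delta and Iceberg store these statistics in their metadata layers):

```python
# Per-file min/max statistics for a date column, as a table's metadata might record
files = [
    {"path": "part-0", "min": "2024-01-01", "max": "2024-01-31"},
    {"path": "part-1", "min": "2024-02-01", "max": "2024-02-29"},
    {"path": "part-2", "min": "2024-03-01", "max": "2024-03-31"},
]

def prune(files, lo, hi):
    """Keep only files whose [min, max] range can overlap the predicate range;
    everything else is skipped without being opened."""
    return [f["path"] for f in files if f["max"] >= lo and f["min"] <= hi]

matched = prune(files, "2024-02-10", "2024-02-20")  # only part-1 survives
```

Skipping is conservative: a surviving file may still contain zero matching rows, but a skipped file provably contains none.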
What They Want to Hear: 'I right-size by measuring utilization first. If CPU utilization averages 30% on an $80K/month cluster, I am paying for 70% idle capacity. Three strategies: (1) Auto-scaling: scale executors based on workload, not peak capacity. (2) Spot instances for batch: 60-70% discount, with checkpointing for fault tolerance. (3) Reserved instances for baseline: commit to the minimum always-needed capacity at a 30-50% discount, and scale above that with on-demand or spot.'
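The arithmetic behind the blended-pricing argument can be made concrete. The split below is hypothetical (40% of spend moved to reserved at a 40% discount, 30% to spot at 65% off, 10% kept on-demand, 20% idle capacity eliminated outright); the discount rates fall inside the ranges quoted above:

```python
def blended_cost(total, reserved_frac, reserved_disc, spot_frac, spot_disc, ondemand_frac):
    """Monthly cost after right-sizing: each slice of the original spend is
    repriced at its tier's discount; any fraction not listed is eliminated idle."""
    return (total * reserved_frac * (1 - reserved_disc)   # reserved baseline
            + total * spot_frac * (1 - spot_disc)         # spot for batch
            + total * ondemand_frac)                      # on-demand burst

current = 80_000  # USD/month, ~30% average CPU utilization
optimized = blended_cost(current, 0.40, 0.40, 0.30, 0.65, 0.10)
# 19,200 reserved + 8,400 spot + 8,000 on-demand = 35,600 -- a ~55% reduction
```

The point of showing the model in an interview is that each slice maps to a named strategy, so the savings claim is auditable rather than hand-waved.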
What They Want to Hear: 'MERGE matches source rows against target rows on a key. When matched, it updates. When not matched, it inserts. For idempotency, the key must be the business key (e.g., order_id), not a surrogate. To optimize a slow MERGE: (1) partition the target table and scope the MERGE to affected partitions only, (2) stage the delta into a temp table with dedup applied before merging, (3) if the delta exceeds 30% of the partition, switch to partition REPLACE instead.'
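The idempotency claim can be demonstrated with a minimal upsert sketch keyed on the business key (`order_id` and the row shapes are illustrative; in Delta Lake this would be a `MERGE INTO ... WHEN MATCHED ... WHEN NOT MATCHED` statement):

```python
def merge(target, delta):
    """MERGE semantics on a business key: when matched, update;
    when not matched, insert. Replaying the same delta is a no-op."""
    for row in delta:
        target[row["order_id"]] = row   # upsert keyed on the business key
    return target

target = {1: {"order_id": 1, "status": "placed"}}
delta = [
    {"order_id": 1, "status": "shipped"},  # matched -> update
    {"order_id": 2, "status": "placed"},   # not matched -> insert
]
merge(target, delta)
merge(target, delta)  # second run changes nothing: the pipeline is retry-safe
```

With a surrogate key instead, each replay would insert fresh rows, which is exactly why retried loads duplicate data when the match key is wrong.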