Operator-level depth: testing, incidents, cost, and observability
What They Want to Hear: 'Three layers: unit tests (seconds, test individual transforms on mock data), integration tests (minutes, test end-to-end with a real database), and data diff tests (hours, compare production output before and after a change). Each layer catches different bugs. Unit tests catch logic errors. Integration tests catch environment issues. Data diffs catch subtle regressions that unit tests cannot anticipate.'
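A data diff boils down to a keyed comparison of the old and new outputs. The sketch below is minimal and in-memory. It assumes both versions fit in memory and share a primary-key column. The function name `data_diff` and the row format are hypothetical. At warehouse scale, the same comparison would typically run as a join inside the database.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def data_diff(before: List[Row], after: List[Row], key: str) -> Dict[str, List[Any]]:
    """Compare two versions of a table keyed by a primary-key column.

    Returns keys that disappeared (removed), keys that are new (added),
    and keys present in both whose other columns differ (changed).
    """
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    removed = sorted(k for k in old if k not in new)
    added = sorted(k for k in new if k not in old)
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"removed": removed, "added": added, "changed": changed}


# Output of a model before and after a code change (hypothetical data).
before = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
after = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 40}]
print(data_diff(before, after, "id"))  # {'removed': [3], 'added': [4], 'changed': [2]}
```

A unit test only checks the cases its author thought of. A diff like this one surfaces every row the change touched, which is why it can catch regressions nobody anticipated.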
What They Want to Hear: 'Six steps: detect, triage, contain, root cause, remediate, post-mortem. Detect: automated alerts fire. Triage: is this impacting consumers? What is the blast radius? Contain: quarantine bad data, serve last known good. Root cause: trace back from the symptom. Remediate: fix and reprocess. Post-mortem: what failed, why, and what changes to prevent recurrence.'
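The "contain" step can be sketched as a publish gate. A new batch replaces the consumer-facing version only if every check passes. Otherwise it is quarantined, and consumers keep reading the last known good data. The class `PublishedTable` and the two checks are hypothetical names chosen for illustration. They are not part of any particular tool.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], bool]


def no_null_ids(batch: List[Row]) -> bool:
    # Hypothetical check: every row must carry a non-null "id".
    return all(row.get("id") is not None for row in batch)


def non_empty(batch: List[Row]) -> bool:
    # Hypothetical check: an empty load usually signals an upstream failure.
    return len(batch) > 0


@dataclass
class PublishedTable:
    """The version consumers read, plus batches held back for investigation."""

    current: List[Row] = field(default_factory=list)
    quarantine: List[List[Row]] = field(default_factory=list)

    def publish(self, batch: List[Row], checks: List[Check]) -> bool:
        """Promote `batch` only if every check passes.

        On failure the batch goes to quarantine and `current` is untouched,
        so consumers keep reading the last known good version.
        """
        if all(check(batch) for check in checks):
            self.current = batch
            return True
        self.quarantine.append(batch)
        return False


table = PublishedTable()
checks = [no_null_ids, non_empty]
print(table.publish([{"id": 1}, {"id": 2}], checks))  # True: promoted
print(table.publish([{"id": None}], checks))  # False: quarantined
print(table.current)  # [{'id': 1}, {'id': 2}]: last known good is still served
```

The quarantined batch is kept rather than dropped. That preserves the evidence needed for the root-cause step.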
What They Want to Hear: '60-70% of platform compute typically goes to transformation. The three biggest levers: switch full-refresh models to incremental (5-10x cheaper), use views instead of materialized tables where latency allows (no storage or refresh cost, though every query recomputes the view), and sort/cluster data on write to reduce downstream scan costs. Incremental models are the single biggest cost optimization most teams can make.'
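The gap between full refresh and incremental processing shows up in how many rows each run touches. Below is a minimal sketch that assumes every source row carries a monotonically increasing `updated_at` value usable as a high-water mark. The `transform` function and the column names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def transform(row: Row) -> Row:
    # Hypothetical per-row transformation: convert cents to dollars.
    return {"id": row["id"], "updated_at": row["updated_at"], "amount_usd": row["amount_cents"] / 100}


def full_refresh(source: List[Row]) -> List[Row]:
    # Rebuilds the whole target from scratch: cost grows with total table size.
    return [transform(r) for r in source]


def incremental(source: List[Row], target: List[Row], high_water_mark: int) -> Tuple[List[Row], int, int]:
    # Only rows changed since the last run are transformed: cost grows with the delta.
    delta = [r for r in source if r["updated_at"] > high_water_mark]
    merged = {r["id"]: r for r in target}
    for r in delta:
        merged[r["id"]] = transform(r)  # upsert by primary key
    new_mark = max((r["updated_at"] for r in delta), default=high_water_mark)
    return list(merged.values()), new_mark, len(delta)


v1 = [{"id": i, "updated_at": i, "amount_cents": i * 100} for i in range(1, 5)]
target, mark = full_refresh(v1), 4
# Row 2 is updated and row 5 arrives; rows 1, 3 and 4 are unchanged.
v2 = [r for r in v1 if r["id"] != 2] + [
    {"id": 2, "updated_at": 5, "amount_cents": 999},
    {"id": 5, "updated_at": 6, "amount_cents": 500},
]
target, mark, processed = incremental(v2, target, mark)
print(processed, mark)  # 2 6: two rows transformed instead of five
```

The strict `>` filter misses hard deletes and late-arriving rows stamped with an old `updated_at`. A lookback window or an occasional full refresh is a common way to cover those cases.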
What They Want to Hear: 'At 1 billion rows, ROW_NUMBER works fine with proper partitioning. At 10 billion, MERGE/UPSERT is more efficient: stage the delta, merge against the target. At 100 billion+ or for fuzzy matching, MinHash LSH (locality-sensitive hashing) reduces the comparison space from O(n^2) to near-linear by grouping similar records into buckets.'
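MinHash LSH is easier to follow in a small standard-library sketch. Each record is split into character 3-grams. Seeded BLAKE2b hashes stand in for random permutations and build a MinHash signature. The signature is then cut into bands, and only records that share a band bucket become candidate pairs. The band and row counts (16 x 4), the 3-gram shingling, and all function names are illustrative assumptions, not tuned values.

```python
import hashlib
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[str]:
    """Character k-grams of a lowercased, whitespace-normalized string."""
    t = " ".join(text.lower().split())
    if len(t) <= k:
        return {t}
    return {t[i:i + k] for i in range(len(t) - k + 1)}


def _hash(seed: int, item: str) -> int:
    # A seeded 64-bit hash stands in for one random permutation.
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items: Set[str], num_perm: int) -> List[int]:
    """Position i holds the minimum of hash function i over the set.

    Two sets agree at a position with probability equal to their Jaccard similarity.
    """
    return [min(_hash(seed, s) for s in items) for seed in range(num_perm)]


def lsh_candidate_pairs(records: Dict[str, str], bands: int = 16, rows: int = 4) -> Set[Tuple[str, str]]:
    """Bucket records by signature bands; only records sharing a bucket are compared."""
    buckets: Dict[Tuple[int, Tuple[int, ...]], List[str]] = defaultdict(list)
    for rid, text in records.items():
        sig = minhash_signature(shingles(text), bands * rows)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(rid)
    pairs: Set[Tuple[str, str]] = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


records = {
    "r1": "Acme Corporation, 123 Main Street",
    "r2": "ACME  Corporation,  123 Main Street",  # same text after normalization
    "r3": "Zzyzx Ltd",  # shares no 3-gram with the others
    "r4": "Acme Corp, 123 Main St",  # fuzzy variant
}
print(sorted(lsh_candidate_pairs(records)))
```

With b bands of r rows, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b. Raising r makes the filter stricter, and raising b makes it more forgiving. Candidates are only a shortlist: in practice each pair would still be verified with an exact similarity check. The near-linear cost also assumes bucket sizes stay small.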
What They Want to Hear: 'Buy for small teams (Monte Carlo, Bigeye: observability as a service). Build for large platform teams (Great Expectations: open-source, customizable). The decision is cost of tool vs cost of engineering time. A $50K/year observability tool replaces the 2-3 months of engineering effort it would take to build something equivalent.'
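The buy-versus-build comparison can be written out as arithmetic. The sketch below rests on one simplifying assumption: after the initial build, yearly maintenance costs a fixed fraction of the upfront effort. Every input is a hypothetical placeholder. The $20K per engineer-month figure falls within the range implied by the source's comparison, since $50K spread over 2-3 months works out to about $16.7K-$25K per month.

```python
from typing import Dict, Union


def build_vs_buy(
    tool_cost_per_year: float,
    build_engineer_months: float,
    cost_per_engineer_month: float,
    maintenance_fraction_per_year: float,
    years: int,
) -> Dict[str, Union[float, str]]:
    """Compare total cost of buying a tool versus building an in-house equivalent.

    Assumption: yearly maintenance costs a fixed fraction of the upfront build effort.
    Ties go to "buy".
    """
    upfront = build_engineer_months * cost_per_engineer_month
    build_total = upfront * (1 + maintenance_fraction_per_year * years)
    buy_total = tool_cost_per_year * years
    cheaper = "buy" if buy_total <= build_total else "build"
    return {"buy": buy_total, "build": build_total, "cheaper": cheaper}


# Hypothetical inputs: $50K/year tool, 3 engineer-months at $20K, 3-year horizon.
print(build_vs_buy(50_000, 3, 20_000, 0.25, 3))  # {'buy': 150000, 'build': 105000.0, 'cheaper': 'build'}
print(build_vs_buy(50_000, 3, 20_000, 1.0, 3))  # {'buy': 150000, 'build': 240000.0, 'cheaper': 'buy'}
```

The answer depends heavily on the maintenance assumption. A light in-house tool can come out cheaper, but one that needs ongoing engineering attention quickly overtakes the subscription.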