Handle the follow-ups: dbt, SLAs, dedup, anti-patterns
What They Want to Hear 'Staging models are Silver: they clean and deduplicate raw data. Mart models are Gold: they apply business logic and serve analytics. For materialization: staging models are views or ephemeral (cheap, no storage). Mart models are tables or incremental (materialized for fast queries).'
What They Want to Hear 'A data quality SLA is a measurable commitment: this table will be updated by 8 AM, null rate will be under 1%, row count will be within 20% of the 7-day average. I publish these as data contracts between producers and consumers. If the SLA is violated, the data is held and the on-call is paged.' The key insight: SLAs force you to quantify 'good enough' instead of hand-waving.
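The three commitments in that answer (freshness, null rate, row-count drift) can be expressed as a small check. This is a minimal sketch, not a real contract framework: the `TableStats` shape, the `check_sla` function, and the default thresholds are hypothetical names chosen for illustration, and the freshness check assumes the load timestamp and the deadline fall on the same day.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class TableStats:
    """Snapshot of one table load (hypothetical shape for this sketch)."""
    updated_at: datetime
    row_count: int
    null_count: int
    avg_row_count_7d: float


def check_sla(stats: TableStats,
              deadline: time = time(8, 0),
              max_null_rate: float = 0.01,
              max_row_count_drift: float = 0.20) -> list:
    """Return the names of violated SLA clauses; an empty list means release the data."""
    violations = []

    # Freshness: the table must be updated by the deadline (same-day assumption).
    if stats.updated_at.time() > deadline:
        violations.append("freshness")

    # Null rate must stay strictly under the limit; an empty table counts as a violation.
    null_rate = stats.null_count / stats.row_count if stats.row_count else 1.0
    if null_rate >= max_null_rate:
        violations.append("null_rate")

    # Row count must stay within the allowed drift from the 7-day average.
    if stats.avg_row_count_7d <= 0:
        violations.append("row_count")
    else:
        drift = abs(stats.row_count - stats.avg_row_count_7d) / stats.avg_row_count_7d
        if drift > max_row_count_drift:
            violations.append("row_count")

    return violations
```

A non-empty result is the signal to hold the data and page the on-call; how that hold and page are wired up depends on the orchestrator and is left out here.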
What They Want to Hear 'Batch dedup is straightforward: ROW_NUMBER in SQL, partitioned by the business key, keeping only the first row per key. Streaming dedup requires stateful processing: maintain a set of seen event IDs in the processor state and skip events already in the set. Bloom filters provide memory-efficient approximate membership testing when the ID set is too large for memory, at the cost of occasional false positives that drop a genuinely new event.'
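The streaming half of that answer can be sketched in a few lines. This is an illustration under stated assumptions, not a production processor: `dedup_stream` and `BloomFilter` are hypothetical names, events are assumed to be dicts carrying an `event_id` field, and the state lives in process memory rather than in a stream processor's managed state. The Bloom filter uses SHA-256 with a seed prefix as a simple way to derive several hash positions.

```python
import hashlib
from typing import Iterable, Iterator


def dedup_stream(events: Iterable[dict], id_field: str = "event_id") -> Iterator[dict]:
    """Yield each event once, tracking seen IDs in an in-memory set (exact dedup)."""
    seen = set()
    for event in events:
        event_id = event[id_field]
        if event_id in seen:
            continue  # duplicate: already emitted an event with this ID
        seen.add(event_id)
        yield event


class BloomFilter:
    """Fixed-size approximate set: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

Swapping the `set` for a `BloomFilter` caps memory at `num_bits / 8` bytes regardless of how many IDs arrive, but a false positive means a new event is treated as a duplicate and skipped, so the trade-off only suits pipelines that tolerate a small loss rate.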
What They Want to Hear 'Four common anti-patterns: too many layers (adding Platinum, Diamond), skipping Bronze (no safety net), putting business logic in Silver (mixing concerns), and identical Silver and Gold (unnecessary duplication). I have seen all four in production.'
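What They Want to Hear 'Statistical baselines over static thresholds. I compute the rolling mean and standard deviation over the past 7 to 30 days. An alert fires when the current value deviates from the mean by more than 2 standard deviations. Because the baseline moves with the data, it adapts to gradual drift without manual retuning; where weekly seasonality is strong, I compute the baseline per day of week so that a Sunday is compared with previous Sundays.'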
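The alert rule in that answer reduces to a short function. This is a minimal sketch using only the standard library; `is_anomalous` and its parameters are hypothetical names, `history` is assumed to hold one metric value per day for the chosen window, and the sample standard deviation is used.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(history: Sequence[float], current: float,
                 num_stdevs: float = 2.0) -> bool:
    """Flag `current` if it lies more than `num_stdevs` standard deviations from the mean."""
    if len(history) < 2:
        raise ValueError("need at least two historical values to estimate spread")
    baseline = mean(history)
    spread = stdev(history)  # sample standard deviation of the window
    return abs(current - baseline) > num_stdevs * spread


# Example: daily row counts for the last 7 days (made-up values for illustration).
last_week = [100, 102, 98, 101, 99, 100, 100]
print(is_anomalous(last_week, 101))  # small deviation from the baseline
print(is_anomalous(last_week, 110))  # large deviation from the baseline
```

To compare like with like on a metric that differs by weekday, one option is to pass only the values from the same day of week as `history`.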