A public-company data platform team had every quality check the intermediate tier prescribes. Schema validation, distributional checks, referential integrity, freshness gates, the whole suite. The pages still came in volume. On-call rotations were burning out. Half the alerts were technically correct: a column did shift, a row count did dip, a freshness gate did miss its threshold. None of those alerts represented an actual production problem; they represented threshold drift, calendar effects, and producer behaviors that nobody had told the consumer about. The other half of the alerts were silent failures the suite missed because the assertions had never been written. The team had quality engineering. It did not have quality discipline. The path from one to the other has three components: contracts that name what is committed, observability that names what is happening, and threshold tuning that names what is normal. This lesson covers all three, plus the rollout problem of getting there from a legacy pipeline that does none of them.
Data Contracts and CI Enforcement
Author a data contract with schema, guarantees, and evolution policy, and enforce it in producer CI plus pipeline gates.
Lesson 1's advanced tier introduced the pipeline-as-product framing: a pipeline has a contract that names producer, consumer, schema, freshness SLA, quality SLA, backfill policy, and deprecation policy. This section turns that framing into a working mechanism. A data contract is the executable form of the commitment. The producer commits to a shape and a set of guarantees; the consumer relies on them; the contract is checked in CI on every change so violations cannot ship. Without enforcement, contracts are documentation. With enforcement, they are the layer that prevents most of the silent failures the previous lessons spent so much effort detecting.
The shift from documentation to enforcement is the central move. Documentation rots. CI does not. A contract that is a YAML file in a wiki page is fiction within six months because the producer evolves and the wiki does not. A contract that is a YAML file in the producer's repo, validated by a CI step, evolves at the speed the producer evolves, and a change that contradicts it fails the build instead of drifting silently. Teams that have built contract programs at scale consistently point to the enforcement step as the one that changes the failure rate; the document was always achievable.
What Makes A Contract A Contract
| Property | Implicit Agreement | Data Contract |
|---|---|---|
| Form | Tribal knowledge; Slack threads; an outdated wiki page | A versioned file checked into the producer's repo |
| Enforcement | Discovered after a consumer breaks | Checked in CI on every producer change |
| Visibility | Whoever has been around long enough to know | Discoverable by every consumer through a registry or catalog |
| Versioning | Implicit; whatever was true the last time someone looked | Semver-style; breaking changes require a new major version |
| Enforcement Point | What It Checks | On Failure |
|---|---|---|
| Pipeline gate | Real events conform to the contract at the boundary | Quality gate halts; producer is paged |
| Consumer CI | Consumer code reads the contract version it depends on | Consumer build fails; consumer pins or upgrades |
| Registry validation | Contract version is registered and discoverable | Deploy is blocked until registration succeeds |
Producer and consumer bind through explicit version pinning. The producer publishes a contract version; the consumer pins to a version it has tested against. A consumer that depends on customer_events 2.3.0 declares that dependency in code and builds against it. When the producer publishes 2.4.0 (additive minor), the consumer's build still passes because 2.4.0 is backward-compatible with 2.3.0. When the producer wants to publish 3.0.0 (breaking), the contract's evolution policy requires a major version bump and a 90-day notice; consumers receive notification and have time to migrate. This is the pattern that semantic versioning established for software libraries; data contracts apply it to data.
```python
# Consumer code declares its contract dependency explicitly
from contracts import customer_events

# Pin to a specific version; build fails if version is removed or unavailable
```
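The pinning rule described above can be sketched as a small compatibility check. This is a minimal illustration that assumes plain MAJOR.MINOR.PATCH version strings; `parse_version` and `satisfies_pin` are hypothetical helper names and do not belong to any particular contract library.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies_pin(published: str, pinned: str) -> bool:
    """True if `published` is backward-compatible with the consumer's pin.

    Compatible means: same major version, and at least as new as the pin.
    """
    pub, pin = parse_version(published), parse_version(pinned)
    return pub[0] == pin[0] and pub >= pin


print(satisfies_pin("2.4.0", "2.3.0"))  # additive minor release
print(satisfies_pin("3.0.0", "2.3.0"))  # breaking major release
```

A consumer pinned to 2.3.0 keeps building against 2.4.0, while 3.0.0 fails the check and forces a deliberate migration.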
•Pre-Contract World
Producer changes a column type; consumers break in production
Schema rolls forward without consumer awareness
Producers and consumers debate root cause during the incident
Adding a new consumer means archeology to find the actual schema
✓Post-Contract World
Breaking change is rejected at producer CI; never reaches consumers
Schema evolution follows semver; consumers see only versions they pin to
Root cause is named by the contract version; debate is short
New consumers read the contract registry and integrate without archeology
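The producer-side CI check that rejects a breaking change can be sketched as a schema diff. This is a simplified sketch, not a full compatibility checker: schemas are assumed to be plain dictionaries of `column -> (type, nullable)`, and `classify_change` and `ci_gate` are hypothetical names chosen for illustration.

```python
def classify_change(old: dict, new: dict) -> str:
    """Compare two {column: (type, nullable)} schemas.

    Returns 'major' for a breaking change, 'minor' for any other
    difference, and 'patch' when the shape is unchanged.
    """
    for column, (old_type, old_nullable) in old.items():
        if column not in new:
            return "major"  # removed or renamed column
        new_type, new_nullable = new[column]
        if new_type != old_type:
            return "major"  # type change
        if new_nullable and not old_nullable:
            return "major"  # consumers may now see nulls they never handled
    return "minor" if new != old else "patch"


def ci_gate(old: dict, new: dict, declared_bump: str) -> bool:
    """Pass only if the declared version bump covers the detected change."""
    rank = {"patch": 0, "minor": 1, "major": 2}
    return rank[declared_bump] >= rank[classify_change(old, new)]


v1 = {"customer_id": ("string", False), "amount_usd": ("decimal", False)}
v2 = dict(v1, currency=("string", True))  # new optional column
v3 = {"customer_id": ("string", False), "amount_usd": ("string", False)}

print(classify_change(v1, v2), ci_gate(v1, v2, "minor"))
print(classify_change(v1, v3), ci_gate(v1, v3, "minor"))
```

The second case is the one the lists above describe: a column type change declared as a minor release fails the gate before any consumer sees it.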
What Contracts Cannot Do Alone
Contracts catch shape violations and explicitly-named guarantee violations. They do not catch business-logic correctness on conforming data. A contract that says amount_usd is a decimal in the range zero to one million does not catch an amount that is the wrong number for the underlying transaction. The quality suite from the intermediate tier remains necessary. Contracts and quality suites are complements: contracts prevent shape failures, quality suites detect distribution-and-content failures. A serious data platform runs both.
Signs that a contract program is mature:
▸Producers declare contracts before consumers integrate, not after
▸Schema evolution follows a documented semver-like policy
▸CI rejects breaking changes; production gates rarely have to
▸Consumer code declares contract dependencies as explicit pins
▸A registry exists where any team can discover what contracts are available
✓Do
Version contracts with semver: additive changes are minor, breaking changes are major, bug fixes are patch
Enforce contracts in producer CI; production gates are the second line of defense
Publish contracts to a registry that consumers can discover programmatically
✗Don't
Treat contracts as documentation; without enforcement they accumulate as fiction
Allow breaking changes in minor versions; consumers will silently break
Couple the contract format to one tool; contracts outlast any tool decision
TIP
Start contracts at the highest-leverage producer-consumer boundary, not at every boundary at once. The first contract teaches the team how to author and enforce; the second through tenth follow the pattern with little marginal cost.
Five Pillars of Observability
Apply the five pillars of data observability as a diagnostic framework for incident response and as a coverage map for designing quality programs.
Barr Moses and the Monte Carlo Data team named the five pillars of data observability: freshness, distribution, volume, schema, and lineage. The naming has caught on widely enough that conversations about quality use it as shorthand. The pillars are useful because they are not a checklist; they are a diagnostic framework. When something is wrong with the data, the pillar that detected the symptom narrows the search for the cause. When designing a quality program, the pillars name the gaps that have to be filled before the program can be called observable rather than merely instrumented.
The idea predates the name. Software observability matured along the same lines: metrics, logs, and traces form a similar three-axis decomposition, where each axis answers a different diagnostic question. The data version has more axes because data has more independent dimensions of failure. Volume can be wrong without distribution being wrong; schema can shift without freshness slipping; lineage can be unknown even when every other pillar is green.
The Five Pillars
Freshness
Is the data current
Time between latest available record and now. Asks whether the pipeline is keeping up. The simplest pillar to measure and the first one consumers notice.
Distribution
Are values within the expected range and shape
Statistical properties of columns: mean, stddev, quantiles, cardinality, category mix. Catches shifts that no individual row violates.
Volume
Is the right amount of data arriving
Row counts compared to historical baselines. Catches dropped partitions, broken filters, exploded joins. Often the first signal of an upstream problem.
Schema
Is the shape of the data what was promised
Columns, types, nullability, accepted values, ranges. The producer-side commitment made executable.
Lineage
Where does this data come from and what reads it
Upstream and downstream relationships at column granularity. Turns 'a number changed' into 'a number changed because this transform changed'.
Pillars As A Diagnostic Framework
When a consumer reports a wrong number, a senior engineer walks the pillars in order and uses each one to either narrow or rule out a class of cause. Freshness rules out 'is the data even recent.' Volume rules out 'did the right amount of data arrive.' Schema rules out 'is the shape correct.' Distribution rules out 'are the values within their normal range.' Lineage answers 'what produced this column and what depends on it.' The walk takes minutes and replaces the unstructured 'try things until something works' debugging that consumes hours when each pillar has to be checked manually.
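The walk described above can be sketched as a loop that stops at the first pillar it cannot rule out. This is a minimal sketch: the checks are assumed to be zero-argument callables returning True when the pillar looks healthy, and `walk_pillars` is a hypothetical name, not part of any observability tool.

```python
from typing import Callable

# Order taken from the diagnostic walk in the text.
PILLAR_ORDER = ["freshness", "volume", "schema", "distribution", "lineage"]


def walk_pillars(checks: dict[str, Callable[[], bool]]) -> list[tuple[str, str]]:
    """Run one check per pillar in order and record what each one showed.

    Stops at the first pillar whose check fails; a pillar with no
    registered check is recorded as 'unchecked' because it cannot be
    ruled out.
    """
    findings = []
    for pillar in PILLAR_ORDER:
        check = checks.get(pillar)
        if check is None:
            findings.append((pillar, "unchecked"))
        elif check():
            findings.append((pillar, "ruled out"))
        else:
            findings.append((pillar, "suspect"))
            break
    return findings


example = walk_pillars({
    "freshness": lambda: True,   # data is recent
    "volume": lambda: False,     # row count is off
    "schema": lambda: True,
})
print(example)
```

In the example the walk rules out freshness and stops at volume, which is where the investigation would start.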
| Symptom | First Pillar To Check | Why |
|---|---|---|
| Dashboard shows last week's data on Monday morning | Freshness | Most likely an ingestion stall; rules in or out the simplest cause |
| Revenue dropped 15 percent overnight | Volume | Sudden numeric drops are usually missing rows, not changed values |
| Average order amount is up but row count is steady | Distribution | Aggregate change without row count change implies value shift |
| Pipeline succeeded but downstream join produces nulls | Schema | Type or nullability mismatch is the typical cause of post-success join failures |
| Two consumers see different numbers from the same source | Lineage | Different consumers may read different downstream tables; lineage exposes the divergence |
A pillar is a category. A check is a specific assertion within a pillar. The pillar 'distribution' contains many checks: mean shift, stddev shift, p99 shift, category mix shift, cardinality shift. The framework value of pillars is in coverage, not in implementation. A program that has a hundred checks within four pillars and zero checks in the fifth is a program with a known blind spot.
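The coverage argument can be made concrete by counting checks per pillar and listing the pillars with none. The sketch below assumes each check is tagged with the pillar it belongs to; `coverage_report` and `blind_spots` are hypothetical helper names.

```python
from collections import Counter

PILLARS = ("freshness", "distribution", "volume", "schema", "lineage")


def coverage_report(check_pillars: list[str]) -> dict[str, int]:
    """Number of checks registered under each of the five pillars."""
    counts = Counter(check_pillars)
    return {pillar: counts.get(pillar, 0) for pillar in PILLARS}


def blind_spots(check_pillars: list[str]) -> list[str]:
    """Pillars with zero checks, regardless of how many checks exist elsewhere."""
    report = coverage_report(check_pillars)
    return [pillar for pillar, count in report.items() if count == 0]


# Twenty schema checks do not compensate for a missing distribution check.
suite = ["schema"] * 20 + ["volume", "freshness", "lineage"]
print(coverage_report(suite))
print(blind_spots(suite))
```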
Lineage: The Pillar Most Programs Skip
Most quality programs cover four pillars well and lineage poorly. The reason is cost. Lineage at table granularity is moderately expensive; lineage at column granularity is expensive; lineage that updates as transforms evolve is expensive to keep current. The payoff is that lineage transforms incident response. A column-level lineage system answers questions like 'what consumers depend on amount_usd in fct_orders' in seconds. Without lineage, the same question takes hours of grep-and-Slack archaeology. Tools like dbt, Dagster, and standalone catalogs (DataHub, OpenMetadata) cover this pillar; the discipline is in adopting and keeping them current.
Reading the lineage graph backward from a consumer answers 'where does this number come from.' Reading it forward from a source column answers 'who is affected if this column changes.' Both questions show up in incident response and in change reviews. Lineage is the pillar that ties the other four together.
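Both directions of reading can be sketched as graph traversals. The example assumes lineage is available as a simple mapping from a column to the columns it feeds; the column names in `EDGES` are invented for illustration, and a real catalog would expose this through its own API.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the columns in its list.
EDGES = {
    "raw_orders.amount": ["fct_orders.amount_usd"],
    "fct_orders.amount_usd": ["revenue_dashboard.total", "churn_features.avg_spend"],
    "churn_features.avg_spend": ["churn_model.score"],
}


def downstream(column: str, edges: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk forward: everything affected if `column` changes."""
    seen, queue, order = {column}, deque([column]), []
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def upstream(column: str, edges: dict[str, list[str]]) -> list[str]:
    """Walk backward by reversing the edges: where a number comes from."""
    reversed_edges: dict[str, list[str]] = {}
    for parent, children in edges.items():
        for child in children:
            reversed_edges.setdefault(child, []).append(parent)
    return downstream(column, reversed_edges)


print(downstream("fct_orders.amount_usd", EDGES))
print(upstream("churn_model.score", EDGES))
```

The forward walk is the change-review question; the backward walk is the incident-response question.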
When To Adopt Each Pillar
| Maturity Stage | Pillars In Place | What The Team Can Answer |
|---|---|---|
| Stage 1: Cheap checks | Volume, freshness | Did the right amount of data arrive on time |
| Stage 2: Suite | Volume, freshness, schema, distribution | Is the data structurally and statistically as expected |
| Stage 3: Observable | All five pillars including lineage | What changed, why, and who is affected |
| Stage 4: Contract-enforced | All five pillars plus contracts in CI | Same as Stage 3, but most failures cannot ship |
•Without Lineage
Incident response starts with 'who owns this column'
Change reviews miss downstream consumers
Deprecation requires manual canvassing
Two consumers compute conflicting numbers; cause is hidden
✓With Lineage
Incident response starts with the lineage graph; ownership is metadata
Deprecation walks the graph and notifies every downstream
Conflicting numbers are explained by divergent transforms in the graph
TIP
When a quality program is in place but the team still spends hours diagnosing incidents, the missing investment is almost always lineage. The other four pillars detect; lineage interprets.
The five pillars are a framework for coverage and a diagnostic walk for incidents.
Most programs underinvest in lineage; the cost shows up in incident response time.
Pillars are categories; specific checks live within them. Coverage means hitting all five categories, not running one check.
The pillars are descriptive, not prescriptive. A program with twenty schema checks and zero distribution checks is not 'four-fifths observable'; it is observable on schema and blind on distribution. Coverage is binary per pillar.
Quality SLAs vs Ops SLAs
Distinguish operational from quality SLAs and state both as separate commitments with separate measurements and improvements.
An SLA states a commitment. The pipeline-as-product framing from Lesson 1 introduced two SLAs as elements of the contract: freshness SLA and quality SLA. The freshness SLA is the operational commitment; the quality SLA is the correctness commitment. They are commonly conflated. They are different commitments to different things, with different consequences when they fail. A pipeline that meets its operational SLA can fail its quality SLA while every job status is green. A pipeline that meets its quality SLA can miss its operational SLA without affecting correctness. The producer who treats both as one number ends up over-promising on one and under-detecting failure of the other.
The conflation has visible consequences. Status pages that report a single uptime number describe the operational SLA exclusively, leaving consumers with no way to distinguish 'late but correct' from 'on time but wrong'. Incident reviews that do not separate the two end up with action items that improve one without addressing the other. The split SLA is more honest, and honesty in producer commitments is the foundation of the trust that makes data products usable.
The Two SLAs
| Property | Operational SLA | Quality SLA |
|---|---|---|
| Question answered | Did the data arrive on time | Was the data correct |
| Typical statement | Pipeline finishes by 6am every day | Row count within 50 to 200 percent of baseline; null rate below 1 percent |
| Failure mode | Late or missing run | Wrong numbers in a successfully-completed run |
| Detected by | Orchestrator monitoring; missed schedule alerts | Quality gates inside the pipeline |
| Consumer impact | Dashboard or model shows yesterday's data | Dashboard or model shows wrong data |
Why Conflating Them Hurts
An operational SLA of 'fresh by 6am' tells the consumer when to expect the data. A quality SLA of 'correct row counts by 6am' tells the consumer when to expect the data to be both fresh and right. The two are independent. A pipeline can meet 'fresh by 6am' with a 30 percent row count drop. A pipeline can have a flawless row count and miss the 6am deadline because the warehouse was slow. Consumers who hear 'the team has a 6am SLA' assume both meanings are guaranteed. Producers who state a 6am SLA often mean only operational. The conversation has to specify which one, or both.
Stating Both Explicitly
```yaml
# Both SLAs stated separately in the contract
guarantees:
  operational_sla:
    statement: 'fct_orders updated by 06:00 Pacific each day'
    measurement: 'orchestrator-reported finish time'
    target: 99.0
    window: 'rolling 30 days'
  quality_sla:
    statement: 'all five-pillar gates pass on the run that satisfies the operational SLA'
    measurement: 'count of successful runs with all gates green / total runs'
    target: 99.5
    window: 'rolling 30 days'
  combined_sla:
    statement: 'fresh AND correct by 06:00 Pacific'
    target: 98.5
```
What The Combined SLA Actually Costs
When the two kinds of failure are independent, the combined SLA is the product of the two. A 99.0 percent operational SLA and a 99.5 percent quality SLA produce a combined SLA of about 98.5 percent (0.990 × 0.995 ≈ 0.985). Promising both individually at 99.0 percent and treating 99.0 percent as the joint guarantee is mathematically wrong; the product is about 98.0 percent. The cost compounds further when consumer behavior is sensitive to the combined number: a model that retrains daily on potentially-stale or potentially-wrong data needs to be designed to tolerate the combined error rate, not either component alone.
| Operational | Quality | Combined (If Independent) |
|---|---|---|
| 99.0% | 99.0% | 98.0% |
| 99.5% | 99.5% | 99.0% |
| 99.9% | 99.9% | 99.8% |
| 99.99% | 99.99% | 99.98% |
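The arithmetic in the table, and the measurement that should sit next to it on a status page, can be sketched as follows. The forecast assumes operational and quality failures are independent; the measured figure makes no such assumption because it counts runs that were both on time and green. `Run`, `forecast_combined`, and `measured_attainment` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    on_time: bool      # met the operational deadline
    gates_green: bool  # every quality gate passed


def forecast_combined(operational_pct: float, quality_pct: float) -> float:
    """Joint rate implied by two targets, assuming independent failures."""
    return operational_pct * quality_pct / 100.0


def measured_attainment(runs: list[Run]) -> dict[str, float]:
    """Observed operational, quality, and combined rates over a window of runs."""
    total = len(runs)

    def pct(count: int) -> float:
        return 100.0 * count / total

    return {
        "operational": pct(sum(r.on_time for r in runs)),
        "quality": pct(sum(r.gates_green for r in runs)),
        "combined": pct(sum(r.on_time and r.gates_green for r in runs)),
    }


window = [Run(True, True)] * 8 + [Run(False, True), Run(True, False)]
print(forecast_combined(99.0, 99.5))
print(measured_attainment(window))
```

In the ten-run example each SLA is met nine times out of ten, yet only eight runs are both fresh and correct, which is the number a consumer actually experiences.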
Designing around each SLA requires different investments. The operational SLA is improved by orchestration investments: warm pools, retry policies, redundant scheduling, removing single points of failure in compute. The quality SLA is improved by suite investments: more pillars covered, tighter thresholds tuned against history, contracts that prevent shape failures from shipping. The two require different teams in some organizations and different budgets in most. Treating them as one budget produces under-investment in whichever one is currently considered solved.
Operational SLA
On-time delivery commitment
Pipeline finishes by a stated deadline. Measured by orchestrator finish time. Improved by warm pools, retry budgets, and redundant scheduling.
Quality SLA
Correctness commitment
Five-pillar gates pass on the run. Measured by green-gate run rate. Improved by suite coverage, tuned thresholds, and contracts in CI.
Combined SLA
Fresh AND correct
The product of the two. The honest number to publish on a status page; promising anything higher is promising the arithmetically impossible.
•Operational SLA Improvements
Warm pools and pre-provisioned compute
Retry budgets with exponential backoff
Redundant orchestrator instances
Critical-path identification and short-circuiting
✓Quality SLA Improvements
Coverage of all five pillars at every layer boundary
Threshold tuning against historical data
Contracts in CI to prevent shape failures
Lineage to shorten incident response
A real-time fraud detection feature has a tight operational SLA: data fresh within seconds. The quality SLA is also tight, but a failure that produces no result is preferable to a failure that produces wrong results. Operational and quality both matter; correctness wins ties. A monthly close finance pipeline has a relaxed operational SLA but an unforgiving quality SLA: a wrong number in the close requires a refile and a regulatory disclosure. Knowing which dominates for a given consumer is part of the contract.
Concrete operational vs quality tradeoffs in production:
▸Stripe payment events: operational tight (seconds); quality must dominate (no double-charges)
When a consumer reports an SLA breach, ask which SLA: operational or quality. The fix is different for each. Combining them in conversation produces fixes that miss the actual problem.
✓Do
State operational and quality SLAs as separate commitments in the contract
Compute the combined SLA explicitly; do not promise the arithmetically impossible
Report all three (operational, quality, combined) on the producer status page
✗Don't
Treat 'the pipeline is up' as the only SLA; up-but-wrong is its own failure mode
Promise high quality SLA targets without a five-pillar suite to back them
Allow operational improvements to mask quality regression; budget for both
Operational and quality SLAs are different commitments; combining them under-detects one failure mode.
The combined SLA is the product of the two; promise only what that arithmetic allows.
Tuning Thresholds vs History
Tune quality thresholds against historical data and annotate known anomalies so the alarm rate matches the team's investigation capacity.
A quality system that fires too often gets ignored. The mechanism is simple. On-call engineers receive twenty pages a week. Three of them are real. The remaining seventeen train the engineer to acknowledge alerts without reading them carefully. The next real page lands in the same Slack channel as a false one and is missed. The pipeline that the team thought was protected is, in operational terms, unprotected, because the protection mechanism has been desensitized by its own noise. The fix is not to remove checks. The fix is to tune the thresholds against historical data so that the alarm rate is low enough that every alarm is read carefully. The same dynamic appears in security operations centers, in airline cockpits, and in hospital telemetry alarms, and in every domain the conclusion is identical: an alert system has a finite signal-to-noise budget, and exceeding the budget destroys the system's value. Quality engineering has not historically thought of itself as alarm-system design, but it is.
Alert Fatigue Is A Quality Failure
| Symptom | Underlying Cause | Consequence |
|---|---|---|
| On-call ignores quality pages | Most pages are not actionable; threshold is too tight | Real failures missed; consumer trust degrades |
| Quality dashboard shows constant red | Visualizing every check as critical | The dashboard becomes wallpaper; nobody looks at it |
| Page rate higher than incident rate | False positives outnumber real signal | Engineers escalate to suppress checks; coverage shrinks |
| Engineers create silent alert filters | The system has not been tuned; humans are filtering instead | Filtering becomes tribal knowledge; new on-call doesn't have it |
Threshold tuning runs the proposed assertion against historical data and counts the alerts that would have fired. A threshold that would have fired three hundred times on the last ninety days of data is not a threshold; it is a constant. A threshold that would have fired zero times is not a threshold; it is a non-check. A useful threshold fires somewhere between two and ten times in a ninety-day window, and each firing corresponds to either a real incident or a known anomaly that can be classified as such. The tuning is empirical, not theoretical.
```sql
-- Backtest: list the days on which a |z| >= 3 threshold would have fired
WITH daily_stats AS (
    SELECT
        order_date,
        AVG(amount_usd) AS daily_mean
    FROM fct_orders
    WHERE order_date BETWEEN CURRENT_DATE - 90 AND CURRENT_DATE - 1
    GROUP BY order_date
),
rolling AS (
    SELECT
        order_date,
        daily_mean,
        -- Baseline is the trailing 28 days, excluding the day being tested
        AVG(daily_mean) OVER (
            ORDER BY order_date
            ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING
        ) AS rolling_mean,
        STDDEV(daily_mean) OVER (
            ORDER BY order_date
            ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING
        ) AS rolling_sd
    FROM daily_stats
)
SELECT
    order_date,
    daily_mean,
    -- NULLIF avoids a division-by-zero error when the baseline is flat
    ROUND((daily_mean - rolling_mean) / NULLIF(rolling_sd, 0), 2) AS z_score
FROM rolling
WHERE ABS((daily_mean - rolling_mean) / NULLIF(rolling_sd, 0)) >= 3
ORDER BY order_date;
```
The query produces the dates on which a |z| >= 3 threshold would have fired. The team reviews each date with an analyst: was something real happening, or was the threshold too tight. Tuning continues until the firing rate matches the rate at which the team can credibly investigate every firing without ignoring any of them.
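The same backtest can be repeated for several candidate thresholds at once, which makes the firing rate of each one directly comparable. The sketch below works on a plain list of daily values rather than a warehouse table and, unlike the query, skips days that lack a full 28-day history; `trailing_z_scores` and `firings_by_threshold` are hypothetical names.

```python
from statistics import mean, stdev


def trailing_z_scores(daily_values: list[float], window: int = 28) -> list[float]:
    """Z-score of each day against the `window` days immediately before it."""
    scores = []
    for i in range(window, len(daily_values)):
        history = daily_values[i - window:i]
        sd = stdev(history)
        if sd == 0:
            continue  # flat baseline: z-score is undefined
        scores.append((daily_values[i] - mean(history)) / sd)
    return scores


def firings_by_threshold(daily_values: list[float],
                         thresholds: tuple[float, ...] = (2.0, 3.0, 4.0)) -> dict[float, int]:
    """Count how many days each candidate |z| threshold would have fired."""
    scores = trailing_z_scores(daily_values)
    return {t: sum(abs(z) >= t for z in scores) for t in thresholds}


# 28 ordinary days, one ordinary day, then one large spike.
history = [99.0, 101.0] * 14 + [100.0, 130.0]
print(firings_by_threshold(history))
```

Run against ninety days of real history, the counts show which candidate lands in the range the team can actually investigate.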
What 'Tuned' Looks Like
| Quality SLA Target | Implied Page Rate | Threshold Tightness |
|---|---|---|
| 99.9% (one bad day per thousand) | About one page every three years per check | Loose; only large shifts fire |
| 99.5% (one bad day per two hundred) | About two pages per year per check | Moderate; large and persistent shifts fire |
| 99.0% (one bad day per hundred) | About one page per quarter per check | Tighter; small persistent shifts fire |
| 95.0% (one bad day per twenty) | One to two pages per month per check | Tight; check is approaching alert fatigue |
A practical pattern uses two thresholds per check. A warning threshold fires more often, into a channel where humans review during business hours. A blocking threshold fires rarely, into the on-call rotation. The warning catches plausible-but-suspicious shifts; the blocker catches definite-and-actionable failures. The warning channel is allowed to be noisy because it does not interrupt anyone outside business hours; the blocking channel is held to a strict signal-to-noise ratio because every page interrupts an engineer.
•Warning Channel
Reviewed during business hours
Tolerates noise; signal extracted by humans during review
Z-score thresholds in the 2 to 3 range
Includes plausible day-of-week and seasonal anomalies
•Blocking Channel
Pages on-call regardless of time of day
Strict signal-to-noise ratio; tuned to fire rarely
Z-score thresholds in the 4 to 6 range
Excludes anomalies that historical review classified as benign
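The two-tier pattern reduces to a routing decision on the magnitude of the shift. In this sketch the default cut-offs of 2.5 and 4.0 are assumptions picked from inside the ranges listed above, and `route_alert` is a hypothetical function name; real values would come out of the historical tuning.

```python
def route_alert(z_score: float, warn_at: float = 2.5, block_at: float = 4.0) -> str:
    """Map a z-score to a channel: 'ok', 'warn', or 'page'."""
    magnitude = abs(z_score)
    if magnitude >= block_at:
        return "page"  # blocking channel: interrupts on-call
    if magnitude >= warn_at:
        return "warn"  # warning channel: reviewed during business hours
    return "ok"


for z in (1.0, -3.1, 5.2):
    print(z, route_alert(z))
```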
Many quality false alarms are calendar effects: holidays, end-of-month spikes, marketing campaigns, product launches. A pure z-score against a trailing window catches these as alerts, even though they are predictable. The fix is to enrich the baseline with calendar awareness: same-day-of-week comparisons, holiday flags, campaign annotations. The baseline becomes a model rather than a sliding average. The investment is worth it for high-volume tables where calendar effects produce most of the noise. For low-volume tables, the same investment is over-engineering.
```sql
-- Compare today's row count with the baseline for the same day of the week
WITH baseline AS (
    SELECT
        EXTRACT(DOW FROM order_date) AS day_of_week,
        AVG(daily_count) AS dow_mean,
        STDDEV(daily_count) AS dow_sd
    FROM fct_orders_daily
    WHERE order_date BETWEEN CURRENT_DATE - 90 AND CURRENT_DATE - 1
    GROUP BY EXTRACT(DOW FROM order_date)
),
today AS (
    -- Computed once instead of repeating the subquery
    SELECT COUNT(*) AS today_count
    FROM fct_orders
    WHERE order_date = CURRENT_DATE
)
SELECT
    EXTRACT(DOW FROM CURRENT_DATE) AS dow_today,
    today.today_count,
    ROUND(baseline.dow_mean, 0) AS dow_mean,
    -- NULLIF avoids a division-by-zero error when the baseline is flat
    ROUND((today.today_count - baseline.dow_mean) / NULLIF(baseline.dow_sd, 0), 2) AS z_score
FROM baseline
CROSS JOIN today
WHERE baseline.day_of_week = EXTRACT(DOW FROM CURRENT_DATE);
```
Some anomalies are real and not bugs. Black Friday produces a row count spike that the threshold should know about. A planned product launch produces a feature distribution shift that the threshold should know about. Annotating these in advance prevents the threshold from firing on them. The annotation lives next to the threshold definition and is reviewed in the same PR cycle. Without annotations, the team adds suppression rules ad-hoc during the incident, and those rules outlive the event they were created for.
```yaml
# Calendar annotations consulted by the threshold engine
known_anomalies:
  - date: '2026-11-27'
    table: fct_orders
    metric: row_count
    expected_z_shift: '+8 to +15'
    reason: 'Black Friday'
  - date_range: '2026-12-20 to 2026-12-26'
    table: fct_orders
    metric: amount_usd_mean
    expected_z_shift: '+2 to +5'
    reason: 'Holiday gifting; higher AOV'
  - date_range: '2026-04-15 to 2026-04-22'
    table: fct_customer_events
    metric: event_type_signup_pct
    expected_z_shift: '+3 to +6'
    reason: 'Spring marketing launch'
```
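How a threshold engine might consult these annotations can be sketched as a lookup before paging. The in-memory structure below is an assumed, already-parsed form of two of the entries above (a real engine would load the YAML), and `is_expected` is a hypothetical function name. A shift outside the annotated range still fires, so the annotation does not become a blanket mute.

```python
from datetime import date

# Assumed parsed form of the annotations; field names here are illustrative.
KNOWN_ANOMALIES = [
    {"start": date(2026, 11, 27), "end": date(2026, 11, 27),
     "table": "fct_orders", "metric": "row_count", "z_range": (8.0, 15.0)},
    {"start": date(2026, 12, 20), "end": date(2026, 12, 26),
     "table": "fct_orders", "metric": "amount_usd_mean", "z_range": (2.0, 5.0)},
]


def is_expected(day: date, table: str, metric: str, z_score: float,
                annotations: list[dict] = KNOWN_ANOMALIES) -> bool:
    """True if an annotation covers this day, metric, and size of shift."""
    for note in annotations:
        low, high = note["z_range"]
        if (note["table"] == table and note["metric"] == metric
                and note["start"] <= day <= note["end"]
                and low <= z_score <= high):
            return True
    return False


print(is_expected(date(2026, 11, 27), "fct_orders", "row_count", 11.0))
print(is_expected(date(2026, 11, 27), "fct_orders", "row_count", 20.0))
```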
Symptoms that thresholds need re-tuning:
▸Engineers create Slack mute rules for specific quality alerts
▸On-call routinely acknowledges pages with 'expected; ignoring'
▸The same check fires on the same day of the week every week
▸New on-call rotations report being overwhelmed by quality alerts
▸A real incident is missed because the page was indistinguishable from noise
✓Do
Tune every threshold against at least 90 days of historical data before turning it on
Use two-tier checks: warn loosely and block strictly
Annotate known anomalies in version control next to the thresholds
✗Don't
Treat alert volume as a measure of quality coverage; the right measure is incidents caught
Tighten thresholds reactively after an incident; tighten as a deliberate review
Allow ad-hoc suppression rules to outlive their original cause
TIP
The cost of a false alarm is the next real alarm that gets ignored. Treat threshold tuning as part of building the check, not as a follow-up task.
Contracts on a Legacy Pipeline
Roll out data contracts on a legacy pipeline by documenting the current state, inventorying consumers, observing before enforcing, and migrating behind versioned changes.
Greenfield contracts are easy. Contracts on a legacy pipeline that has run for four years and has dozens of unknown consumers are hard. The mistake most teams make is treating the rollout as a one-shot migration: write the contract, declare the producer compliant, declare consumers responsible for catching up. That approach produces breakage and erodes the credibility of the contract program. The disciplined rollout treats existing consumer behavior as the starting contract, evolves toward the desired contract over time, and follows the same versioned-release pattern that software libraries use under semver. The exercise below walks through the rollout for a customer events stream that is the lifeblood of three downstream consumers and an unknown number of analyst queries.
The legacy character of the pipeline is what makes the rollout interesting. A greenfield rollout has the luxury of starting from a clean schema and an enumerable consumer set. A legacy rollout has neither. The discipline below is what compensates for the missing luxury, and the principles transfer to any production system that has been running long enough that the original authors have moved on and the original assumptions have drifted.
The Starting State
What the team inherits:
▸Producer is a four-year-old Kafka topic with no formal schema; events are JSON
▸Three named consumers: leadership dashboard, churn model, billing aggregator
▸Unknown number of analyst queries reading from the curated layer
▸No quality gates; one wrong-number incident every six to eight weeks
▸Schema has drifted: producers have added and renamed fields without coordination
Step 1 is to document the current contract. The first contract is not the contract the team wants; it is the contract the producer is currently delivering, including the drift. The team writes it by inspecting recent production data: every field that has appeared in the last 30 days, every type variation seen, every value range observed. The result is messy and accurate. The contract version is 1.0.0, and it captures reality, not aspiration.
```yaml
# ...
  - {..., type: 'union[timestamp, string]', nullable: false, notes: 'producers send both ISO-8601 strings and epoch ms'}
  - {name: amount, type: 'union[decimal, string]', nullable: true, notes: 'string for older clients; decimal for newer'}
guarantees:
  freshness: 'best effort, typically <= 30 minutes'
  volume: 'no enforced bound; observed 1.5 to 4M per day'
consumers_known:
  - leadership_dashboard
  - churn_model
  - billing_aggregator
```
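Drafting version 1.0.0 from observed data can be sketched as a profiling pass over sampled events. This is a minimal illustration over a list of already-decoded JSON events; `profile_events` is a hypothetical name and the sample values are invented. It records Python type names rather than contract types, so mapping `str` to an ISO-8601 timestamp or a stringified decimal remains a manual review step.

```python
from collections import defaultdict


def profile_events(events: list[dict]) -> dict[str, dict]:
    """Summarize observed fields: the types seen and whether nulls occur."""
    types_seen: dict[str, set[str]] = defaultdict(set)
    present_count: dict[str, int] = defaultdict(int)
    null_seen: dict[str, bool] = defaultdict(bool)
    for event in events:
        for field, value in event.items():
            present_count[field] += 1
            if value is None:
                null_seen[field] = True
            else:
                types_seen[field].add(type(value).__name__)
    return {
        field: {
            "types": sorted(types_seen[field]),
            # A field missing from some events is treated as nullable too
            "nullable": null_seen[field] or present_count[field] < len(events),
        }
        for field in present_count
    }


sample = [
    {"ts": "2026-01-05T10:00:00Z", "amount": "19.99"},
    {"ts": 1767607200000, "amount": 19.99},
    {"ts": "2026-01-05T10:05:00Z", "amount": None},
]
print(profile_events(sample))
```

The output for the sample shows both drift patterns from the contract above: `ts` arrives as string and integer, and `amount` arrives as string, float, and null.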
Step 2 is to inventory every consumer. The three named consumers are easy. The unknown analyst queries are not. Lineage tooling helps: parsing recent query logs against the table catches most analyst queries; manual outreach catches the rest. The output is a list of every reader, ranked by frequency, with an owner per consumer.
Step 3 adds quality gates in observation-only mode. The gates run, log results, and do not block. After two to four weeks of observation, the team has data on which gates would have fired and tunes against it. Once tuned, the gates are ready to enforce; the flip itself waits until consumers have migrated (Step 6). Skipping the observation phase is the most common rollout mistake.
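Observation-only mode for Step 3 can be sketched as a gate runner with a mode switch. The names `run_gate` and `GateFailure` are hypothetical; the point of the sketch is that the same gate code runs in both modes, so flipping to enforcement later is a configuration change rather than a rewrite.

```python
import logging

logger = logging.getLogger("quality_gates")


class GateFailure(Exception):
    """Raised when a gate fails in enforcing mode."""


def run_gate(name: str, passed: bool, mode: str = "observe") -> bool:
    """Record a gate result; only 'enforce' mode is allowed to halt the run."""
    if passed:
        return True
    logger.warning("gate %s failed (mode=%s)", name, mode)
    if mode == "enforce":
        raise GateFailure(name)
    return False  # observation-only: logged, pipeline continues


# During the observation weeks a failing gate is logged and the run proceeds.
print(run_gate("customer_id_null_rate", passed=False, mode="observe"))
```

The logged failures from the observation weeks are the data the thresholds get tuned against.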
Step 4 plans the cleanups as versioned changes. Legacy problems (union types on ts and amount, unwhitelisted event_types, ambiguous nulls) are each a versioned change. Renaming ts to event_timestamp with strict ISO-8601 is a breaking change; it ships in version 2.0.0 with a 90-day notice. Whitelisting accepted_values for event_type is backward-compatible if the producer commits to not emitting new values without a contract update; it ships as minor 1.x. Each change has a migration plan naming affected consumers and their actions. Cleanups happen one at a time, not as a single rewrite.
| Change | Version | Type | Consumer Action |
|---|---|---|---|
| Whitelist accepted_values for event_type | 1.1.0 | Backward-compatible (additive guarantee) | None; consumers can adopt the tighter contract or stay on 1.0 |
| Tighten customer_id null rate to <= 1 percent | 1.2.0 | Backward-compatible (tighter guarantee) | None; consumers benefit |
| Rename ts to event_timestamp; strict ISO-8601 | 2.0.0 | Breaking | Pin to 1.x or migrate within 90 days |
| Rename amount to amount_usd; strict decimal | 2.0.0 | Breaking | Pin to 1.x or migrate within 90 days |
| Drop legacy event_types (page_view, click, error) | 3.0.0 | Breaking | Pin to 2.x or migrate within 180 days |
Step 5 migrates consumers behind the versions. Each consumer migrates to the next major version on its own schedule within the announced notice period. The producer maintains both old and new versions during the transition window; dual-publishing is more expensive than single-publishing, and that maintenance burden falls on the producer. The benefit is that no consumer breaks in production; each migration is a deliberate code change with tests, not a surprise in an incident channel. Some consumers migrate in a week; others take the full notice period. Both are acceptable; a unilateral cutover is not. The producer tracks migration progress with a simple consumer-version query against the contract registry.
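The consumer-version query mentioned above can be sketched over a list of registry rows. The rows in `PINS` are invented for illustration (the consumer names come from the starting state, the pinned versions do not), and `consumers_by_major` and `blocking_retirement` are hypothetical helper names; a real registry would expose its own query interface.

```python
from collections import defaultdict

# Hypothetical registry rows: (consumer, pinned contract version)
PINS = [
    ("leadership_dashboard", "2.0.0"),
    ("churn_model", "1.2.0"),
    ("billing_aggregator", "1.1.0"),
]


def consumers_by_major(pins: list[tuple[str, str]]) -> dict[int, list[str]]:
    """Group consumers by the major version they are pinned to."""
    grouped: dict[int, list[str]] = defaultdict(list)
    for consumer, version in pins:
        grouped[int(version.split(".")[0])].append(consumer)
    return dict(grouped)


def blocking_retirement(pins: list[tuple[str, str]], retiring_major: int) -> list[str]:
    """Consumers still pinned at or below the major version being retired."""
    return sorted(c for c, v in pins if int(v.split(".")[0]) <= retiring_major)


print(consumers_by_major(PINS))
print(blocking_retirement(PINS, retiring_major=1))
```

An empty result from the second function is the signal that the old version can stop being dual-published.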
Step 6 locks in the new state. Once consumers are migrated and the legacy version is deprecated, the producer flips the gates from observation-only to enforcing on the new contract. CI now rejects producer changes that violate the contract. Unannounced schema drift can no longer ship; incidents that used to come every six to eight weeks become rare. The rollout is complete when the producer can commit to the contract as a CI-enforced guarantee, not documentation alone.
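As a rough illustration of what an enforcing CI step might do, the sketch below checks sample producer records against a hypothetical post-cleanup contract and returns a nonzero exit code on any violation. The `CONTRACT_V3` structure and the accepted event types are invented for the example. "Strict ISO-8601" is approximated with `datetime.fromisoformat`, whose accepted formats vary by Python version, so a real check would likely need a stricter parser.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical contract after the cleanups; accepted values are invented.
CONTRACT_V3 = {
    "required": ["event_timestamp", "amount_usd", "event_type"],
    "accepted_event_types": {"purchase", "refund"},
}


def violations(record: dict) -> list:
    """Return a list of human-readable contract violations for one record."""
    problems = [
        f"missing field: {name}"
        for name in CONTRACT_V3["required"]
        if name not in record
    ]
    if "event_timestamp" in record:
        try:
            datetime.fromisoformat(record["event_timestamp"])
        except (TypeError, ValueError):
            problems.append("event_timestamp is not an ISO-8601 string")
    if "amount_usd" in record:
        value = record["amount_usd"]
        try:
            # Assumption: amounts travel as decimal strings, never floats.
            if not isinstance(value, str) or not Decimal(value).is_finite():
                raise ValueError
        except (InvalidOperation, ValueError):
            problems.append("amount_usd is not a finite decimal string")
    if "event_type" in record:
        if record["event_type"] not in CONTRACT_V3["accepted_event_types"]:
            problems.append(f"event_type not accepted: {record['event_type']}")
    return problems


def ci_exit_code(records: list) -> int:
    """Return 0 when every record honors the contract, 1 otherwise."""
    return 1 if any(violations(record) for record in records) else 0


good = {"event_timestamp": "2024-01-15T10:30:00+00:00",
        "amount_usd": "12.50", "event_type": "purchase"}
bad = {"ts": "01/15/2024", "amount_usd": 12.5, "event_type": "page_view"}
```

A CI job would pass the result of `ci_exit_code` to `sys.exit`, so a producer change that emits records like `bad` fails the build before it ships.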
•Before The Rollout
Wrong-number incident every six to eight weeks
Three named consumers; unknown analyst surface
Schema drifts continuously; producers add fields without notice
Quality is best-effort; no SLA possible
Rolling back a producer change requires consumer-by-consumer triage
✓After The Rollout
Wrong-number incidents are rare; most are caught in CI
Every consumer is registered; lineage covers analyst queries
Schema evolves through versioned changes with notice
Operational and quality SLAs are stated and tracked
Producer can roll back any version because all consumers are pinned
The rollout principles, in order:
▸Document the legacy contract as it actually is, not as the team wishes
▸Inventory every consumer; surprise consumers cause migration failures
▸Add gates in observation-only mode and tune against the observed data
▸Plan cleanups as versioned changes; backward-compatible as minor, breaking as major
▸Migrate consumers behind versions on their own schedule within the notice window
▸Flip gates to enforcing only after the migration is complete
TIP
A legacy rollout that takes nine months and breaks zero consumers is a successful rollout. A legacy rollout that takes three months and breaks four consumers undoes its own credibility. Speed is not the goal; trust is.
✓Do
Treat the first contract as documentation of reality, not aspiration
Use semver explicitly; minor for additive, major for breaking, with notice
Maintain old and new versions in parallel during the migration window
✗Don't
Cut over consumers unilaterally; the rollout is a negotiation, not a mandate
Skip observation mode; thresholds need real data before they can be enforced
Treat the rollout as one project; each version bump is its own change
Contracts and quality become a system only when the producer can roll back any version because every consumer is pinned. The concise statement of the rollout principle: a legacy rollout earns its trust by treating consumers as parties to a negotiation, not as inventory to be migrated, structured by versions and notice periods. The trust accumulates over the months in which no consumer is broken by surprise.
❯❯❯PUTTING IT ALL TOGETHER
> A staff data engineer joins a public-company data platform team. The platform has every quality check the intermediate tier prescribes. On-call rotations are burning out from alert fatigue. Half of incidents are caught by gates and half are still discovered by consumers asking 'why does this number look wrong'. The engineer is asked to turn the quality program into something the team trusts and the consumers can rely on.
Adopt data contracts as producer-side commitments enforced in CI. The contract names schema, freshness, volume, uniqueness, and evolution policy. Producer CI rejects breaking changes; pipeline gates are the second line of defense. (Builds on Lesson 1's pipeline-as-product framing.)
Cover all five pillars (freshness, distribution, volume, schema, lineage) and treat them as a diagnostic framework, not a checklist. Lineage is the pillar most programs underinvest in; the cost of skipping it is incident response time.
State operational and quality SLAs separately. The combined SLA is the product of the two. Designing for one without the other under-detects the other failure mode. (Connects to Lesson 4's orchestration SLAs and Lesson 6's failure handling.)
Tune every threshold against historical data. Use two-tier checks: warn loosely, block strictly. Annotate known anomalies (holidays, launches) so the threshold engine is calendar-aware. The cost of a false alarm is the next real alarm that gets ignored.
Roll out contracts on the legacy pipeline by documenting reality first, inventorying every consumer, observing before enforcing, and migrating consumers behind versioned changes with explicit notice periods. A nine-month rollout that breaks zero consumers is the goal. (Builds on the layered architecture and decoupling concepts from Lesson 1, the four cheap checks from this lesson's beginner tier, and the five pillars from this lesson's intermediate tier.)
Combine all of the above into a quality discipline rather than a quality engineering effort. Engineering catches failures; discipline prevents them. The shift is the difference between a team that manages incidents and a team that prevents them.
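The SLA multiplication mentioned above is worth seeing with numbers. The figures below are illustrative only, not taken from any real pipeline, and the calculation assumes operational and quality failures are independent of each other.

```python
def combined_sla(operational: float, quality: float) -> float:
    """Combined SLA as the product of the two commitments.

    Assumes operational and quality failures are independent.
    """
    return operational * quality


# Illustrative targets: 99.5 percent on time, 99 percent correct.
operational_target = 0.995
quality_target = 0.99

combined = combined_sla(operational_target, quality_target)
print(f"combined SLA: {combined:.3%}")
```

With these inputs the combined figure is roughly 98.5 percent, lower than either commitment alone, which is why promising "99 percent" for the pipeline as a whole would overstate what the two SLAs can jointly deliver.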
KEY TAKEAWAYS
Contracts make quality enforceable: producer commits, consumer relies, CI rejects violations before they ship. Pipeline gates are the second line of defense, not the first.
The five pillars are coverage, not a checklist: freshness, distribution, volume, schema, lineage. A program with four pillars and zero lineage has a known blind spot.
Operational and quality SLAs are separate commitments: the combined SLA is their product. Promise what that product can actually deliver, not a number the math rules out.
Threshold tuning is part of authoring a check: use historical data, two-tier warn-and-block, and calendar annotations. The cost of false alarms is the next real alarm that gets ignored.
Legacy rollouts are negotiations, not migrations: document reality, inventory consumers, observe before enforcing, version every change with notice. Speed is not the goal; trust is.
Contracts make quality enforceable; observability makes it diagnosable; tuning makes it trusted
Category: Pipeline Architecture
Difficulty: advanced
Duration: 38 minutes
Challenges: 0 hands-on challenges
Topics covered: Data Contracts and CI Enforcement, Five Pillars of Observability, Quality SLAs vs Ops SLAs, Tuning Thresholds vs History, Contracts on a Legacy Pipeline