Pipeline Operations: Intermediate

A consumer goods company at 400 engineers had a Snowflake bill that grew 38% quarter over quarter for four straight quarters. The CFO asked the head of data which pipelines accounted for the increase. The head of data could not answer. There were no query tags, no per-pipeline cost attribution, no recurring cost review. The team spent three weeks tagging existing queries by pipeline and discovered that one rebuild of a fact table was scanning 14TB every hour, contributing roughly $42,000 per month to the bill. The rebuild had been written eighteen months earlier as a quick fix and had quietly outgrown its origin. Cost attribution did not save the $42,000 alone; it made the conversation about cost possible. The same is true of lineage, of CI/CD, and of the five pillars of observability. None of them is exotic. All of them are the difference between a pipeline that operates as a product and one that operates as a black box. This lesson is about the operational practices that scale a single pipeline into a fleet.

The Five Pillars of Observability

Daily Life
Interviews

Apply the five pillars of observability and audit a live pipeline against them.

The five pillars framework, popularized by Monte Carlo and Barr Moses, names the kinds of signal a mature data observability practice tracks. The pillars are freshness, volume, schema, distribution, and lineage. They are not a checklist of monitors. They are a vocabulary for naming where the eyes and the gaps are. A pipeline well covered on freshness and volume but blind on distribution will fail in a particular family of ways; a pipeline blind on lineage will fail differently. The framework lets the operations conversation be specific. The shift from three day-one monitors to a five-pillar framework is the same shift that the application observability community made when it moved from 'is the server up' to the three-pillar logs/metrics/traces vocabulary. The earlier framing is not wrong; it is undifferentiated. The later framing names what kind of failure each signal catches and lets investment be distributed across the kinds rather than piled onto whichever signal happens to be cheapest to add.

The Five Pillars at a Glance

FreshnessVolumeSchemaDistributionLineage
Freshness
Is the data current
Time since last update. Compared to an SLA. Most common pillar to monitor and most common to fail. 'fct_orders last updated 2 hours ago, SLA is 1 hour.'
Volume
Is the row count right
Total rows or rows per partition. Compared to a baseline (trailing average) or a fixed range. Catches empty extracts, runaway joins, silent filters.
Schema
Are the columns right
Column presence, types, nullability. Compared to a registered schema or a previous snapshot. Catches upstream schema drift before downstream consumers fail.
Distribution
Do the values look right
Statistics on column values: null rate, mean, p95, top-k for categorical. Compared to a baseline. Catches semantic drift the other pillars miss.
Lineage
What depends on what
The graph of which tables and pipelines feed which others. Not a monitor itself; the substrate that lets the others be acted on.

Why Five and Not Three

The day-one monitors from the beginner tier (did it run, did it succeed, was the output right) cover freshness and volume. Schema, distribution, and lineage are the three additions that turn a single-pipeline view into a fleet-level view. Schema catches structural drift. Distribution catches semantic drift. Lineage gives every other signal a context: a row-count drop on a leaf table affects nobody downstream, while the same drop on a foundational table affects forty consumers. Without lineage, every alert looks the same; with lineage, the response can be prioritized.
PillarFailure Mode It CatchesTools That Implement It
FreshnessPipeline silently stops; data is staledbt source freshness, Monte Carlo, Soda Core, custom SQL
VolumeEmpty extract, runaway join, silent filterdbt tests (row count), Great Expectations, Monte Carlo
SchemaColumn added/dropped/retyped upstreamSchema registry, dbt contracts, Monte Carlo schema monitor
DistributionMean shift, null rate spike, new categorical valueGreat Expectations, Monte Carlo, custom SQL with statistical tests
LineageCannot predict who is affected by an upstream changedbt lineage graph, OpenLineage, DataHub, Atlan

Each Pillar Has a Cost Profile

Freshness checks are cheap: they read one row of metadata. Volume checks are cheap: a COUNT(*) per partition. Schema checks are cheap: a metadata read against the information schema. Distribution checks are expensive: computing a null rate across 200 columns over a billion-row table costs real warehouse credits. Lineage is a one-time investment in a metadata graph and then ongoing maintenance. Investment in observability is therefore staged: freshness and volume first, schema next, distribution selectively on the columns where it matters, lineage as the substrate.
Where each pillar earns its cost:
  • Freshness: every pipeline; the cheapest and most universally useful signal
  • Volume: every pipeline; catches the second-most-common silent failure
  • Schema: every pipeline that crosses a system boundary or has external producers
  • Distribution: pipelines feeding ML features and customer-facing aggregates; selective everywhere else
  • Lineage: organization-wide; pays back when blast radius questions get asked

A Pillar Audit on a Live Pipeline

1

The query above takes seconds to run and catches four classes of failure. It is the single highest-leverage thing to add to a pipeline that has none of the pillars covered.

Three-Pillar Coverage
  • Detects: missed runs, empty outputs, late deliveries
  • Misses: column added upstream that breaks parsers
  • Misses: a metric column drifted 3x without changing row count
  • Misses: blast radius of an upstream change
Five-Pillar Coverage
  • Detects everything the three-pillar set catches
  • Detects schema drift before downstream parsing failures
  • Detects distributional shifts without row-count change
  • Maps the impact of any change through the lineage graph
TIP
When auditing an existing pipeline, score it from zero to five on the pillars covered. Most production pipelines score two or three. Investment in the missing pillars is usually higher leverage than investment in shaving runtime.
check
The five pillars are vocabulary, not a checklist; coverage is staged by cost and value.
alert
Distribution checks catch failures the other four miss but cost more to run.
query
Lineage is a substrate for the other pillars: it converts an alert into a prioritized response.

Lineage and Blast Radius

Daily Life
Interviews

Read forward and backward lineage to predict blast radius and to root-cause data incidents.

Lineage is the graph of which datasets depend on which others. Read it forward and it answers 'who consumes this table.' Read it backward and it answers 'what produced this column.' Both directions matter operationally. Without lineage, an engineer changing a column has no way to know who breaks; without lineage, an engineer debugging a wrong number has no way to know which pipeline to inspect first. Lineage is the difference between a five-minute fix and a five-hour fix.

Forward Lineage and Blast Radius

Forward lineage answers the question 'if this changes, what else changes.' A column rename in fct_orders looks small until forward lineage shows that 17 dbt models, 4 dashboards, and 2 ML feature definitions read from it. The blast radius is the size of the change in consumer-facing terms. The same rename has a different blast radius depending on where in the graph it sits. A leaf table has zero blast radius. A foundational dimension table has the entire downstream graph as its blast radius. Engineers who have lineage make changes calibrated to blast radius; engineers who do not have lineage either avoid all changes or make changes that break consumers.

Backward Lineage and Root Cause

Backward lineage answers 'what produced this.' A revenue dashboard shows the wrong number. The dashboard reads from a serving mart. The mart reads from three curated tables. Each curated table reads from raw extracts. Each raw extract reads from a source system. Without backward lineage, the debugger walks this chain by hand, reading code in each repository. With backward lineage, the chain is a graph query: 'show all sources that feed dashboard_revenue,' returning fifteen rows in a second.
1# A dbt manifest snippet showing parsed lineage models : fct_orders : depends_on : - source.app.orders - ref.dim_customer - ref.dim_product dim_customer : depends_on : - source.app.customers - source.crm.salesforce_accounts serving.revenue_dashboard_mart : depends_on : - ref.fct_orders - ref.dim_customer # Forward lineage of source.app.orders, computed
2FROM this graph : # source.app.orders -> fct_orders -> serving.revenue_dashboard_mart -> dashboard_revenue

Where Lineage Comes From

Source of LineageHow It Is CapturedStrengths and Limits
dbt manifest.jsonref() and source() functions parsed at compile timeFree with dbt; covers the SQL transform layer; misses extracts and downstream BI
Query log parsingParse SELECTs against the warehouse; map readers to tablesCaptures actual usage including BI tools; coverage limited to one warehouse
OpenLineage eventsPipelines emit lineage events at run timeVendor-neutral standard; requires instrumentation; covers full pipeline
Catalog productsDedicated tools (DataHub, Atlan, Alation) integrate multiple sourcesMost complete coverage; cost; ongoing maintenance to keep current

Column-Level vs Table-Level

Table-level lineage says 'fct_orders depends on raw.orders.' Column-level lineage says 'fct_orders.amount_cents is derived from raw.orders.gross_amount minus raw.orders.refund_amount.' The first is enough for blast radius questions like 'who reads this table.' The second is needed for impact analysis on a column-specific change like 'this column's definition is changing from gross to net.' Most teams operate with table-level lineage and add column-level coverage only for the most-consumed tables. Building column-level lineage for everything is rarely worth the cost.
Operational uses of lineage:
  • Predicting blast radius before deploying a schema change
  • Routing alerts to the owners of the consumers most affected, not to a generic channel
  • Prioritizing observability investment: more pillars on tables with more consumers
  • Identifying orphan tables: tables nothing reads from, candidates for deprecation

An Incident Walked Through Lineage

Consider a Tuesday morning incident: the executive revenue dashboard shows a value that contradicts a finance team report. The data engineer on call opens the dashboard's lineage. The dashboard reads serving.revenue_dashboard_mart, which reads fct_orders, which depends on raw.orders and dim_customer. Backward lineage shortens the search to four candidates. A freshness pillar check on the four shows that fct_orders has not updated since 2am, while the dashboard expected an update at 6am. The cause is a failed run of fct_orders. Without lineage, the same investigation starts with 'where does this number come from' and burns thirty minutes before reaching the same point.
1
Without Lineage
  • Engineers either avoid all changes or break consumers
  • Debugging a wrong number starts from 'whose pipeline is this'
  • Deprecation requires asking around to find consumers
  • Alerts are routed by table, not by impact
With Lineage
  • Changes are calibrated to known blast radius
  • Debugging starts with the lineage of the affected dashboard
  • Deprecation candidates are identifiable by query: tables nothing reads
  • Alerts can be prioritized by the count and importance of downstream consumers
TIP
Adopt dbt manifest-based lineage first; it is free, accurate within the transform layer, and answers most questions. Catalog products earn their cost only when the lineage graph spans systems dbt does not see.

Cost Attribution

Daily Life
Interviews

Tag warehouse queries with pipeline metadata and read a cost-by-pipeline rollup to identify optimization candidates.

Most data teams do not know what their pipelines cost until somebody asks. The bill arrives as a single number from Snowflake or BigQuery or Databricks; it does not break down by pipeline. Without attribution, the cost conversation is impossible: nobody can say which pipelines should be optimized, which ones can be retired, or which ones are growing fastest. The fix is query tagging, which threads a pipeline identifier through every query the warehouse runs. The pattern is universal across cloud warehouses, with small mechanical differences: a Snowflake QUERY_TAG, a BigQuery job label, a Databricks tag, a Redshift query group. The point is not the field name but the discipline of carrying a stable identifier through every query and rolling it up after the fact.

Query Tags in Snowflake, Tags in BigQuery, Labels in Databricks

WarehouseTagging MechanismWhere It Shows Up
SnowflakeQUERY_TAG session parameter; key-value JSONQUERY_HISTORY view; Snowsight cost dashboards
BigQueryJob labels; key-value pairs on every jobINFORMATION_SCHEMA.JOBS_BY_PROJECT; billing export
DatabricksCluster tags and SQL warehouse tags; query tags via JDBCUsage tables; budget reports
RedshiftQuery labels via SET query_groupSTL_QUERY system table; CloudWatch metrics

What to Tag

The minimum tag set for a pipeline is the pipeline identifier and the run identifier. The pipeline identifier groups queries by what they belong to (orders_daily, customer_360, ml_features). The run identifier separates today's run from yesterday's. A second useful field is the team, which lets cost roll up to organizational owners for chargeback or showback. Tagging by user is rarely useful: it answers 'who ran this query' for ad-hoc work but not for scheduled pipelines, where the user is always a service account.
1ALTER SESSION SET QUERY_TAG = '{"pipeline":"orders_daily","run_id":"r_20260425_0200","team":"data-platform"}' ;
2
3
4SELECT *
5FROM raw.orders
6WHERE order_date = '2026-04-25' ; INSERT INTO fct_orders
7SELECT
8 ...
9FROM raw.orders...;
10
11
12SELECT
13 TRY_PARSE_JSON(QUERY_TAG) : pipeline :: TEXT AS pipeline,
14 SUM(CREDITS_USED_CLOUD_SERVICES + COMPUTE_CREDITS) AS total_credits,
15 COUNT(*) AS queries,
16 AVG(EXECUTION_TIME) / 1000 AS avg_duration_s
17FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
18WHERE START_TIME > DATEADD('day', - 7, CURRENT_TIMESTAMP) AND QUERY_TAG IS NOT NULL
19GROUP BY 1
20ORDER BY total_credits DESC ;

From Credits to Dollars

Warehouse credits and dollars are not the same. A Snowflake credit is roughly $2 to $4 depending on the contract; a BigQuery slot-hour is priced separately from on-demand. The transformation from a credit-per-pipeline number to a dollar-per-pipeline number is a contract-rate multiplication, often pulled from the billing export. The dollar number is the one the business cares about; the credit number is the one engineers debug against. Both belong on the cost dashboard.

What Cost Attribution Reveals

1

In the example above, experimental_finops_test ran seven queries in seven days and consumed more credits than every other pipeline combined. The pattern is universal: the first time cost attribution gets enabled, one or two pipelines turn out to be dominating spend, often pipelines nobody had noticed.

What Cost Attribution Does Not Solve

Attribution is a measurement, not an optimization. A pipeline tagged with its name and shown to cost $42,000 a month is a problem statement, not a fix. The fix requires changing the query, the warehouse size, the materialization, or the cadence. Attribution turns 'spend went up' into 'this specific pipeline drove the increase,' which is enough to start the optimization conversation. Teams that stop at attribution and never act are common; the dashboard is reassuring without changing anything.
Reading a cost-by-pipeline dashboard:
  • The top three pipelines almost always account for over half of spend
  • The fastest-growing pipeline is the one to scrutinize first, not the largest
  • A pipeline whose cost-per-query is 100x the median is a single bloated query in disguise
  • Untagged spend (queries with no QUERY_TAG) is a process gap; chase it down
Without Cost Attribution
  • Spend is one number; nobody can answer 'which pipeline'
  • Cost conversations stall on vague 'needs more investigation' replies
  • New pipelines deploy without a cost expectation
  • Cost spikes are noticed only when they hit the budget
With Cost Attribution
  • Spend rolls up by pipeline and team; trends are visible
  • Cost conversations identify specific pipelines for optimization
  • New pipelines come with a daily cost number from day one
  • Cost spikes alert to the team that owns the pipeline that drove them
Do
  • Set QUERY_TAG at the start of every pipeline session, not on individual queries
  • Include pipeline_id, run_id, and team in the tag
  • Surface untagged spend on the cost dashboard so the gap is visible
Don't
  • Tag with high-cardinality fields like user_id or full timestamps
  • Stop at the credit number; convert to dollars for business audiences
  • Treat the cost dashboard as the deliverable; the deliverable is the optimization that follows
TIP
Make untagged spend a reviewed metric. The first cost optimization is usually closing the tagging gap, because untagged queries are the ones that are easiest to lose.

CI/CD for Pipelines

Daily Life
Interviews

Design a pipeline CI strategy that tests at four layers and uses slim CI to stay fast enough to be habitual.

Application engineers ship changes through CI/CD: a pull request runs unit tests, integration tests, and a deploy step. Pipeline changes are different in two ways. First, the data is part of the test surface; a transform is correct only if it produces the right output on real-shaped input. Second, the pipeline operates on data the test environment may not have. Both differences shape what CI for pipelines looks like, and both are misunderstood by teams that try to import application CI patterns wholesale. The early signal that a CI strategy is mismatched is the engineers' habit of running pipelines locally before pushing, even when CI exists. They are not lazy or untrusting; they are working around a CI shape that does not catch what production is going to catch. The fix is not to demand that they use CI; the fix is to make CI accurate enough that the local-pipeline habit dies on its own.

What Pipeline CI Tests

LayerWhat Is TestedCommon Tools
CodeSyntax, linting, type checks; SQL parses and refs resolveruff, mypy, sqlfluff, dbt parse
LogicTransforms produce expected output on canned inputdbt unit tests (1.8+), pytest, dbt seeds + dbt build
Schema and contractsOutput schema matches the contract; dbt model contracts enforceddbt contracts, dbt-checkpoint, custom CI scripts
IntegrationPipeline runs end-to-end on a sample dataset in a dev environmentdbt slim CI, Dagster ephemeral environments, GitHub Actions runners
Data qualityOutput passes the same quality tests production runsdbt tests, Great Expectations suites, Soda checks

Sample Data and the Dev Warehouse

Pipeline tests need representative data. Two patterns dominate. The first is sample data: a small canned dataset committed to the repo or generated synthetically, used for unit tests on individual transforms. The second is a dev warehouse: a real warehouse environment with a copy or subset of production data, used for integration tests. Sample data is fast and free but tests against shapes the engineer wrote, not shapes production produces. Dev warehouse is slower and costs money but tests against the real shapes. Mature CI uses both: sample data for the tight inner loop, dev warehouse for the integration layer.

dbt Slim CI

dbt's slim CI pattern is the most widely adopted CI optimization for SQL pipelines. Instead of building every model in the project on every PR, slim CI builds only the models that changed and their downstream descendants, tested against a recent production manifest. The result is a CI run that finishes in minutes instead of hours, while still catching the changes that matter. The pattern uses dbt's state:modified+ selector and a manifest from the last successful production run.
1# A typical GitHub Actions workflow for dbt slim CI
2name: dbt slim CI
3on: [pull_request]
4
5jobs:
6 build:
7 runs-on: ubuntu-latest
8 steps:
9 - uses: actions/checkout@v4
10 - name: Install dbt
11 run: pip install dbt-snowflake==1.8.0
12 - name: Download production manifest
13 run: aws s3 cp s3://dbt-state/prod/manifest.json ./prod-manifest.json
14 - name: Run modified models and downstream descendants
15 env:
16 SNOWFLAKE_USER: ${{ secrets.CI_USER }}
17 SNOWFLAKE_DB: CI_DB
18 SNOWFLAKE_SCHEMA: ci_pr_${{ github.event.number }}
19 run: |
20 dbt build \
21 --select state:modified+ \
22 --state ./prod-manifest \
23 --target ci
24 - name: Cleanup
25 if: always()
26 run: dbt run-operation drop_pr_schema --args 'schema: ci_pr_${{ github.event.number }}'

What Pipeline CI Does Not Catch

CI runs against a sample or a recent slice. It cannot catch problems that only appear at full scale: query plans that work on a million rows and explode on a billion, distributional shifts that only show up across a quarter of data, race conditions in parallel writes, cost regressions that compound over time. CI is necessary and not sufficient. Production observability and a structured rollout (canary, gradual deploy, monitored bake-in) close the gap.
What a useful pipeline CI catches:
  • Syntax errors and broken refs (parse-time)
  • Logic errors against representative input (unit tests)
  • Schema contract violations (dbt contracts, dbt-checkpoint)
  • Quality test regressions on the modified models (slim CI + dbt tests)
  • Integration failures against a dev warehouse with sample data
Sample dataDev warehouseSlim CI manifest
Sample data
Canned input, fast iteration
Tens to thousands of rows committed to the repo. Tests transforms in seconds. Misses production-shaped edge cases that live in the long tail.
Dev warehouse
Real semantics, real cost
A CI schema in a real warehouse, populated from production samples. Catches integration bugs that mocks miss. Costs warehouse credits per PR; tear down after merge.
Slim CI manifest
Build only the diff
dbt --select state:modified+ against a recent prod manifest. Cuts CI time from hours to minutes by skipping unchanged models. The default modern dbt CI shape.

From CI to CD

Continuous deployment for pipelines is more conservative than for applications. An application rollback affects users for a few seconds. A pipeline rollback may have to revert a write that produced incorrect data, which is a much harder operation. The standard pattern is automated deploys to staging on merge, then a manual or automated promotion to production once observability has confirmed the deploy is healthy. Some teams do this with a feature flag at the pipeline level: the new code runs alongside the old code, both writing to separate tables, and the active table is switched only after validation.
App CI Imported Wholesale
  • Builds every model on every PR (slow, expensive)
  • Mocks all warehouse interactions (catches none of the real bugs)
  • Treats data as a test fixture, not a test surface
  • Runs unit tests in seconds; misses every integration issue
Pipeline-Aware CI
  • Slim CI: only modified models plus descendants, against recent prod state
  • Dev warehouse with sample data; real warehouse semantics
  • Schema contracts checked in CI; data quality tests run on the diff
  • Faster on the inner loop, more thorough on the integration loop
check
Pipeline CI tests at four layers: code, logic, schema/contracts, integration.
query
Slim CI cuts dbt CI time from hours to minutes by building only the diff.
alert
CI cannot catch full-scale problems; production observability completes the safety net.
TIP
Track CI duration as a first-class metric. Pipeline CI that takes longer than 15 minutes silently encourages engineers to skip it. The slim CI pattern exists to keep CI fast enough to be habitual.

Instrumenting a Pipeline E2E

Daily Life
Interviews

Sequence the operational instrumentation of an existing pipeline so each step delivers value before the next.

Take a real pipeline and walk it through the operational instrumentation. The pipeline is fct_orders, a daily aggregation that pulls from a Postgres replica, transforms in Snowflake via dbt, and feeds a Looker dashboard, a daily revenue email, and an ML feature store. The pipeline runs cleanly today. It has no observability beyond the orchestrator's run status and no cost visibility. The exercise is to bring it to a level-three operational state in a sequence that delivers value at each step.

Step 1: Pillar Coverage on the Output

Start at the output table, fct_orders, because that is the layer the consumers see. Add freshness (last update timestamp must be within four hours), volume (row count for today's partition between 80% and 120% of the trailing seven-day average), and schema (column presence and types match the registered schema). Distribution checks come next, on the most-consumed columns: amount_cents null rate below 0.1%, status taking only the expected categorical values. These checks live in dbt tests on the model and run as part of the production run, with results emitted as metrics.

Step 2: Lineage Visibility

dbt already produces a manifest with lineage on the transform layer. Push the manifest to a shared location and configure the BI tool (Looker, Tableau) to publish its query history. Together they map the graph from raw source to dashboard. The graph answers the blast-radius question and routes future alerts to the right consumer owners. Layering OpenLineage events on top of the dbt lineage adds the runtime view, but the static manifest is the right starting point and is free.

Step 3: Cost Tags

Set a Snowflake QUERY_TAG at the start of every pipeline session: pipeline_id, run_id, team. Backfill the cost rollup query against the last 30 days of QUERY_HISTORY to establish a baseline. The first cost dashboard is one query that returns credits per pipeline per day, ordered descending. Surface untagged spend separately so the tagging gap is visible and chaseable.
1# A wrapper that applies the operational shell uniformly across pipeline runs
2import uuid, time, structlog
3from contextlib import contextmanager
4
5log = structlog.get_logger()
6
7@contextmanager
8def pipeline_run(pipeline_id, team, dbt_target='prod'):
9 run_id = f"r_{int(time.time())}_{uuid.uuid4().hex[:6]}"
10 log.info('pipeline_start', pipeline_id=pipeline_id, run_id=run_id, team=team)
11 snowflake_tag = f'{{"pipeline":"{pipeline_id}","run_id":"{run_id}","team":"{team}"}}'
12 set_query_tag(snowflake_tag)
13 start = time.time()
14 try:
15 yield run_id
16 duration = time.time() - start
17 log.info('pipeline_end', pipeline_id=pipeline_id, run_id=run_id,
18 duration_s=duration, status='success')
19 metric_emit('pipeline.duration_s', duration,
20 tags={'pipeline': pipeline_id, 'status': 'success'})
21 except Exception as e:
22 duration = time.time() - start
23 log.error('pipeline_failed', pipeline_id=pipeline_id, run_id=run_id,
24 duration_s=duration, error=str(e))
25 metric_emit('pipeline.duration_s', duration,
26 tags={'pipeline': pipeline_id, 'status': 'failed'})
27 raise
28
29with pipeline_run('orders_daily', 'data-platform') as run_id:
30 run_dbt(target='prod', vars={'run_id': run_id})

Step 4: Alert Routing

Wire the freshness pillar to a page (the Looker dashboard has a 6am SLA), the volume pillar to Slack (an off-band volume signal is informative but not urgent), and the distribution pillar to a daily digest email (gradual drift is rarely page-worthy). Each alert links to a runbook with the standard sections. The runbook for the freshness alert names the on-call engineer, the diagnostic path, and the known recovery actions.

Step 5: CI on the Transform Layer

The dbt project gets a slim CI workflow: every PR builds modified models and their descendants against a CI schema, runs the dbt tests, and tears down the schema on completion. The PR template asks for a summary of blast radius (computed from the dbt manifest) and a note on cost impact (estimated from sample-data run time). Schema contracts are required on the model itself, so a breaking change has to be acknowledged in the PR rather than discovered in production.

The Final State

Operational PropertyBeforeAfter
Freshness signalConsumers notice when the dashboard is stalePage within 5 minutes of the SLA breach
Volume signalEmpty extracts go undetectedSlack alert when row count falls outside expected band
Schema signalSchema drift discovered downstream by parser failuresPR-time contract check; runtime schema test
Distribution signalMean shifts noticed in a quarterly retroDaily digest of column drift; selective alerts on critical columns
LineageTribal knowledge of who reads whatdbt manifest + Looker query log; blast radius queryable
CostUnattributed share of the warehouse billDaily credit-by-pipeline rollup; cost dashboard with trends
CIPRs land without testing; bugs caught in productionSlim CI on every PR; schema contracts enforced; tests run on the diff
What this sequence buys, in order:
  • Pillar coverage on the output stops silent failures within the first week
  • Lineage answers blast-radius and root-cause questions in seconds, not minutes
  • Cost tags expose the highest-spend pipelines and make optimization actionable
  • Alert routing keeps the page tier reserved for genuine emergencies
  • Slim CI catches breaking changes before they ship; production stays calm
Before Instrumentation
  • Failures discovered by consumers; mean time to detection in days
  • Debugging starts with 'where does this come from'
  • Cost is one bill, no attribution
  • PRs deploy directly to production with no per-PR validation
After Instrumentation
  • Failures detected within minutes; mean time to detection under an hour
  • Debugging starts with the lineage graph and the relevant pillar
  • Cost is rolled up per pipeline; trends are visible weekly
  • PRs run slim CI; only the modified subset is built and tested
TIP
When sequencing instrumentation work, prefer the order shown here: pillars on the output, then lineage, then cost, then alerts, then CI. Each step makes the next one easier; reversing the order makes each step harder.
Do
  • Start at the output and walk backward; consumers see the output first
  • Tag every session with pipeline_id and run_id from the first day of cost work
  • Treat lineage as the substrate that makes every other signal more useful
Don't
  • Build a custom observability platform when dbt tests, slim CI, and Snowflake QUERY_TAG cover 80% of the value
  • Add distribution checks on every column; they are expensive and noisy
  • Skip runbooks; an alert without a runbook is half an alert
PUTTING IT ALL TOGETHER

> A platform team inherits 28 dbt models that feed seven downstream consumers. There are no quality tests, no cost attribution, and no lineage view beyond the dbt DAG itself. The CFO has asked for a cost-by-team breakdown by the end of the quarter. The platform lead has 90 days. How does the team sequence the work?

Add the five pillars to the output tables first, in order of consumer importance. Freshness and volume are cheap and stop the bleeding; schema contracts catch upstream drift; distribution checks go on the most-consumed columns only.
Use the existing dbt manifest for lineage. The graph is already there; surfacing it through the BI tool gives the platform team the blast-radius answers needed for safe schema changes and routes alerts to the right consumer owners.
Set Snowflake QUERY_TAG with pipeline_id, run_id, and team on every dbt run. Build the cost-by-team rollup against the last 30 days of QUERY_HISTORY. The CFO's question becomes a SQL query, not a multi-week investigation.
Adopt dbt slim CI to keep PR time under 15 minutes; the pipeline-as-product framing from earlier lessons asks for a contract on each model, and CI is where contracts get enforced.
KEY TAKEAWAYS
The five pillars are vocabulary, not a checklist: freshness and volume are cheap and universal; schema and distribution are selective; lineage is the substrate.
Lineage shortens incidents: forward lineage gives blast radius; backward lineage gives root cause; the dbt manifest is a free starting point.
Cost attribution makes the optimization conversation possible: QUERY_TAG threads pipeline_id and run_id through the warehouse; the credit-by-pipeline rollup is one query.
Pipeline CI is not application CI: data is part of the test surface; slim CI builds only the modified subset; schema contracts and quality tests run on the diff.
Sequence instrumentation outward from the output: pillars first, lineage next, cost tags, alert routing, CI. Each step makes the next one cheaper.

Five pillars of observability, lineage, cost attribution, and CI before production

Category
Pipeline Architecture
Difficulty
intermediate
Duration
32 minutes
Challenges
0 hands-on challenges

Topics covered: The Five Pillars of Observability, Lineage and Blast Radius, Cost Attribution, CI/CD for Pipelines, Instrumenting a Pipeline E2E

Lesson Sections

  1. The Five Pillars of Observability (concepts: paFivePillars, paFreshness, paSchemaMonitor, paDistributionCheck)

    The five pillars framework, popularized by Monte Carlo and Barr Moses, names the kinds of signal a mature data observability practice tracks. The pillars are freshness, volume, schema, distribution, and lineage. They are not a checklist of monitors. They are a vocabulary for naming where the eyes and the gaps are. A pipeline well covered on freshness and volume but blind on distribution will fail in a particular family of ways; a pipeline blind on lineage will fail differently. The framework let

  2. Lineage and Blast Radius (concepts: paLineage, paBlastRadius)

    Lineage is the graph of which datasets depend on which others. Read it forward and it answers 'who consumes this table.' Read it backward and it answers 'what produced this column.' Both directions matter operationally. Without lineage, an engineer changing a column has no way to know who breaks; without lineage, an engineer debugging a wrong number has no way to know which pipeline to inspect first. Lineage is the difference between a five-minute fix and a five-hour fix. Forward Lineage and Bla

  3. Cost Attribution (concepts: paCostAttribution, paQueryTags)

    Most data teams do not know what their pipelines cost until somebody asks. The bill arrives as a single number from Snowflake or BigQuery or Databricks; it does not break down by pipeline. Without attribution, the cost conversation is impossible: nobody can say which pipelines should be optimized, which ones can be retired, or which ones are growing fastest. The fix is query tagging, which threads a pipeline identifier through every query the warehouse runs. The pattern is universal across cloud

  4. CI/CD for Pipelines (concepts: paCiCd, paSlimCi)

    Application engineers ship changes through CI/CD: a pull request runs unit tests, integration tests, and a deploy step. Pipeline changes are different in two ways. First, the data is part of the test surface; a transform is correct only if it produces the right output on real-shaped input. Second, the pipeline operates on data the test environment may not have. Both differences shape what CI for pipelines looks like, and both are misunderstood by teams that try to import application CI patterns

  5. Instrumenting a Pipeline E2E (concepts: paInstrumentation, paOperationalSequencing)

    Take a real pipeline and walk it through the operational instrumentation. The pipeline is fct_orders, a daily aggregation that pulls from a Postgres replica, transforms in Snowflake via dbt, and feeds a Looker dashboard, a daily revenue email, and an ML feature store. The pipeline runs cleanly today. It has no observability beyond the orchestrator's run status and no cost visibility. The exercise is to bring it to a level-three operational state in a sequence that delivers value at each step. St