A consumer goods company at 400 engineers had a Snowflake bill that grew 38% quarter over quarter for four straight quarters. The CFO asked the head of data which pipelines accounted for the increase. The head of data could not answer. There were no query tags, no per-pipeline cost attribution, no recurring cost review. The team spent three weeks tagging existing queries by pipeline and discovered that one rebuild of a fact table was scanning 14TB every hour, contributing roughly $42,000 per month to the bill. The rebuild had been written eighteen months earlier as a quick fix and had quietly outgrown its origin. Cost attribution did not save the $42,000 alone; it made the conversation about cost possible. The same is true of lineage, of CI/CD, and of the five pillars of observability. None of them is exotic. All of them are the difference between a pipeline that operates as a product and one that operates as a black box. This lesson is about the operational practices that scale a single pipeline into a fleet.
The Five Pillars of Observability
Daily Life
Interviews
Apply the five pillars of observability and audit a live pipeline against them.
The five pillars framework, popularized by Monte Carlo and Barr Moses, names the kinds of signal a mature data observability practice tracks. The pillars are freshness, volume, schema, distribution, and lineage. They are not a checklist of monitors. They are a vocabulary for naming where the eyes and the gaps are. A pipeline well covered on freshness and volume but blind on distribution will fail in a particular family of ways; a pipeline blind on lineage will fail differently. The framework lets the operations conversation be specific. The shift from three day-one monitors to a five-pillar framework is the same shift that the application observability community made when it moved from 'is the server up' to the three-pillar logs/metrics/traces vocabulary. The earlier framing is not wrong; it is undifferentiated. The later framing names what kind of failure each signal catches and lets investment be distributed across the kinds rather than piled onto whichever signal happens to be cheapest to add.
The Five Pillars at a Glance
FreshnessVolumeSchemaDistributionLineage
Freshness
Is the data current
Time since last update. Compared to an SLA. Most common pillar to monitor and most common to fail. 'fct_orders last updated 2 hours ago, SLA is 1 hour.'
Volume
Is the row count right
Total rows or rows per partition. Compared to a baseline (trailing average) or a fixed range. Catches empty extracts, runaway joins, silent filters.
Schema
Are the columns right
Column presence, types, nullability. Compared to a registered schema or a previous snapshot. Catches upstream schema drift before downstream consumers fail.
Distribution
Do the values look right
Statistics on column values: null rate, mean, p95, top-k for categorical. Compared to a baseline. Catches semantic drift the other pillars miss.
Lineage
What depends on what
The graph of which tables and pipelines feed which others. Not a monitor itself; the substrate that lets the others be acted on.
Why Five and Not Three
The day-one monitors from the beginner tier (did it run, did it succeed, was the output right) cover freshness and volume. Schema, distribution, and lineage are the three additions that turn a single-pipeline view into a fleet-level view. Schema catches structural drift. Distribution catches semantic drift. Lineage gives every other signal a context: a row-count drop on a leaf table affects nobody downstream, while the same drop on a foundational table affects forty consumers. Without lineage, every alert looks the same; with lineage, the response can be prioritized.
Pillar
Failure Mode It Catches
Tools That Implement It
Freshness
Pipeline silently stops; data is stale
dbt source freshness, Monte Carlo, Soda Core, custom SQL
Volume
Empty extract, runaway join, silent filter
dbt tests (row count), Great Expectations, Monte Carlo
Schema
Column added/dropped/retyped upstream
Schema registry, dbt contracts, Monte Carlo schema monitor
Distribution
Mean shift, null rate spike, new categorical value
Great Expectations, Monte Carlo, custom SQL with statistical tests
Lineage
Cannot predict who is affected by an upstream change
dbt lineage graph, OpenLineage, DataHub, Atlan
Each Pillar Has a Cost Profile
Freshness checks are cheap: they read one row of metadata. Volume checks are cheap: a COUNT(*) per partition. Schema checks are cheap: a metadata read against the information schema. Distribution checks are expensive: computing a null rate across 200 columns over a billion-row table costs real warehouse credits. Lineage is a one-time investment in a metadata graph and then ongoing maintenance. Investment in observability is therefore staged: freshness and volume first, schema next, distribution selectively on the columns where it matters, lineage as the substrate.
Where each pillar earns its cost:
▸Freshness: every pipeline; the cheapest and most universally useful signal
▸Volume: every pipeline; catches the second-most-common silent failure
▸Schema: every pipeline that crosses a system boundary or has external producers
▸Distribution: pipelines feeding ML features and customer-facing aggregates; selective everywhere else
▸Lineage: organization-wide; pays back when blast radius questions get asked
A Pillar Audit on a Live Pipeline
1
The query above takes seconds to run and catches four classes of failure. It is the single highest-leverage thing to add to a pipeline that has none of the pillars covered.
•Three-Pillar Coverage
Detects: missed runs, empty outputs, late deliveries
Misses: column added upstream that breaks parsers
Misses: a metric column drifted 3x without changing row count
Misses: blast radius of an upstream change
✓Five-Pillar Coverage
Detects everything the three-pillar set catches
Detects schema drift before downstream parsing failures
Detects distributional shifts without row-count change
Maps the impact of any change through the lineage graph
TIP
When auditing an existing pipeline, score it from zero to five on the pillars covered. Most production pipelines score two or three. Investment in the missing pillars is usually higher leverage than investment in shaving runtime.
The five pillars are vocabulary, not a checklist; coverage is staged by cost and value.
Distribution checks catch failures the other four miss but cost more to run.
Lineage is a substrate for the other pillars: it converts an alert into a prioritized response.
Lineage and Blast Radius
Daily Life
Interviews
Read forward and backward lineage to predict blast radius and to root-cause data incidents.
Lineage is the graph of which datasets depend on which others. Read it forward and it answers 'who consumes this table.' Read it backward and it answers 'what produced this column.' Both directions matter operationally. Without lineage, an engineer changing a column has no way to know who breaks; without lineage, an engineer debugging a wrong number has no way to know which pipeline to inspect first. Lineage is the difference between a five-minute fix and a five-hour fix.
Forward Lineage and Blast Radius
Forward lineage answers the question 'if this changes, what else changes.' A column rename in fct_orders looks small until forward lineage shows that 17 dbt models, 4 dashboards, and 2 ML feature definitions read from it. The blast radius is the size of the change in consumer-facing terms. The same rename has a different blast radius depending on where in the graph it sits. A leaf table has zero blast radius. A foundational dimension table has the entire downstream graph as its blast radius. Engineers who have lineage make changes calibrated to blast radius; engineers who do not have lineage either avoid all changes or make changes that break consumers.
Backward Lineage and Root Cause
Backward lineage answers 'what produced this.' A revenue dashboard shows the wrong number. The dashboard reads from a serving mart. The mart reads from three curated tables. Each curated table reads from raw extracts. Each raw extract reads from a source system. Without backward lineage, the debugger walks this chain by hand, reading code in each repository. With backward lineage, the chain is a graph query: 'show all sources that feed dashboard_revenue,' returning fifteen rows in a second.
Most complete coverage; cost; ongoing maintenance to keep current
Column-Level vs Table-Level
Table-level lineage says 'fct_orders depends on raw.orders.' Column-level lineage says 'fct_orders.amount_cents is derived from raw.orders.gross_amount minus raw.orders.refund_amount.' The first is enough for blast radius questions like 'who reads this table.' The second is needed for impact analysis on a column-specific change like 'this column's definition is changing from gross to net.' Most teams operate with table-level lineage and add column-level coverage only for the most-consumed tables. Building column-level lineage for everything is rarely worth the cost.
Operational uses of lineage:
▸Predicting blast radius before deploying a schema change
▸Routing alerts to the owners of the consumers most affected, not to a generic channel
▸Prioritizing observability investment: more pillars on tables with more consumers
▸Identifying orphan tables: tables nothing reads from, candidates for deprecation
An Incident Walked Through Lineage
Consider a Tuesday morning incident: the executive revenue dashboard shows a value that contradicts a finance team report. The data engineer on call opens the dashboard's lineage. The dashboard reads serving.revenue_dashboard_mart, which reads fct_orders, which depends on raw.orders and dim_customer. Backward lineage shortens the search to four candidates. A freshness pillar check on the four shows that fct_orders has not updated since 2am, while the dashboard expected an update at 6am. The cause is a failed run of fct_orders. Without lineage, the same investigation starts with 'where does this number come from' and burns thirty minutes before reaching the same point.
1
•Without Lineage
Engineers either avoid all changes or break consumers
Debugging a wrong number starts from 'whose pipeline is this'
Deprecation requires asking around to find consumers
Alerts are routed by table, not by impact
✓With Lineage
Changes are calibrated to known blast radius
Debugging starts with the lineage of the affected dashboard
Deprecation candidates are identifiable by query: tables nothing reads
Alerts can be prioritized by the count and importance of downstream consumers
TIP
Adopt dbt manifest-based lineage first; it is free, accurate within the transform layer, and answers most questions. Catalog products earn their cost only when the lineage graph spans systems dbt does not see.
Cost Attribution
Daily Life
Interviews
Tag warehouse queries with pipeline metadata and read a cost-by-pipeline rollup to identify optimization candidates.
Most data teams do not know what their pipelines cost until somebody asks. The bill arrives as a single number from Snowflake or BigQuery or Databricks; it does not break down by pipeline. Without attribution, the cost conversation is impossible: nobody can say which pipelines should be optimized, which ones can be retired, or which ones are growing fastest. The fix is query tagging, which threads a pipeline identifier through every query the warehouse runs. The pattern is universal across cloud warehouses, with small mechanical differences: a Snowflake QUERY_TAG, a BigQuery job label, a Databricks tag, a Redshift query group. The point is not the field name but the discipline of carrying a stable identifier through every query and rolling it up after the fact.
Query Tags in Snowflake, Tags in BigQuery, Labels in Databricks
Cluster tags and SQL warehouse tags; query tags via JDBC
Usage tables; budget reports
Redshift
Query labels via SET query_group
STL_QUERY system table; CloudWatch metrics
What to Tag
The minimum tag set for a pipeline is the pipeline identifier and the run identifier. The pipeline identifier groups queries by what they belong to (orders_daily, customer_360, ml_features). The run identifier separates today's run from yesterday's. A second useful field is the team, which lets cost roll up to organizational owners for chargeback or showback. Tagging by user is rarely useful: it answers 'who ran this query' for ad-hoc work but not for scheduled pipelines, where the user is always a service account.
Warehouse credits and dollars are not the same. A Snowflake credit is roughly $2 to $4 depending on the contract; a BigQuery slot-hour is priced separately from on-demand. The transformation from a credit-per-pipeline number to a dollar-per-pipeline number is a contract-rate multiplication, often pulled from the billing export. The dollar number is the one the business cares about; the credit number is the one engineers debug against. Both belong on the cost dashboard.
What Cost Attribution Reveals
1
In the example above, experimental_finops_test ran seven queries in seven days and consumed more credits than every other pipeline combined. The pattern is universal: the first time cost attribution gets enabled, one or two pipelines turn out to be dominating spend, often pipelines nobody had noticed.
What Cost Attribution Does Not Solve
Attribution is a measurement, not an optimization. A pipeline tagged with its name and shown to cost $42,000 a month is a problem statement, not a fix. The fix requires changing the query, the warehouse size, the materialization, or the cadence. Attribution turns 'spend went up' into 'this specific pipeline drove the increase,' which is enough to start the optimization conversation. Teams that stop at attribution and never act are common; the dashboard is reassuring without changing anything.
Reading a cost-by-pipeline dashboard:
▸The top three pipelines almost always account for over half of spend
▸The fastest-growing pipeline is the one to scrutinize first, not the largest
▸A pipeline whose cost-per-query is 100x the median is a single bloated query in disguise
▸Untagged spend (queries with no QUERY_TAG) is a process gap; chase it down
•Without Cost Attribution
Spend is one number; nobody can answer 'which pipeline'
Cost conversations stall on vague 'needs more investigation' replies
New pipelines deploy without a cost expectation
Cost spikes are noticed only when they hit the budget
✓With Cost Attribution
Spend rolls up by pipeline and team; trends are visible
Cost conversations identify specific pipelines for optimization
New pipelines come with a daily cost number from day one
Cost spikes alert to the team that owns the pipeline that drove them
✓Do
Set QUERY_TAG at the start of every pipeline session, not on individual queries
Include pipeline_id, run_id, and team in the tag
Surface untagged spend on the cost dashboard so the gap is visible
✗Don't
Tag with high-cardinality fields like user_id or full timestamps
Stop at the credit number; convert to dollars for business audiences
Treat the cost dashboard as the deliverable; the deliverable is the optimization that follows
TIP
Make untagged spend a reviewed metric. The first cost optimization is usually closing the tagging gap, because untagged queries are the ones that are easiest to lose.
CI/CD for Pipelines
Daily Life
Interviews
Design a pipeline CI strategy that tests at four layers and uses slim CI to stay fast enough to be habitual.
Application engineers ship changes through CI/CD: a pull request runs unit tests, integration tests, and a deploy step. Pipeline changes are different in two ways. First, the data is part of the test surface; a transform is correct only if it produces the right output on real-shaped input. Second, the pipeline operates on data the test environment may not have. Both differences shape what CI for pipelines looks like, and both are misunderstood by teams that try to import application CI patterns wholesale. The early signal that a CI strategy is mismatched is the engineers' habit of running pipelines locally before pushing, even when CI exists. They are not lazy or untrusting; they are working around a CI shape that does not catch what production is going to catch. The fix is not to demand that they use CI; the fix is to make CI accurate enough that the local-pipeline habit dies on its own.
What Pipeline CI Tests
Layer
What Is Tested
Common Tools
Code
Syntax, linting, type checks; SQL parses and refs resolve
ruff, mypy, sqlfluff, dbt parse
Logic
Transforms produce expected output on canned input
dbt unit tests (1.8+), pytest, dbt seeds + dbt build
Schema and contracts
Output schema matches the contract; dbt model contracts enforced
dbt contracts, dbt-checkpoint, custom CI scripts
Integration
Pipeline runs end-to-end on a sample dataset in a dev environment
Output passes the same quality tests production runs
dbt tests, Great Expectations suites, Soda checks
Sample Data and the Dev Warehouse
Pipeline tests need representative data. Two patterns dominate. The first is sample data: a small canned dataset committed to the repo or generated synthetically, used for unit tests on individual transforms. The second is a dev warehouse: a real warehouse environment with a copy or subset of production data, used for integration tests. Sample data is fast and free but tests against shapes the engineer wrote, not shapes production produces. Dev warehouse is slower and costs money but tests against the real shapes. Mature CI uses both: sample data for the tight inner loop, dev warehouse for the integration layer.
dbt Slim CI
dbt's slim CI pattern is the most widely adopted CI optimization for SQL pipelines. Instead of building every model in the project on every PR, slim CI builds only the models that changed and their downstream descendants, tested against a recent production manifest. The result is a CI run that finishes in minutes instead of hours, while still catching the changes that matter. The pattern uses dbt's state:modified+ selector and a manifest from the last successful production run.
CI runs against a sample or a recent slice. It cannot catch problems that only appear at full scale: query plans that work on a million rows and explode on a billion, distributional shifts that only show up across a quarter of data, race conditions in parallel writes, cost regressions that compound over time. CI is necessary and not sufficient. Production observability and a structured rollout (canary, gradual deploy, monitored bake-in) close the gap.
What a useful pipeline CI catches:
▸Syntax errors and broken refs (parse-time)
▸Logic errors against representative input (unit tests)
▸Quality test regressions on the modified models (slim CI + dbt tests)
▸Integration failures against a dev warehouse with sample data
Sample dataDev warehouseSlim CI manifest
Sample data
Canned input, fast iteration
Tens to thousands of rows committed to the repo. Tests transforms in seconds. Misses production-shaped edge cases that live in the long tail.
Dev warehouse
Real semantics, real cost
A CI schema in a real warehouse, populated from production samples. Catches integration bugs that mocks miss. Costs warehouse credits per PR; tear down after merge.
Slim CI manifest
Build only the diff
dbt --select state:modified+ against a recent prod manifest. Cuts CI time from hours to minutes by skipping unchanged models. The default modern dbt CI shape.
From CI to CD
Continuous deployment for pipelines is more conservative than for applications. An application rollback affects users for a few seconds. A pipeline rollback may have to revert a write that produced incorrect data, which is a much harder operation. The standard pattern is automated deploys to staging on merge, then a manual or automated promotion to production once observability has confirmed the deploy is healthy. Some teams do this with a feature flag at the pipeline level: the new code runs alongside the old code, both writing to separate tables, and the active table is switched only after validation.
•App CI Imported Wholesale
Builds every model on every PR (slow, expensive)
Mocks all warehouse interactions (catches none of the real bugs)
Treats data as a test fixture, not a test surface
Runs unit tests in seconds; misses every integration issue
✓Pipeline-Aware CI
Slim CI: only modified models plus descendants, against recent prod state
Dev warehouse with sample data; real warehouse semantics
Schema contracts checked in CI; data quality tests run on the diff
Faster on the inner loop, more thorough on the integration loop
Pipeline CI tests at four layers: code, logic, schema/contracts, integration.
Slim CI cuts dbt CI time from hours to minutes by building only the diff.
CI cannot catch full-scale problems; production observability completes the safety net.
TIP
Track CI duration as a first-class metric. Pipeline CI that takes longer than 15 minutes silently encourages engineers to skip it. The slim CI pattern exists to keep CI fast enough to be habitual.
Instrumenting a Pipeline E2E
Daily Life
Interviews
Sequence the operational instrumentation of an existing pipeline so each step delivers value before the next.
Take a real pipeline and walk it through the operational instrumentation. The pipeline is fct_orders, a daily aggregation that pulls from a Postgres replica, transforms in Snowflake via dbt, and feeds a Looker dashboard, a daily revenue email, and an ML feature store. The pipeline runs cleanly today. It has no observability beyond the orchestrator's run status and no cost visibility. The exercise is to bring it to a level-three operational state in a sequence that delivers value at each step.
Step 1: Pillar Coverage on the Output
Start at the output table, fct_orders, because that is the layer the consumers see. Add freshness (last update timestamp must be within four hours), volume (row count for today's partition between 80% and 120% of the trailing seven-day average), and schema (column presence and types match the registered schema). Distribution checks come next, on the most-consumed columns: amount_cents null rate below 0.1%, status taking only the expected categorical values. These checks live in dbt tests on the model and run as part of the production run, with results emitted as metrics.
Step 2: Lineage Visibility
dbt already produces a manifest with lineage on the transform layer. Push the manifest to a shared location and configure the BI tool (Looker, Tableau) to publish its query history. Together they map the graph from raw source to dashboard. The graph answers the blast-radius question and routes future alerts to the right consumer owners. Layering OpenLineage events on top of the dbt lineage adds the runtime view, but the static manifest is the right starting point and is free.
Step 3: Cost Tags
Set a Snowflake QUERY_TAG at the start of every pipeline session: pipeline_id, run_id, team. Backfill the cost rollup query against the last 30 days of QUERY_HISTORY to establish a baseline. The first cost dashboard is one query that returns credits per pipeline per day, ordered descending. Surface untagged spend separately so the tagging gap is visible and chaseable.
1
# A wrapper that applies the operational shell uniformly across pipeline runs
Wire the freshness pillar to a page (the Looker dashboard has a 6am SLA), the volume pillar to Slack (an off-band volume signal is informative but not urgent), and the distribution pillar to a daily digest email (gradual drift is rarely page-worthy). Each alert links to a runbook with the standard sections. The runbook for the freshness alert names the on-call engineer, the diagnostic path, and the known recovery actions.
Step 5: CI on the Transform Layer
The dbt project gets a slim CI workflow: every PR builds modified models and their descendants against a CI schema, runs the dbt tests, and tears down the schema on completion. The PR template asks for a summary of blast radius (computed from the dbt manifest) and a note on cost impact (estimated from sample-data run time). Schema contracts are required on the model itself, so a breaking change has to be acknowledged in the PR rather than discovered in production.
The Final State
Operational Property
Before
After
Freshness signal
Consumers notice when the dashboard is stale
Page within 5 minutes of the SLA breach
Volume signal
Empty extracts go undetected
Slack alert when row count falls outside expected band
Schema signal
Schema drift discovered downstream by parser failures
PR-time contract check; runtime schema test
Distribution signal
Mean shifts noticed in a quarterly retro
Daily digest of column drift; selective alerts on critical columns
Daily credit-by-pipeline rollup; cost dashboard with trends
CI
PRs land without testing; bugs caught in production
Slim CI on every PR; schema contracts enforced; tests run on the diff
What this sequence buys, in order:
▸Pillar coverage on the output stops silent failures within the first week
▸Lineage answers blast-radius and root-cause questions in seconds, not minutes
▸Cost tags expose the highest-spend pipelines and make optimization actionable
▸Alert routing keeps the page tier reserved for genuine emergencies
▸Slim CI catches breaking changes before they ship; production stays calm
•Before Instrumentation
Failures discovered by consumers; mean time to detection in days
Debugging starts with 'where does this come from'
Cost is one bill, no attribution
PRs deploy directly to production with no per-PR validation
✓After Instrumentation
Failures detected within minutes; mean time to detection under an hour
Debugging starts with the lineage graph and the relevant pillar
Cost is rolled up per pipeline; trends are visible weekly
PRs run slim CI; only the modified subset is built and tested
TIP
When sequencing instrumentation work, prefer the order shown here: pillars on the output, then lineage, then cost, then alerts, then CI. Each step makes the next one easier; reversing the order makes each step harder.
✓Do
Start at the output and walk backward; consumers see the output first
Tag every session with pipeline_id and run_id from the first day of cost work
Treat lineage as the substrate that makes every other signal more useful
✗Don't
Build a custom observability platform when dbt tests, slim CI, and Snowflake QUERY_TAG cover 80% of the value
Add distribution checks on every column; they are expensive and noisy
Skip runbooks; an alert without a runbook is half an alert
❯❯❯PUTTING IT ALL TOGETHER
> A platform team inherits 28 dbt models that feed seven downstream consumers. There are no quality tests, no cost attribution, and no lineage view beyond the dbt DAG itself. The CFO has asked for a cost-by-team breakdown by the end of the quarter. The platform lead has 90 days. How does the team sequence the work?
Add the five pillars to the output tables first, in order of consumer importance. Freshness and volume are cheap and stop the bleeding; schema contracts catch upstream drift; distribution checks go on the most-consumed columns only.
Use the existing dbt manifest for lineage. The graph is already there; surfacing it through the BI tool gives the platform team the blast-radius answers needed for safe schema changes and routes alerts to the right consumer owners.
Set Snowflake QUERY_TAG with pipeline_id, run_id, and team on every dbt run. Build the cost-by-team rollup against the last 30 days of QUERY_HISTORY. The CFO's question becomes a SQL query, not a multi-week investigation.
Adopt dbt slim CI to keep PR time under 15 minutes; the pipeline-as-product framing from earlier lessons asks for a contract on each model, and CI is where contracts get enforced.
KEY TAKEAWAYS
The five pillars are vocabulary, not a checklist: freshness and volume are cheap and universal; schema and distribution are selective; lineage is the substrate.
Lineage shortens incidents: forward lineage gives blast radius; backward lineage gives root cause; the dbt manifest is a free starting point.
Cost attribution makes the optimization conversation possible: QUERY_TAG threads pipeline_id and run_id through the warehouse; the credit-by-pipeline rollup is one query.
Pipeline CI is not application CI: data is part of the test surface; slim CI builds only the modified subset; schema contracts and quality tests run on the diff.
Sequence instrumentation outward from the output: pillars first, lineage next, cost tags, alert routing, CI. Each step makes the next one cheaper.
Five pillars of observability, lineage, cost attribution, and CI before production
Category
Pipeline Architecture
Difficulty
intermediate
Duration
32 minutes
Challenges
0 hands-on challenges
Topics covered: The Five Pillars of Observability, Lineage and Blast Radius, Cost Attribution, CI/CD for Pipelines, Instrumenting a Pipeline E2E
The five pillars framework, popularized by Monte Carlo and Barr Moses, names the kinds of signal a mature data observability practice tracks. The pillars are freshness, volume, schema, distribution, and lineage. They are not a checklist of monitors. They are a vocabulary for naming where the eyes and the gaps are. A pipeline well covered on freshness and volume but blind on distribution will fail in a particular family of ways; a pipeline blind on lineage will fail differently. The framework let
Lineage is the graph of which datasets depend on which others. Read it forward and it answers 'who consumes this table.' Read it backward and it answers 'what produced this column.' Both directions matter operationally. Without lineage, an engineer changing a column has no way to know who breaks; without lineage, an engineer debugging a wrong number has no way to know which pipeline to inspect first. Lineage is the difference between a five-minute fix and a five-hour fix. Forward Lineage and Bla
Most data teams do not know what their pipelines cost until somebody asks. The bill arrives as a single number from Snowflake or BigQuery or Databricks; it does not break down by pipeline. Without attribution, the cost conversation is impossible: nobody can say which pipelines should be optimized, which ones can be retired, or which ones are growing fastest. The fix is query tagging, which threads a pipeline identifier through every query the warehouse runs. The pattern is universal across cloud
Application engineers ship changes through CI/CD: a pull request runs unit tests, integration tests, and a deploy step. Pipeline changes are different in two ways. First, the data is part of the test surface; a transform is correct only if it produces the right output on real-shaped input. Second, the pipeline operates on data the test environment may not have. Both differences shape what CI for pipelines looks like, and both are misunderstood by teams that try to import application CI patterns
Take a real pipeline and walk it through the operational instrumentation. The pipeline is fct_orders, a daily aggregation that pulls from a Postgres replica, transforms in Snowflake via dbt, and feeds a Looker dashboard, a daily revenue email, and an ML feature store. The pipeline runs cleanly today. It has no observability beyond the orchestrator's run status and no cost visibility. The exercise is to bring it to a level-three operational state in a sequence that delivers value at each step. St