Pipeline Operations: Advanced

Cost as ongoing work, environments, pipeline as code, deprecation, and a 10x cost-reduction pass

Category: Pipeline Architecture
Difficulty: advanced
Duration: 40 minutes
Challenges: 0 hands-on challenges

Topics covered: Cost Optimization as Ongoing Work, Environment Management, Declarative vs Imperative Pipeline, Deprecation and Ownership, Worked Example: 10x Cost Cut

Lesson Sections

  1. Cost Optimization as Ongoing Work (concepts: paCostLevers, paStorageTiering, paPartitionPruning)

    Pipeline cost grows unless something pushes back. New pipelines get built. Old pipelines get more data. Materializations that were efficient on a billion rows become expensive on ten billion. Reactive cost work, kicked off when the bill becomes alarming, is always more expensive than proactive cost work, where a cost rhythm runs alongside engineering. The proactive rhythm has three parts: measurement, levers, and accountability. Each part is undramatic; together they prevent the kind of crisis t

  2. Environment Management (concepts: paEnvironmentMgmt, paPiiMasking, paEphemeralEnvs)

    Application engineers have three environments: dev, staging, prod. The convention is universal. Pipeline engineers have the same three environments and a harder problem: the data shape differs across them, and the differences shape what each environment can validate. A dev environment with no data tests nothing. A staging environment with all of production's data costs as much as production. The right answer for each environment is a deliberate choice of data shape, and the choice is the operati

  3. Declarative vs Imperative Pipeline (concepts: paPipelineAsCode, paDeclarativeOrchestration, paImperativeOrchestration)

    Pipelines used to be Python scripts that called other Python scripts. Modern pipeline tooling has moved toward two distinct philosophies: declarative, where the code describes the desired state of data assets, and imperative, where the code describes the steps to take. dbt and Dagster software-defined assets sit on the declarative side. Airflow operators sit on the imperative side. The choice is not a tool preference; it is a workload fit, and the wrong choice produces the kind of pipeline that

  4. Deprecation and Ownership (concepts: paPipelineOwnership, paDeprecation)

    Pipelines are easy to build and hard to retire. The asymmetry is the largest hidden cost in mature data organizations. A startup with twenty pipelines has every pipeline owned by someone who remembers writing it. A company at five hundred engineers has thousands of pipelines, half of them written by people who left, a quarter of them feeding consumers nobody can name. Deprecating a pipeline whose owner left and whose consumers are unknown is genuinely hard. The harder problem is preventing the s

  5. Worked Example: 10x Cost Cut (concepts: paCostReductionPass, paLeverInventory, paOperationalSynthesis)

    A production pipeline at a mid-stage subscription company costs $48,000 per month. The team's hypothesis, formed casually, is that the cost is reasonable for the volume. The cost rhythm meeting flagged the pipeline as the second-largest spender; the suspicion was that it was 10x more expensive than necessary. This worked example walks through the structured cost-reduction pass that brought it from $48k to $4.7k, without breaking SLAs and without requiring a multi-month rewrite. The pass is the s
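
The headline figures in the worked example can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `reduction_summary` and its output keys are hypothetical, and the only inputs taken from the lesson are the two monthly costs ($48,000 before, $4,700 after).

```python
def reduction_summary(before: float, after: float) -> dict:
    """Summarize a cost reduction from `before` to `after` (same currency, per month).

    Hypothetical helper for illustration; not part of any pipeline tool.
    """
    if before <= 0 or after <= 0:
        raise ValueError("costs must be positive")
    monthly_savings = before - after
    return {
        # How many times cheaper the pipeline is after the pass.
        "factor": round(before / after, 2),
        # Share of the original bill that was removed.
        "percent_saved": round(monthly_savings / before * 100, 1),
        "monthly_savings": monthly_savings,
        "annual_savings": monthly_savings * 12,
    }


# Figures from the worked example: $48k per month down to $4.7k per month.
summary = reduction_summary(48_000, 4_700)
print(summary)
```

Under these inputs the reduction factor comes out slightly above 10x, which matches the "10x" framing in the lesson title.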