# A nightly batch pipeline on the canvas reads 18 million orders per day from Postgres, joins with a product dimension, and writes a fact_daily_orders table

Canonical URL: <https://datadriven.io/problems/a-nightly-batch-pipeline-on-the-canvas-reads-18-million-orde-523fd4af>

Domain: Pipeline Design · Difficulty: medium

## Problem

A nightly batch pipeline on the canvas reads 18 million orders per day from Postgres, joins them with a product dimension, and writes a `fact_daily_orders` table. After a volume increase, its runtime has stretched from 3 hours to 11 hours, and on most mornings the 6am SLA slips to noon. The executive dashboard needs only tier-4 (daily) freshness. The marketing team needs tier-2 (under 15 minutes) freshness and has built its own shadow streaming pipeline to get it.

Apply the diagnosis-first redesign this section just walked through. Do not migrate everything to a Flink streaming pipeline. That is the wrong instinct: it costs 20x as much, takes about 9 months of engineering, and nothing in the diagnosis justifies it. Instead, apply the right diagnosis: volume outgrew cadence (so run more often, not differently), and consumers have different freshness needs (so split the paths by tier).

Replace the single nightly batch with two paths that share the same Postgres source:

1. An hourly micro-batch path for the executive dashboard, built with batch tools (plain Spark, PySpark, or dbt) and tagged with `slaFreshness < 1h` on its warehouse table.
2. A streaming micro-batch path for the marketing dashboard, built with Spark Structured Streaming or Flink using a 1-minute trigger and tagged with `slaFreshness < 15min` on its serving store.
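The arithmetic behind "run more often, not differently" and the tier split can be checked with a short sketch. It assumes orders arrive evenly across the day and that runtime scales linearly with row count, using the observed 11-hour nightly run as the throughput baseline. `PathSpec`, `pick_path`, `estimated_run_minutes`, `cost_rank`, and the path names are hypothetical illustrations, not part of Spark, dbt, Flink, or any canvas tool.

```python
from dataclasses import dataclass

DAILY_ORDERS = 18_000_000   # from the problem statement
SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class PathSpec:
    """One ingestion path. Hypothetical structure, for illustration only."""
    name: str
    trigger_seconds: int   # how often a run (or micro-batch) fires
    sla_seconds: int       # upper bound behind the path's slaFreshness tag
    cost_rank: int         # assumed ordering: lower = cheaper to operate

    def rows_per_run(self, daily_rows: int = DAILY_ORDERS) -> int:
        # Assumes orders arrive evenly over the day (a simplification).
        return daily_rows * self.trigger_seconds // SECONDS_PER_DAY


HOURLY_BATCH = PathSpec("hourly_batch", 3_600, 3_600, cost_rank=0)   # slaFreshness < 1h
STREAMING = PathSpec("streaming_1min", 60, 900, cost_rank=1)          # slaFreshness < 15min
PATHS = [HOURLY_BATCH, STREAMING]


def estimated_run_minutes(rows: int, nightly_rows: int = DAILY_ORDERS,
                          nightly_hours: float = 11.0) -> float:
    # Assumes runtime scales linearly with row count, taking the
    # observed 11-hour nightly run as the throughput baseline.
    rows_per_hour = nightly_rows / nightly_hours
    return rows / rows_per_hour * 60


def pick_path(required_freshness_seconds: int) -> PathSpec:
    # Choose the cheapest path whose freshness SLA is at least as tight
    # as the consumer's requirement.
    eligible = [p for p in PATHS if p.sla_seconds <= required_freshness_seconds]
    if not eligible:
        raise ValueError("no path meets this freshness requirement")
    return min(eligible, key=lambda p: p.cost_rank)


if __name__ == "__main__":
    print(HOURLY_BATCH.rows_per_run())                                    # 750000
    print(STREAMING.rows_per_run())                                       # 12500
    print(round(estimated_run_minutes(HOURLY_BATCH.rows_per_run()), 1))   # 27.5
    print(pick_path(SECONDS_PER_DAY).name)   # hourly_batch   (executive, tier-4 daily)
    print(pick_path(15 * 60).name)           # streaming_1min (marketing, tier-2)
```

Under these assumptions each hourly run handles about 750,000 orders in roughly 27.5 minutes, which fits inside its one-hour window, while each 1-minute trigger handles about 12,500 orders. The routing rule sends the executive dashboard to the cheaper batch path because its daily requirement does not need streaming; only the marketing dashboard's sub-15-minute requirement forces the streaming path. Real join and write costs rarely scale perfectly linearly, so treat the runtime estimate as a starting point for measurement.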

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/a-nightly-batch-pipeline-on-the-canvas-reads-18-million-orde-523fd4af)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.