# A pull-ingestion job extracts from a Postgres orders table with 200M rows that grows by 1M per day

Canonical URL: <https://datadriven.io/problems/a-pull-ingestion-job-extracts-from-a-postgres-orders-table-w-740671a8>

Domain: Pipeline Design · Difficulty: medium

## Problem

A pull-ingestion job extracts from a Postgres `orders` table that holds 200M rows and grows by 1M per day. The nightly full-table scan takes four hours and has started to compete with application traffic at 6am. The section's pattern is an incremental pull with a high-water mark: filter with `WHERE updated_at >= last_watermark AND updated_at < this_run_started_at`, read in `ORDER BY updated_at`, and advance the watermark only after a successful write. Pick the high-water-mark column by replacing the full-load transform with one whose name states the bookmark column, its inclusive lower bound, the exclusive upper bound fixed at run launch, and the `ORDER BY`. Then add a bookmark-state node that persists the watermark between runs.
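A minimal sketch of the pattern, assuming Python's standard-library `sqlite3` as a stand-in for Postgres (with a Postgres driver such as psycopg the placeholders would be `%s` rather than `?`). The `bookmark_state` table, the `run_once`, `read_watermark`, `write_watermark` and `write_batch` names, the column list, and the epoch default are all hypothetical. The upper bound is passed in once per run, standing in for a timestamp captured at launch, so rows updated after launch land in the next window instead of being missed or read twice.

```python
import sqlite3
from typing import Callable, Sequence

# Hypothetical bookmark-state table: one row per source table holding the
# last watermark committed after a successful write.
BOOKMARK_DDL = """
CREATE TABLE IF NOT EXISTS bookmark_state (
    source_table TEXT PRIMARY KEY,
    watermark    TEXT NOT NULL
)
"""

# Assumed lower bound for a table that has never been pulled. Timestamps are
# stored as same-format ISO-8601 text so string order matches time order.
EPOCH = "1970-01-01T00:00:00"


def read_watermark(state: sqlite3.Connection, source_table: str) -> str:
    row = state.execute(
        "SELECT watermark FROM bookmark_state WHERE source_table = ?",
        (source_table,),
    ).fetchone()
    return row[0] if row else EPOCH


def write_watermark(state: sqlite3.Connection, source_table: str, watermark: str) -> None:
    state.execute(
        "INSERT OR REPLACE INTO bookmark_state (source_table, watermark) VALUES (?, ?)",
        (source_table, watermark),
    )
    state.commit()


def run_once(
    src: sqlite3.Connection,
    state: sqlite3.Connection,
    write_batch: Callable[[Sequence[tuple]], None],
    run_started_at: str,
) -> list:
    """One incremental pull of `orders` over [last_watermark, run_started_at)."""
    low = read_watermark(state, "orders")
    rows = src.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at >= ? AND updated_at < ? "  # inclusive low, exclusive high
        "ORDER BY updated_at",
        (low, run_started_at),
    ).fetchall()
    write_batch(rows)  # if this raises, the watermark below is never advanced
    write_watermark(state, "orders", run_started_at)
    return rows


# Usage: a toy source table and an in-memory bookmark store.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2024-01-01T10:00:00"),
        (2, 20.0, "2024-01-01T23:59:59"),
        (3, 30.0, "2024-01-02T00:00:00"),  # exactly on the first run's upper bound
    ],
)
state = sqlite3.connect(":memory:")
state.execute(BOOKMARK_DDL)

sink: list = []
first = run_once(src, state, sink.extend, "2024-01-02T00:00:00")   # ids 1, 2
second = run_once(src, state, sink.extend, "2024-01-03T00:00:00")  # id 3 only


def failing_write(rows: Sequence[tuple]) -> None:
    raise RuntimeError("simulated sink outage")


try:
    run_once(src, state, failing_write, "2024-01-04T00:00:00")
except RuntimeError:
    pass  # watermark is still 2024-01-03T00:00:00, so the next run retries the window
```

Because each run's lower bound is exactly the previous run's upper bound, the half-open windows tile the timeline with no gap and no overlap: the row stamped `2024-01-02T00:00:00` is read by the second run only. Against the real Postgres table, the range predicate typically also relies on an index on `updated_at` to avoid the full scan described above.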

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/a-pull-ingestion-job-extracts-from-a-postgres-orders-table-w-740671a8)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.