# A retailer's orders pipeline processes 1 billion events per day at peak volume, and an executive das

Canonical URL: <https://datadriven.io/problems/a-retailers-orders-pipeline-processes-1-billion-events-per-7e93d65b>

Domain: Pipeline Design · Difficulty: medium

## Problem

A retailer's orders pipeline processes 1 billion events per day at peak volume, and an executive dashboard reads the result at 7am Pacific each morning. The canvas has the four roles in place but no rhythm decision: the transform is labeled plain Spark (which the canvas grader treats as batch), the warehouse mart has no slaFreshness, and the throughput-vs-latency tradeoff has not been named.

Apply the latency-vs-throughput framing this section just taught and pick which dimension constrains this pipeline. The 7am dashboard read is a Tier 4 freshness ask (< 24h end-to-end), so latency is not the constraint. Throughput is: 1 billion events per day is a high-volume workload that batch handles at 10-50x lower cost than streaming.

Pick batch and tag the warehouse mart with slaFreshness < 24h. Do not introduce a streaming engine: the latency target does not require it, and the cost would rise 10-50x for no consumer-visible benefit.
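The reasoning above can be sketched as a small calculation. This is a minimal illustration, not the canvas grader's logic: the `Requirement` class, the `choose_rhythm` function, and the one-hour streaming threshold are all hypothetical names and values chosen for the example. Only the 1 billion events per day, the < 24h freshness target, and the slaFreshness tag come from the problem statement.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class Requirement:
    """Hypothetical container for the two numbers the problem gives us."""
    events_per_day: int
    freshness_sla_hours: float  # maximum acceptable end-to-end staleness


def average_events_per_second(events_per_day: int) -> float:
    """Convert a daily volume into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


def choose_rhythm(req: Requirement, streaming_threshold_hours: float = 1.0) -> str:
    """Pick a rhythm from the freshness target alone.

    The one-hour threshold is an assumption for illustration; the problem
    only states that a < 24h target does not need a streaming engine.
    """
    if req.freshness_sla_hours < streaming_threshold_hours:
        return "streaming"
    return "batch"


# Values taken from the problem statement.
orders = Requirement(events_per_day=1_000_000_000, freshness_sla_hours=24)

rate = average_events_per_second(orders.events_per_day)
rhythm = choose_rhythm(orders)

# Hypothetical representation of the tag applied to the warehouse mart.
mart_tags = {"slaFreshness": "< 24h"}

print(f"{rate:,.0f} events/s average, rhythm={rhythm}, mart tags={mart_tags}")
```

The average rate works out to roughly 11,574 events per second, which is a volume question rather than a latency one. Because the dashboard is read once each morning, the freshness target alone is enough to select batch in this sketch.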

---

Source: DataDriven (https://datadriven.io).