DataDriven


A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: medium

Problem

A retailer's orders pipeline processes 1 billion events per day at peak volume, and an executive dashboard reads the result at 7am Pacific each morning. The canvas has the four roles in place but no rhythm decision: the transform is labeled plain Spark (which the canvas grader treats as batch), the warehouse mart has no slaFreshness, and the throughput-vs-latency tradeoff has not been named.

Apply the latency-vs-throughput framing this section just taught and pick which dimension constrains this pipeline. The 7am dashboard read is a Tier 4 freshness ask (< 24h end-to-end), and 1 billion events per day is a high-throughput requirement that batch handles 10-50x cheaper than streaming.

Pick batch and tag the warehouse mart with slaFreshness < 24h. Do not introduce a streaming engine; the latency target does not require it, and the throughput cost would jump 10-50x for no consumer-visible benefit.
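The reasoning above can be sketched in a few lines of Python. This is only an illustration, not the canvas grader's format: the `Workload` class, the `choose_rhythm` helper, the one-hour streaming threshold, and the `mart_config` dictionary layout are all hypothetical names and assumptions introduced here. Only the 1 billion events per day, the < 24h freshness target, and the 10-50x cost multiplier come from the problem statement.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class Workload:
    """Hypothetical description of a pipeline's two competing requirements."""
    events_per_day: int          # throughput requirement
    freshness_sla_hours: float   # maximum acceptable end-to-end staleness


def average_events_per_second(w: Workload) -> float:
    """Average event rate implied by the daily volume."""
    return w.events_per_day / SECONDS_PER_DAY


def choose_rhythm(w: Workload, streaming_threshold_hours: float = 1.0) -> str:
    """Pick streaming only when the freshness SLA is tighter than the threshold.

    The one-hour default is an assumption for illustration; the problem
    statement only says that a < 24h ask does not require streaming.
    """
    if w.freshness_sla_hours < streaming_threshold_hours:
        return "streaming"
    return "batch"


def streaming_cost_range(batch_cost: float) -> tuple[float, float]:
    """Apply the 10-50x multiplier quoted in the problem statement."""
    return (10 * batch_cost, 50 * batch_cost)


orders = Workload(events_per_day=1_000_000_000, freshness_sla_hours=24)
rhythm = choose_rhythm(orders)

# Hypothetical representation of the tagged warehouse mart node.
mart_config = {
    "node": "warehouse_mart",
    "rhythm": rhythm,
    "slaFreshness": "< 24h",
}

print(rhythm)                                     # batch
print(round(average_events_per_second(orders)))   # 11574
print(streaming_cost_range(1.0))                  # (10.0, 50.0)
```

The sketch makes the constraint explicit: the 24-hour freshness target never crosses the streaming threshold, so throughput cost is the dimension that decides the design, and batch wins.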

Practice This Problem

Solve this Pipeline Design problem with real code execution. DataDriven runs your solution and grades it automatically.
