DataDriven

A retail company wants a daily summary of orders by region

A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: medium

Problem

A retail company wants a daily summary of orders by region. The Postgres production.orders source, an Airflow orchestrator placeholder, the Snowflake mart.orders_by_region destination, and the Morning regional dashboard are already on the canvas. The three tasks that do the actual work, the dependency chain, the schedule, and the retry policy still need to be built.

Apply the first-DAG framing this section just taught and add the section's three named task transforms:

  • extract_orders: reads Postgres production.orders, writes raw.orders
  • clean_orders: reads raw.orders, writes stg.orders
  • aggregate_orders: reads stg.orders, writes mart.orders_by_region

Encode each task's reads and writes in its node name so the chain is readable from the diagram. Connect them in order: Postgres production.orders -> extract_orders -> clean_orders -> aggregate_orders -> Snowflake mart.orders_by_region; the dashboard reads from the mart.

Rename the Airflow orchestrator node so its name encodes both the section's schedule (daily 2am) and the section's uniform retry policy (3 retries, exponential backoff, 2-minute initial delay). The canvas has no separate fields for these, so the orchestrator name is the only grader-visible place to record them.

Tag each of the three task transforms with tech_label Python (matching the section's PythonOperator example).
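The chain, schedule, and retry policy described above can be sketched in plain Python before touching the canvas. This is a minimal sketch, not the graded solution. The Task class, the node-name format, the orchestrator wording, and the helper functions are all hypothetical. The keys in DEFAULT_ARGS mirror Airflow's BaseOperator parameters (retries, retry_delay, retry_exponential_backoff), but the block does not import Airflow, and the delays it computes are nominal doublings; Airflow's actual waits may differ.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Task:
    """One task transform on the canvas (hypothetical helper class)."""
    name: str
    reads: str
    writes: str
    tech_label: str = "Python"

    @property
    def node_name(self) -> str:
        # Assumed format: the problem only requires that the reads and
        # writes be recoverable from the node name.
        return f"{self.name} ({self.reads} -> {self.writes})"


# The three named tasks, in dependency order.
TASKS = [
    Task("extract_orders", "Postgres production.orders", "raw.orders"),
    Task("clean_orders", "raw.orders", "stg.orders"),
    Task("aggregate_orders", "stg.orders", "mart.orders_by_region"),
]

# Cron expression for "daily 2am".
SCHEDULE = "0 2 * * *"

# Uniform retry policy, keyed like Airflow's BaseOperator arguments.
DEFAULT_ARGS = {
    "retries": 3,
    "retry_delay": timedelta(minutes=2),
    "retry_exponential_backoff": True,
}

# Assumed wording for the renamed orchestrator node.
ORCHESTRATOR_NAME = (
    "Airflow: daily 2am, 3 retries, exponential backoff, 2-minute initial delay"
)


def chain_is_connected(tasks):
    """Return True when each task reads exactly what the previous one writes."""
    return all(prev.writes == nxt.reads for prev, nxt in zip(tasks, tasks[1:]))


def nominal_retry_delays(retries, initial):
    """Nominal wait before each retry, doubling from the initial delay."""
    return [initial * (2 ** attempt) for attempt in range(retries)]


if __name__ == "__main__":
    for task in TASKS:
        print(task.node_name)
    print(chain_is_connected(TASKS))
    print(nominal_retry_delays(DEFAULT_ARGS["retries"], DEFAULT_ARGS["retry_delay"]))
```

Checking that each task's reads match the previous task's writes is the same property the diagram should make visible: if the node names encode reads and writes, a broken link in the chain shows up as a mismatch between adjacent names.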
