DataDriven

A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: medium

Problem

A startup data team has six cron jobs glued together: pull from Postgres, pull from Stripe, clean orders, clean payments, join the two, and publish a fact table. Last week the Postgres pull ran two hours long and the dashboard showed yesterday's numbers.

Apply the entire L4 beginner tier on this canvas:

  • (b-s0) Replace the cron chain with an orchestrator that owns dependency resolution.
  • (b-s1) Build a DAG with explicit edges (extract_orders, extract_payments, clean_orders, clean_payments, join_orders_payments, publish_fact).
  • (b-s2) Delegate the heavy compute to a worker engine (dbt, Spark, PySpark, Databricks, Snowflake, or BigQuery) rather than running it in-process in the orchestrator.
  • (b-s3) Pick one orchestrator (Airflow, Dagster, or Prefect) appropriate for this small new build.
  • (b-s4) Wire the 6-task chain under the orchestrator with a daily schedule and a retry policy.

Add a warehouse storage destination (Snowflake, BigQuery, Redshift, or Databricks) for the published fact table. The dashboard reads from the warehouse via the orchestrator-managed pipeline.
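To make the difference between a cron chain and explicit dependency resolution concrete, here is a minimal standard-library sketch of the six-task DAG. It is not Airflow, Dagster, or Prefect code and not the graded solution. The edge layout is an assumption read from the task names in the problem, and `run_pipeline`, `DAG_EDGES`, and `MAX_RETRIES` are hypothetical names chosen for illustration. A real orchestrator would also own the daily schedule and hand the heavy compute to a worker engine instead of calling Python functions in-process.

```python
from graphlib import TopologicalSorter

# Explicit edges: each task maps to the set of upstream tasks it depends on.
# Assumed layout: two independent extract/clean branches that meet at the join.
DAG_EDGES = {
    "extract_orders": set(),
    "extract_payments": set(),
    "clean_orders": {"extract_orders"},
    "clean_payments": {"extract_payments"},
    "join_orders_payments": {"clean_orders", "clean_payments"},
    "publish_fact": {"join_orders_payments"},
}

MAX_RETRIES = 2  # hypothetical retry policy: up to 2 retries per task


def run_pipeline(tasks, edges=DAG_EDGES, max_retries=MAX_RETRIES):
    """Run every task after its upstream tasks; return the execution order.

    `tasks` maps a task name to a zero-argument callable. A task that keeps
    failing after the allowed retries stops the run, so downstream tasks
    never execute against stale inputs.
    """
    order = []
    for name in TopologicalSorter(edges).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break
            except Exception as exc:
                if attempt == max_retries:
                    raise RuntimeError(
                        f"{name} failed after {attempt + 1} attempts"
                    ) from exc
        order.append(name)
    return order


if __name__ == "__main__":
    # Placeholder callables; in practice each would trigger work on the
    # chosen worker engine (a dbt run, a Spark job, a warehouse query).
    placeholder_tasks = {name: (lambda: None) for name in DAG_EDGES}
    print(run_pipeline(placeholder_tasks))
```

Unlike time-offset cron entries, the join here cannot start until both clean steps have finished, however long the Postgres extract takes. That is the property that would have kept the dashboard from being refreshed on top of yesterday's data.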

Practice This Problem

Solve this Pipeline Design problem with real code execution. DataDriven runs your solution and grades it automatically.
