DataDriven

A medium Pipeline Design mock interview question on DataDriven. Practice with AI-powered feedback, real code execution, and a hire/no-hire decision.

Domain: Pipeline Design
Difficulty: medium

Interview Prompt

A startup data team has six cron jobs glued together: pull from Postgres, pull from Stripe, clean orders, clean payments, join the two, publish a fact table. Last week the Postgres pull ran two hours long and the dashboard showed yesterday's numbers. Apply the entire L4 beginner tier on this canvas:

  • (b-s0) Replace the cron chain with an orchestrator that owns dependency resolution.
  • (b-s1) Build a DAG with explicit edges (extract_orders, extract_payments, clean_orders, clean_payments, join_orders_payments, publish_fact).
  • (b-s2) Delegate the heavy compute to a worker engine (dbt, Spark, PySpark, Databricks, Snowflake, or BigQuery) instead of running it in-process in the orchestrator.
  • (b-s3) Pick one orchestrator (Airflow, Dagster, or Prefect) appropriate for this small new build.
  • (b-s4) Wire the 6-task chain under the orchestrator with a daily schedule and a retry policy.

Add a warehouse storage destination (Snowflake, BigQuery, Redshift, or Databricks) for the published fact table. The dashboard reads from the warehouse, which is populated by the orchestrator-managed pipeline.
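The dependency and retry requirements in the prompt can be sketched without committing to a specific orchestrator. The example below is a minimal illustration using only Python's standard library (`graphlib`), not Airflow, Dagster, or Prefect code. The task names come from the prompt; the edges between them, the helper names `run_with_retries` and `run_dag`, and the default of two retries are assumptions made for illustration. Scheduling (the daily trigger) and the hand-off to a worker engine are left out, since those belong to whichever orchestrator is chosen.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Explicit edges: each task maps to the set of tasks it depends on.
# The edge layout is an assumption inferred from the task names.
DAG: Dict[str, Set[str]] = {
    "extract_orders": set(),
    "extract_payments": set(),
    "clean_orders": {"extract_orders"},
    "clean_payments": {"extract_payments"},
    "join_orders_payments": {"clean_orders", "clean_payments"},
    "publish_fact": {"join_orders_payments"},
}


def run_with_retries(fn: Callable[[], None], max_retries: int = 2) -> int:
    """Call fn, retrying on any exception. Returns the number of attempts used."""
    attempts = 0
    while True:
        attempts += 1
        try:
            fn()
            return attempts
        except Exception:
            # Re-raise once the retry budget is spent.
            if attempts > max_retries:
                raise


def run_dag(
    dag: Dict[str, Set[str]],
    tasks: Dict[str, Callable[[], None]],
    max_retries: int = 2,
) -> List[str]:
    """Run every task after its dependencies; return the execution order."""
    order: List[str] = []
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(dag).static_order():
        run_with_retries(tasks[name], max_retries)
        order.append(name)
    return order


if __name__ == "__main__":
    # Placeholder tasks: a real task would submit work to a worker engine
    # (for example a dbt run or a warehouse query) rather than compute here.
    placeholder_tasks = {name: (lambda: None) for name in DAG}
    print(run_dag(DAG, placeholder_tasks))
```

With explicit edges, a slow `extract_orders` delays `clean_orders` and everything downstream of it instead of letting `publish_fact` run on stale inputs, which is the failure described in the prompt.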

How This Interview Works

  1. Read the vague prompt (just like a real interview)
  2. Ask clarifying questions to the AI interviewer
  3. Write your pipeline design solution with real code execution
  4. Get instant feedback and a hire/no-hire decision
