DataDriven

A daily orders pipeline runs its heavy 12-hour SQL aggregation inside the Airflow scheduler itself

A medium Pipeline Design mock interview question on DataDriven. Practice with AI-powered feedback, real code execution, and a hire/no-hire decision.

Domain: Pipeline Design
Difficulty: medium

Interview Prompt

A daily orders pipeline runs its heavy 12-hour SQL aggregation inside the Airflow scheduler itself, using a PythonOperator that executes the SQL in-process. The aggregation is starving the scheduler: other DAGs sit waiting, the orchestrator UI lags, and a single slow task degrades visibility for every other pipeline on the same Airflow instance.

The section states explicitly that the orchestrator owns four responsibilities (scheduling, dependency resolution, retries, visibility) and delegates the actual transform work to a worker. It names a Snowflake warehouse, a Spark cluster, or a Python container as the worker categories.

Replace the in-process Airflow PythonOperator transform with a delegated worker transform. Its name must state what aggregation it runs, and its tech_label must be one of the section's worker categories: Snowflake, BigQuery, Spark, PySpark, Databricks, or Python. Wire the Postgres source into the new worker transform, and the new worker transform into the Snowflake daily_orders mart; the Morning dashboard reads from the mart. Keep the Airflow orchestrator node on the canvas: it continues to own when the work runs, the order it runs in, the per-task retry policy, and the on-call UI.
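
One way to picture the target design is as a small graph of nodes and data-flow edges. The sketch below is illustrative only and is not the canvas format DataDriven uses: the `Node` class, the `validate` helper, and the node names `orders_db`, `aggregate_daily_orders`, and `airflow` are hypothetical, as is the dashboard's `BI` label. Only the roles and the allowed worker labels come from the prompt.

```python
from dataclasses import dataclass

# Worker categories listed in the prompt.
ALLOWED_WORKER_LABELS = {"Snowflake", "BigQuery", "Spark", "PySpark", "Databricks", "Python"}


@dataclass(frozen=True)
class Node:
    name: str
    kind: str        # "source", "transform", "mart", "dashboard", or "orchestrator"
    tech_label: str


# Hypothetical node names; the transform's name states the aggregation it runs.
source = Node("orders_db", "source", "Postgres")
worker = Node("aggregate_daily_orders", "transform", "Snowflake")
mart = Node("daily_orders", "mart", "Snowflake")
dashboard = Node("Morning dashboard", "dashboard", "BI")  # label is an assumption
orchestrator = Node("airflow", "orchestrator", "Airflow")

nodes = [source, worker, mart, dashboard, orchestrator]

# Data-flow edges: source -> worker transform -> mart -> dashboard.
edges = [
    (source.name, worker.name),
    (worker.name, mart.name),
    (mart.name, dashboard.name),
]


def validate(nodes, edges):
    """Return a list of problems; an empty list means these checks pass."""
    problems = []
    kinds = {n.name: n.kind for n in nodes}

    # The heavy transform must run on a worker, not inside the orchestrator.
    for n in nodes:
        if n.kind == "transform" and n.tech_label not in ALLOWED_WORKER_LABELS:
            problems.append(f"{n.name}: {n.tech_label} is not a worker category")

    # The orchestrator stays on the canvas.
    if "orchestrator" not in kinds.values():
        problems.append("orchestrator node missing")

    # Every hop of the data path must be wired.
    edge_kinds = {(kinds[a], kinds[b]) for a, b in edges}
    for src, dst in [("source", "transform"), ("transform", "mart"), ("mart", "dashboard")]:
        if (src, dst) not in edge_kinds:
            problems.append(f"missing edge: {src} -> {dst}")

    return problems


print(validate(nodes, edges))
```

In this sketch the Airflow node carries no data-flow edges: it decides when and in what order the work runs, while the aggregation itself executes on the Snowflake worker.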

How This Interview Works

  1. Read the vague prompt (just like a real interview)
  2. Ask clarifying questions to the AI interviewer
  3. Write your pipeline design solution with real code execution
  4. Get instant feedback and a hire/no-hire decision
