# A startup data team has six cron jobs glued together: pull from Postgres, pull from Stripe, clean orders, clean payments, join the two, publish a fact table

Canonical URL: <https://datadriven.io/problems/a-startup-data-team-has-six-cron-jobs-glued-together-pull-f-c1882ede>

Domain: Pipeline Design · Difficulty: medium

## Problem

A startup data team has six cron jobs glued together: pull from Postgres, pull from Stripe, clean orders, clean payments, join the two, and publish a fact table. Last week the Postgres pull ran two hours long, and the dashboard showed yesterday's numbers. Apply the entire L4 beginner tier on this canvas:

- (b-s0) Replace the cron chain with an orchestrator that owns dependency resolution.
- (b-s1) Build a DAG with explicit edges between the six tasks: `extract_orders`, `extract_payments`, `clean_orders`, `clean_payments`, `join_orders_payments`, and `publish_fact`.
- (b-s2) Delegate the heavy compute to a worker engine (dbt, Spark, PySpark, Databricks, Snowflake, or BigQuery) instead of running it in-process in the orchestrator.
- (b-s3) Pick one orchestrator (Airflow, Dagster, or Prefect) appropriate for this small new build.
- (b-s4) Wire the six-task chain under the orchestrator with a daily schedule and a retry policy.

Add a warehouse storage destination (Snowflake, BigQuery, Redshift, or Databricks) for the published fact table. The dashboard reads from the warehouse, which the orchestrator-managed pipeline keeps up to date.
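The dependency resolution and retry behavior that the orchestrator is expected to own can be sketched in plain Python. This is a minimal, hypothetical stand-in, not the API of Airflow, Dagster, or Prefect: it uses the standard library's `graphlib.TopologicalSorter` to run the six tasks only after their upstream edges are satisfied, plus a simple exponential-backoff retry loop. The task bodies are stubs (an assumption) standing in for work submitted to a worker engine such as dbt or Snowflake, and the helper names `run_with_retries`, `run_dag`, and `make_flaky` are illustrative only.

```python
import time
from graphlib import TopologicalSorter
from typing import Callable

# Explicit edges: each task maps to the set of upstream tasks it waits on.
DAG: dict[str, set[str]] = {
    "extract_orders": set(),
    "extract_payments": set(),
    "clean_orders": {"extract_orders"},
    "clean_payments": {"extract_payments"},
    "join_orders_payments": {"clean_orders", "clean_payments"},
    "publish_fact": {"join_orders_payments"},
}


def run_with_retries(name: str, fn: Callable[[], str], retries: int, base_delay: float) -> str:
    """Call fn, retrying up to `retries` extra times with exponential backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            if attempt >= retries:
                raise  # retry budget exhausted: in this sketch the whole run stops
            delay = base_delay * (2 ** attempt)
            print(f"{name} failed ({exc}); retry {attempt + 1} in {delay:.1f}s")
            time.sleep(delay)
            attempt += 1


def run_dag(dag: dict[str, set[str]], tasks: dict[str, Callable[[], str]],
            retries: int = 2, base_delay: float = 0.0) -> list[str]:
    """Run tasks in dependency order; a task starts only after all upstreams succeed."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the edges form a cycle
    completed: list[str] = []
    while sorter.is_active():
        for name in sorter.get_ready():
            run_with_retries(name, tasks[name], retries, base_delay)
            completed.append(name)
            sorter.done(name)  # unblocks downstream tasks whose upstreams are all done
    return completed


def make_flaky(fail_times: int) -> tuple[Callable[[], str], dict[str, int]]:
    """Hypothetical task that fails `fail_times` times before succeeding."""
    calls = {"n": 0}

    def task() -> str:
        calls["n"] += 1
        if calls["n"] <= fail_times:
            raise RuntimeError("transient source timeout")
        return "extract_orders submitted"

    return task, calls


# Stub task bodies: in a real build each would submit SQL or a job to the
# worker engine and wait for it, keeping heavy compute out of the orchestrator.
TASKS: dict[str, Callable[[], str]] = {
    name: (lambda n=name: f"{n} submitted") for name in DAG
}
flaky_extract, flaky_calls = make_flaky(fail_times=1)
TASKS["extract_orders"] = flaky_extract

order = run_dag(DAG, TASKS, retries=2, base_delay=0.0)
print(order)
```

Running the sketch prints one retry message for `extract_orders` and then the completion order, with `publish_fact` last because it is the only task downstream of the join. Unlike a real orchestrator, this loop runs ready tasks one at a time and aborts the run on the first exhausted retry budget. In recent Airflow versions, for example, the same intent is expressed with the operator arguments `retries` and `retry_delay` and a DAG-level `schedule="@daily"`.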

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/a-startup-data-team-has-six-cron-jobs-glued-together-pull-f-c1882ede)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.