DataDriven

Two tables sit on the same source database

A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: medium

Problem

Two tables sit on the same source database. customers has 2 million rows that update slowly. orders has 500 million rows and grows by 10 million rows per day. The current pipeline does a full load of both tables, which is fine for customers but unworkable for orders: the nightly extract has reached 4 hours. The spectrum covered in this section runs from a full load for small reference tables to an incremental load with a bookmark for large ones. Pick the load strategy for each table by replacing its extract transform with one whose name states the strategy (full reload or bookmark-driven incremental). Then add a shared bookmark-store node that the incremental extract reads from, and writes to only after a successful run.
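
The two strategies can be sketched in a few lines of Python. This is a minimal illustration, not the graded solution: the names BookmarkStore, full_reload_extract, and bookmark_incremental_extract are hypothetical, the source table is modeled as an in-memory list, and an integer updated_at column is assumed to serve as the bookmark.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass
class BookmarkStore:
    """Hypothetical shared bookmark store: table name -> high-water mark."""
    _marks: Dict[str, int] = field(default_factory=dict)

    def read(self, table: str) -> int:
        # A table with no bookmark yet starts from 0 (i.e. extract everything).
        return self._marks.get(table, 0)

    def write(self, table: str, value: int) -> None:
        self._marks[table] = value


def full_reload_extract(source: List[Row]) -> List[Row]:
    """Full load: copy every row. Reasonable for a small table such as customers."""
    return list(source)


def bookmark_incremental_extract(
    table: str,
    source: List[Row],
    store: BookmarkStore,
    load: Callable[[List[Row]], None],
) -> int:
    """Incremental load: extract only rows newer than the stored bookmark.

    Assumes each row carries an integer 'updated_at' value (an assumption
    made for this sketch). Returns the number of rows loaded.
    """
    last_seen = store.read(table)
    batch = [row for row in source if row["updated_at"] > last_seen]
    if not batch:
        return 0
    # If load raises, the line below never runs, so the bookmark stays put
    # and the next run retries the same rows.
    load(batch)
    store.write(table, max(row["updated_at"] for row in batch))
    return len(batch)
```

Because the bookmark advances only after load returns, a failed run is retried from the old bookmark, which means the load step should tolerate seeing the same rows twice. The strict greater-than comparison is also a simplification: rows that share the bookmark's updated_at value but arrive after the run would be skipped, so a real pipeline would likely need an overlap window or a strictly increasing key.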

Practice This Problem

Solve this Pipeline Design problem with real code execution. DataDriven runs your solution and grades it automatically.
