DataDriven

A marketing team computes revenue across hundreds of millions of rows by scanning raw Parquet in S3

A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: medium

Problem

A marketing team computes revenue across hundreds of millions of rows by scanning raw Parquet in S3 with Pandas. Each query takes 18 minutes, and there is no schema enforcement. Apply the section's data-warehouse framing: add an analytical layer between the lake and the dashboard, and replace the Pandas transform with a warehouse-native one, so that the columnar layout and separated compute deliver the speedup the section names.
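One possible shape for that analytical layer is sketched below. It is only an illustration under stated assumptions: the table names (`fact_orders`, `agg_revenue_by_campaign`), the columns, and the sample rows are hypothetical and do not come from the problem. SQLite stands in for the warehouse so the sketch runs anywhere; it is row-oriented and in-process, so it shows the typed schema and the SQL-native transform but not the columnar or separated-compute speedup a real warehouse would provide.

```python
import sqlite3

# Hypothetical typed fact table. The NOT NULL and CHECK constraints play the
# role of the schema enforcement that the raw Parquet files lack.
DDL = """
CREATE TABLE fact_orders (
    order_id      INTEGER PRIMARY KEY,
    campaign      TEXT    NOT NULL,
    revenue_cents INTEGER NOT NULL
        CHECK (typeof(revenue_cents) = 'integer' AND revenue_cents >= 0)
);
"""

# Warehouse-native transform: the aggregation runs as SQL inside the engine
# instead of pulling every row into a Pandas DataFrame.
TRANSFORM = """
CREATE TABLE agg_revenue_by_campaign AS
SELECT campaign,
       SUM(revenue_cents) AS revenue_cents,
       COUNT(*)           AS orders
FROM fact_orders
GROUP BY campaign;
"""


def build_warehouse(rows):
    """Load rows into the typed table, then materialize the aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.executemany(
        "INSERT INTO fact_orders (order_id, campaign, revenue_cents) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.execute(TRANSFORM)
    conn.commit()
    return conn


def revenue_by_campaign(conn):
    """What the dashboard would query: the small aggregate, not the raw rows."""
    cur = conn.execute(
        "SELECT campaign, revenue_cents "
        "FROM agg_revenue_by_campaign ORDER BY campaign"
    )
    return dict(cur.fetchall())


def rejects_bad_row():
    """Return True if a malformed revenue value is refused at load time."""
    try:
        build_warehouse([(1, "email", "not-a-number")])
    except sqlite3.IntegrityError:
        return True
    return False


conn = build_warehouse([(1, "email", 1200), (2, "email", 800), (3, "search", 500)])
print(revenue_by_campaign(conn))  # {'email': 2000, 'search': 500}
print(rejects_bad_row())          # True
```

The design point is that the dashboard reads a small pre-aggregated table while malformed rows fail at load time rather than inside a query. In a real deployment the same DDL and transform would target a columnar warehouse loaded from the S3 lake.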

