A marketing team computes revenue across hundreds of millions of rows by scanning raw Parquet in S3
A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.
- Domain: Pipeline Design
- Difficulty: medium
Problem
A marketing team computes revenue across hundreds of millions of rows by scanning raw Parquet files in S3 with Pandas. Each query takes 18 minutes, and there is no schema enforcement. Applying the section's data-warehouse framing, add an analytical layer between the lake and the dashboard, and replace the Pandas transform with a warehouse-native one, so that the columnar layout and separated compute deliver the speedup the section names.
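One way to picture the target shape is a typed fact table that rejects malformed rows at load time, with the revenue aggregate expressed as SQL that runs inside the engine instead of in Pandas. The sketch below is only an illustration under stated assumptions: the table name `fact_orders`, its columns, and the sample rows are hypothetical (the problem names no schema), and the standard-library `sqlite3` module stands in for a real warehouse so the example is runnable. SQLite is row-oriented and single-node, so it demonstrates schema enforcement and the pushed-down transform, not the columnar or separated-compute speedup.

```python
import sqlite3

# Hypothetical schema: the problem statement does not name any columns.
DDL = """
CREATE TABLE fact_orders (
    order_id    INTEGER PRIMARY KEY,
    campaign    TEXT NOT NULL,
    order_date  TEXT NOT NULL,
    revenue_usd REAL NOT NULL CHECK (revenue_usd >= 0)
)
"""

# The transform lives in SQL, so the engine performs the aggregation
# and only the small result set is returned to the client.
REVENUE_BY_CAMPAIGN = """
SELECT campaign, SUM(revenue_usd) AS revenue_usd
FROM fact_orders
GROUP BY campaign
ORDER BY campaign
"""


def load(conn, rows):
    """Insert rows one by one; return the rows rejected by schema constraints."""
    rejected = []
    for row in rows:
        try:
            conn.execute("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", row)
        except sqlite3.IntegrityError:
            # NOT NULL and CHECK violations both surface as IntegrityError.
            rejected.append(row)
    conn.commit()
    return rejected


def revenue_by_campaign(conn):
    """Run the engine-side aggregate and return (campaign, revenue) tuples."""
    return conn.execute(REVENUE_BY_CAMPAIGN).fetchall()


# In-memory database as a stand-in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# Made-up sample rows standing in for records read from the lake.
raw_rows = [
    (1, "spring_sale", "2024-03-01", 120.0),
    (2, "spring_sale", "2024-03-02", 80.0),
    (3, "newsletter", "2024-03-02", 50.0),
    (4, None, "2024-03-03", 10.0),          # missing campaign: NOT NULL fails
    (5, "newsletter", "2024-03-03", -5.0),  # negative revenue: CHECK fails
]

rejected = load(conn, raw_rows)
result = revenue_by_campaign(conn)
print(result)
print(f"rejected rows: {len(rejected)}")
```

In a real deployment the same two pieces would map onto the warehouse's own DDL and a scheduled SQL model, with the dashboard querying the aggregate rather than the raw files; which warehouse and which loading mechanism to use is left open by the problem.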