A marketing team computes revenue across hundreds of millions of rows by scanning raw Parquet in S3

A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: medium

Problem

A marketing team computes revenue across hundreds of millions of rows by scanning raw Parquet files in S3 with Pandas. Each query takes 18 minutes, and nothing enforces a schema. Apply the section's data-warehouse framing: add an analytical layer between the lake and the dashboard, and replace the Pandas transform with a warehouse-native one, so that the columnar layout and separated compute deliver the speedup the section names.
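One possible shape of that analytical layer is sketched below. It is an illustration under stated assumptions, not the graded solution. The table name `fact_orders`, its columns, and the helper functions are all hypothetical, since the problem does not specify a schema. Python's built-in `sqlite3` stands in for the warehouse so the sketch runs anywhere; SQLite is row-oriented and single-node, so it shows only the pattern (a typed, constrained table plus an aggregation executed by the engine) and none of the columnar or separated-compute speedup a real warehouse would provide.

```python
import sqlite3

# Hypothetical schema: the problem names no tables or columns.
# NOT NULL and CHECK constraints stand in for warehouse schema enforcement.
DDL = """
CREATE TABLE IF NOT EXISTS fact_orders (
    order_id   INTEGER PRIMARY KEY,
    campaign   TEXT NOT NULL,
    order_date TEXT NOT NULL,
    revenue    REAL NOT NULL
        CHECK (typeof(revenue) = 'real' AND revenue >= 0)
)
"""

# The transform lives in SQL, so the engine performs the aggregation
# instead of every raw row being pulled into a Pandas DataFrame.
REVENUE_BY_CAMPAIGN = """
SELECT campaign, SUM(revenue) AS total_revenue
FROM fact_orders
GROUP BY campaign
ORDER BY campaign
"""


def build_warehouse(conn):
    """Create the typed analytical table if it does not exist yet."""
    conn.execute(DDL)
    conn.commit()


def load_rows(conn, rows):
    """Insert rows one by one; return (loaded, rejected) counts.

    Rows that violate the schema are rejected at load time rather than
    surfacing later as wrong numbers on the dashboard.
    """
    loaded = rejected = 0
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO fact_orders "
                "(order_id, campaign, order_date, revenue) "
                "VALUES (?, ?, ?, ?)",
                row,
            )
            loaded += 1
        except sqlite3.IntegrityError:
            rejected += 1
    conn.commit()
    return loaded, rejected


def revenue_by_campaign(conn):
    """Run the warehouse-native aggregation; return (campaign, total) rows."""
    return conn.execute(REVENUE_BY_CAMPAIGN).fetchall()
```

In a real deployment the load step would more likely be a bulk copy from the Parquet files in S3 than a row-by-row insert, and the dashboard would query the aggregated result rather than the raw lake.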
