A clickstream pipeline matches the section's worked example: 18 months of mobile events stored as un
A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.
- Domain
- Pipeline Design
- Difficulty
- medium
Problem
A clickstream pipeline matches the section's worked example: 18 months of mobile events stored as unpartitioned GZIP CSV in S3 (10TB total). The DAU dashboard scans the full 10TB on every refresh because none of the four intermediate-tier levers are applied. Apply all four (columnar format, partitioning, splittable compression, pushdown engine) so the dashboard's same SQL drops from 10TB scanned to roughly 100GB.
Practice This Problem
Solve this Pipeline Design problem with real code execution. DataDriven runs your solution and grades it automatically.