A medium Pipeline Design mock interview question on DataDriven. Practice with AI-powered feedback, real code execution, and a hire/no-hire decision.
- Domain: Pipeline Design
- Difficulty: medium
Interview Prompt
A clickstream pipeline matches the section's worked example: 18 months of mobile events stored as unpartitioned GZIP CSV in S3 (10TB total). The DAU dashboard scans the full 10TB on every refresh because none of the four intermediate-tier levers are applied. Apply all four (columnar format, partitioning, splittable compression, pushdown engine) so the dashboard's same SQL drops from 10TB scanned to roughly 100GB.
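As a rough sanity check on the 10TB to 100GB target, the sketch below estimates how partition pruning and column pruning combine. It rests on assumptions the prompt does not state: 18 months is treated as about 540 daily partitions of equal size, the dashboard is assumed to read a 30-day window, the columns it needs are assumed to make up about 18% of the stored bytes, and the converted columnar files are assumed to be about as large as the GZIP CSV. The bucket name, the `event_date` partition key, and both function names are hypothetical.

```python
from datetime import date, timedelta

TOTAL_GB = 10_000    # 10TB from the prompt, in decimal GB
DAYS_STORED = 540    # assumption: 18 months is roughly 540 daily partitions


def partition_paths(root, end, days):
    """Hive-style daily partition prefixes for the `days` days ending at `end`."""
    return [
        f"{root}/event_date={(end - timedelta(days=i)).isoformat()}/"
        for i in range(days)
    ]


def estimated_scan_gb(days_queried, column_fraction,
                      total_gb=TOTAL_GB, days_stored=DAYS_STORED):
    """Estimate GB scanned when the engine prunes partitions and columns.

    Assumes data is spread evenly across days and that `column_fraction`
    is the share of stored bytes held by the columns the query reads.
    """
    per_day_gb = total_gb / days_stored
    return per_day_gb * days_queried * column_fraction


# Hypothetical bucket; only these 30 prefixes would be listed and read.
paths = partition_paths("s3://example-bucket/events", date(2024, 6, 30), 30)

no_levers = estimated_scan_gb(DAYS_STORED, 1.0)   # every day, every column
all_levers = estimated_scan_gb(30, 0.18)          # 30 days, ~18% of bytes

print(f"{len(paths)} partitions read")
print(f"no levers: {no_levers:.0f} GB, all levers: {all_levers:.0f} GB")
```

Under these assumptions, partitioning accounts for most of the reduction (540 days down to 30) and the columnar format for the rest. Splittable compression does not change the bytes scanned but lets the engine read each file in parallel, and the pushdown engine is what turns the unchanged SQL's date filter and column list into the pruning modeled here.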
How This Interview Works
- Read the vague prompt (just like a real interview)
- Ask clarifying questions to the AI interviewer
- Write your pipeline design solution with real code execution
- Get instant feedback and a hire/no-hire decision