# A clickstream pipeline matches the section's worked example: 18 months of mobile events stored as un

Canonical URL: <https://datadriven.io/problems/a-clickstream-pipeline-matches-the-sections-worked-example-e65a2196>

Domain: Pipeline Design · Difficulty: medium

## Problem

A clickstream pipeline matches the section's worked example: 18 months of mobile events stored as unpartitioned GZIP CSV in S3 (10TB total). The DAU dashboard scans the full 10TB on every refresh because none of the four intermediate-tier levers has been applied. Apply all four (columnar format, partitioning, splittable compression, pushdown engine) so that the same dashboard SQL scans roughly 100GB instead of 10TB.
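
As a rough sanity check on the 10TB to 100GB target, the sketch below multiplies the two fractions that reduce bytes read: the share of date partitions the query touches and the share of column data it needs. Every number other than the 10TB total is an assumption made for illustration, not something the problem states: 548 days stored (about 18 months), a 30-day dashboard window, and a guess that the columns the DAU query reads account for about 18% of the stored bytes. The function name `estimate_scanned_bytes` is hypothetical. Splittable compression and the pushdown engine do not appear as separate factors here; the first mainly affects how well the scan parallelizes, and the second is what lets the partition and column savings take effect at all.

```python
TB = 10**12
GB = 10**9


def estimate_scanned_bytes(total_bytes, days_stored, days_queried, column_fraction):
    """Estimate bytes read after partition pruning and column projection.

    Assumes data volume is spread evenly across days and that the engine
    skips both unneeded partitions and unneeded columns.
    """
    if days_queried > days_stored:
        raise ValueError("days_queried cannot exceed days_stored")
    if not 0 < column_fraction <= 1:
        raise ValueError("column_fraction must be in (0, 1]")
    partition_fraction = days_queried / days_stored
    return total_bytes * partition_fraction * column_fraction


# Baseline from the problem: unpartitioned GZIP CSV forces a full scan.
baseline = 10 * TB

# Assumed values (not from the problem statement): ~18 months = 548 days,
# a 30-day dashboard window, and ~18% of bytes in the columns DAU needs.
scanned = estimate_scanned_bytes(
    baseline, days_stored=548, days_queried=30, column_fraction=0.18
)

print(f"baseline: {baseline / TB:.0f} TB, estimated scan: {scanned / GB:.1f} GB")
```

Under these assumptions the estimate lands near 100GB. Changing the dashboard window or the column share moves the result proportionally, so the target depends on date partitioning and columnar projection working together rather than on either one alone.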

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/a-clickstream-pipeline-matches-the-sections-worked-example-e65a2196)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.