
Cost-Optimized Clickstream Data Lake

A hard Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: hard
Seniority: staff

Problem

Our product generates hundreds of millions of user interaction events every day. We stream them through Kafka, but right now they just pile up and we have no good way to query them for analytics. Storage costs are already a concern, and the data needs to be queryable for at least two years. Design an architecture to store and query this event data efficiently.
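A useful first step is a back-of-envelope sizing estimate, because the two-year retention requirement and the cost concern both depend on it. The sketch below is illustrative only. Every number in `Sizing` is a hypothetical placeholder, not a figure from the problem: 300M events/day, ~1 KB per raw JSON event, and an assumed 10:1 reduction when events are written as compressed columnar files. The `partition_prefix` helper and the `events/` root are also hypothetical. They show one common convention, Hive-style `key=value` time partitions, which lets query engines skip files outside a queried date range.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sizing:
    # All values are hypothetical placeholders, not figures from the problem.
    events_per_day: int = 300_000_000   # "hundreds of millions" per day
    avg_event_bytes: int = 1_000        # assumed ~1 KB per raw JSON event
    compression_ratio: float = 10.0     # assumed raw JSON -> compressed columnar files
    retention_days: int = 730           # two years, per the requirement


def raw_bytes_per_day(s: Sizing) -> int:
    """Uncompressed bytes arriving from Kafka each day."""
    return s.events_per_day * s.avg_event_bytes


def stored_bytes_for_retention(s: Sizing) -> float:
    """Compressed bytes held once a full retention window has accumulated."""
    return raw_bytes_per_day(s) / s.compression_ratio * s.retention_days


def partition_prefix(event_time: datetime) -> str:
    """Hypothetical Hive-style date/hour prefix so engines can prune by time."""
    return f"events/dt={event_time:%Y-%m-%d}/hour={event_time.hour:02d}/"


if __name__ == "__main__":
    s = Sizing()
    print(f"raw per day: {raw_bytes_per_day(s) / 1e9:.0f} GB")
    print(f"stored for 2 years: {stored_bytes_for_retention(s) / 1e12:.1f} TB")
    print(partition_prefix(datetime(2024, 1, 15, 7, 30)))
```

Under these assumptions, about 300 GB/day of raw events shrinks to roughly 21.9 TB after two years. That figure is what storage tiering and compaction decisions would be weighed against. Different event sizes or compression ratios shift the result proportionally.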
