# A clickstream pipeline at a streaming company suffered a six-hour outage

Canonical URL: <https://datadriven.io/problems/a-clickstream-pipeline-at-a-streaming-company-suffered-a-six-4bdc6b3f>

Domain: Pipeline Design · Difficulty: medium

## Problem

A clickstream pipeline at a streaming company suffered a six-hour outage. A config change deployed to the Geo-IP service raised its p99 latency from 80 ms to 4 seconds. The pipeline had no retry policy on the Geo-IP call, no circuit breaker, no dead-letter queue (DLQ), and no backpressure on Kafka beyond retention. Three hours of data were permanently lost when Kafka retention deleted messages that could not be enriched in time. Work through the postmortem by adding the four missing patterns (a retry policy, a circuit breaker, a DLQ, and backpressure) so that the impact of the next latency spike is bounded.
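
One way to reason about the four patterns is a small in-memory sketch. It is not a Kafka client and not a reference solution: `CircuitBreaker`, `enrich`, `run`, the `lookup` callable, and all thresholds are hypothetical names and values chosen for illustration. It assumes `lookup` enforces its own timeout (well below the 4-second p99 seen in the incident) and raises `TimeoutError` when it is exceeded. Backpressure is modeled very simply as a bounded DLQ: when it fills, the loop stops pulling and leaves the remaining events in the source instead of dropping them.

```python
import time
from collections import deque


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; permits a trial
    call again once `reset_after` seconds have passed (half-open)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self):
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()


def enrich(event, lookup, breaker, dlq, retries=2, backoff=0.05, sleep=time.sleep):
    """Return the enriched event, or None after routing it to the DLQ."""
    if breaker.allow():
        for attempt in range(retries + 1):
            try:
                geo = lookup(event["ip"])  # assumed to time out on its own
                breaker.record(True)
                return {**event, "geo": geo}
            except TimeoutError:
                breaker.record(False)
                if attempt == retries or not breaker.allow():
                    break
                sleep(backoff * 2 ** attempt)  # exponential backoff
    dlq.append(event)  # keep the raw event so it can be replayed later
    return None


def run(source, lookup, sink, dlq, breaker, max_dlq=1000, sleep=time.sleep):
    """Drain `source`; stop pulling (backpressure) once the DLQ is full."""
    while source:
        if len(dlq) >= max_dlq:
            return "paused"  # unread events stay in `source`, nothing is dropped
        out = enrich(source.popleft(), lookup, breaker, dlq, sleep=sleep)
        if out is not None:
            sink.append(out)
    return "drained"
```

With a failing `lookup`, the breaker opens after a bounded number of calls, later events go straight to the DLQ without waiting on the slow service, and the loop pauses once the DLQ reaches `max_dlq`. In a real deployment the DLQ would typically be a separate Kafka topic and the "paused" state would map to pausing the consumer and alerting on lag relative to retention; those details are outside this sketch.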

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/a-clickstream-pipeline-at-a-streaming-company-suffered-a-six-4bdc6b3f)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.