# A growth-stage observability company ingests 50 billion log events per day from a Kafka topic

Canonical URL: <https://datadriven.io/problems/a-growth-stage-observability-company-ingests-50-billion-log-b1e2f308>

Domain: Pipeline Design · Difficulty: medium

## Problem

A growth-stage observability company ingests 50 billion log events per day from a Kafka topic. The canvas has the source and the three consumers, each with a distinct freshness tier:

- Incident pager (tier 1, sub-30-second)
- Live ops dashboard (tier 2, rolling 5-minute aggregates)
- Billing capacity report (tier 4, daily)

Apply the entire intermediate tier:

- (i-s0) Name which axis constrains each branch (latency for paging, throughput for billing).
- (i-s1) Use micro-batch on the dashboard path with a 1-minute trigger.
- (i-s2) Keep streaming only where latency has dollar value (the pager).
- (i-s3) Stateful transforms on the streaming branches require a state store.
- (i-s4) Split paths by tier rather than imposing one rhythm on three different consumers.

Carve the single Kafka stream into three branches, with the right engine per branch:

- Paging branch: Flink with a state store (RocksDB or S3 checkpoint), tagged real-time or < 1min.
- Dashboard branch: Spark Structured Streaming, tagged < 15min.
- Billing branch: a nightly batch job in plain Spark, PySpark, or dbt, tagged < 24h.

Add a shared bronze raw layer in object storage (S3, GCS, or ADLS) so that Kafka has exactly one outgoing edge and all three branches read from bronze rather than from Kafka directly.
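The wiring rules in the problem (one Kafka edge into bronze, every branch reading only from bronze, a state store wherever a non-batch branch keeps state) can be written down as a small graph and checked mechanically. This is a minimal sketch in plain Python, not any canvas or orchestration tool's API: the `Branch` record, the node names, and the `build_edges`/`validate` helpers are hypothetical, and treating the dashboard's Spark state as an "S3 checkpoint" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Edge = Tuple[str, str]


@dataclass(frozen=True)
class Branch:
    """One consumer path carved out of the shared bronze layer (hypothetical record)."""
    name: str
    consumer: str
    engine: str
    freshness_tag: str
    mode: str                          # "streaming", "micro-batch", or "batch"
    stateful: bool
    state_store: Optional[str] = None  # e.g. "RocksDB" or "S3 checkpoint"


SOURCE = "kafka"
BRONZE = "bronze (object storage)"

BRANCHES = [
    Branch("paging", "incident pager", "Flink", "real-time", "streaming", True, "RocksDB"),
    # Assumption: Spark keeps windowed state in its own state store, checkpointed to S3.
    Branch("dashboard", "live ops dashboard", "Spark Structured Streaming", "< 15min",
           "micro-batch", True, "S3 checkpoint"),
    Branch("billing", "billing capacity report", "dbt nightly batch", "< 24h", "batch", False),
]


def build_edges(branches: List[Branch]) -> List[Edge]:
    """Kafka gets one edge into bronze; each branch reads bronze and feeds its consumer."""
    edges = [(SOURCE, BRONZE)]
    for b in branches:
        edges.append((BRONZE, b.name))
        edges.append((b.name, b.consumer))
    return edges


def validate(edges: List[Edge], branches: List[Branch]) -> List[str]:
    """Return rule violations; an empty list means the layout passes these checks."""
    problems = []
    kafka_out = [dst for src, dst in edges if src == SOURCE]
    if kafka_out != [BRONZE]:
        problems.append(f"kafka should have one outgoing edge, to bronze; found {kafka_out}")
    for b in branches:
        inputs = [src for src, dst in edges if dst == b.name]
        if inputs != [BRONZE]:
            problems.append(f"{b.name} should read only from bronze; found {inputs}")
        if b.stateful and b.mode != "batch" and b.state_store is None:
            problems.append(f"{b.name} keeps streaming state but has no state store")
    return problems


good = build_edges(BRANCHES)
print(validate(good, BRANCHES))  # []

# Wiring the pager straight to Kafka breaks both the one-edge rule and the bronze-only rule.
bad = good + [(SOURCE, "paging")]
for problem in validate(bad, BRANCHES):
    print(problem)
```

Keeping the rules as data makes the split-by-tier design (i-s4) explicit: adding a fourth consumer means adding one `Branch`, and the checks still apply.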
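To make the throughput side of (i-s0) concrete, the daily volume can be converted into per-interval averages. This sketch assumes a perfectly flat arrival rate across the day; real log traffic peaks above the average, so each figure is a floor on what the branch must absorb at peak.

```python
EVENTS_PER_DAY = 50_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_interval(seconds: int) -> float:
    """Average events arriving in an interval of this length, assuming a flat daily rate."""
    return EVENTS_PER_DAY * seconds / SECONDS_PER_DAY


intervals = [
    ("1 second", 1),
    ("30-second pager budget", 30),
    ("1-minute trigger", 60),
    ("5-minute window", 300),
    ("1 day", SECONDS_PER_DAY),
]
for label, seconds in intervals:
    print(f"{label:>24}: {events_per_interval(seconds):,.0f} events")
# "1 second" prints 578,704 events; "1 day" prints 50,000,000,000 events.
```

The pager's constraint is how quickly a single event can turn into a page, not how many events pass through. The billing job's constraint is whether a full day of roughly 50 billion events finishes inside its nightly window. That split is the latency-versus-throughput distinction the problem asks you to name.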
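The dashboard semantics from (i-s1) can be illustrated without Spark. The following is a pure-Python simulation, not Structured Streaming code: each list element stands for the records that arrived during one 1-minute trigger, the per-minute partial totals kept between triggers play the role of the state store from (i-s3), and every trigger emits a rolling 5-minute total. The `(service, error_count)` record shape and the `run_micro_batches` name are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Tuple

WINDOW_TRIGGERS = 5  # rolling 5-minute aggregate = the last five 1-minute micro-batches


def run_micro_batches(batches: Iterable[List[Tuple[str, int]]]) -> List[Dict[str, int]]:
    """For each trigger, fold its records into state and emit per-service rolling totals."""
    # Stand-in for a state store: one partial-total dict per recent trigger.
    state: Deque[Dict[str, int]] = deque(maxlen=WINDOW_TRIGGERS)
    outputs = []
    for batch in batches:
        minute_totals: Dict[str, int] = {}
        for service, count in batch:
            minute_totals[service] = minute_totals.get(service, 0) + count
        state.append(minute_totals)  # the oldest minute drops out once maxlen is reached
        rolling: Dict[str, int] = {}
        for minute in state:
            for service, count in minute.items():
                rolling[service] = rolling.get(service, 0) + count
        outputs.append(rolling)
    return outputs


batches = [
    [("api", 2)],
    [("api", 1), ("db", 4)],
    [],
    [("api", 3)],
    [("db", 1)],
    [("api", 5)],  # sixth trigger: the first minute's ("api", 2) leaves the window
]
for minute, totals in enumerate(run_micro_batches(batches), start=1):
    print(minute, totals)
# last line printed: 6 {'api': 9, 'db': 5}
```

This sketch windows by trigger count, that is, by processing time, and ignores late data. A real Structured Streaming job would more likely window on event time with `window()`, set the cadence with `trigger(processingTime="1 minute")`, and bound late arrivals with `withWatermark`.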

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/a-growth-stage-observability-company-ingests-50-billion-log-b1e2f308)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io).