# Eight Teams, Eight Latencies

> Millions of gamers. The architecture decision changes everything.

Canonical URL: <https://datadriven.io/problems/eight_teams_eight_latencies>

Domain: Pipeline Design · Difficulty: medium · Seniority: L5

## Problem

Our gaming platform generates hundreds of millions of player events per day across millions of concurrent sessions - matchmaking events, trophy unlocks, in-game purchases, and session telemetry. Different internal teams need this data at very different latencies and granularities. Design the event pipeline and justify where you use real-time streaming versus batch processing for each consumer.

## Worked solution and explanation

### Why this problem exists in real interviews

This question targets the **streaming vs batch justification** for each consumer of a gaming event pipeline. It tests whether you can state explicitly which consumers need streaming, which are fine with batch, and what the cost-latency trade-off is for each. Defaulting to 'stream everything' is as wrong as 'batch everything.'

> **Trick to Solving**
>
> Of the 8 consumers, only 3 need streaming (anti-cheat within 30 seconds, live leaderboards refreshed every 5 minutes, matchmaking every minute). The other 5 are batch-appropriate (DAU/MAU, purchase reporting, trophy rates, recommendation training, content performance). Streaming the full event stream costs significantly more than batch, and only 3 consumers benefit.
>
> 1. Justify streaming vs batch per consumer, not globally
> 2. Anti-cheat requires stateful stream processing (sliding windows)
> 3. Purchase events need exactly-once for financial reporting
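
One way to make the per-consumer justification concrete is to write down each consumer's freshness target and derive the processing mode from it. The sketch below is illustrative only: the 15-minute cut-off is a hypothetical threshold, and the daily targets for purchase reporting, trophy rates, and content performance are assumptions, since the text does not state them.

```python
from datetime import timedelta

# Freshness target per consumer. Anti-cheat, matchmaking, and leaderboards
# use the latencies quoted in the text; DAU/MAU (T+1) and recommendation
# training (weekly) come from the batch step. The rest are ASSUMED daily.
FRESHNESS = {
    "anti_cheat": timedelta(seconds=30),
    "matchmaking": timedelta(minutes=1),
    "live_leaderboards": timedelta(minutes=5),
    "dau_mau": timedelta(days=1),
    "purchase_reporting": timedelta(days=1),    # assumed
    "trophy_rates": timedelta(days=1),          # assumed
    "content_performance": timedelta(days=1),   # assumed
    "recommendation_training": timedelta(weeks=1),
}

# Hypothetical cut-off: anything fresher than this is served by streaming.
STREAMING_CUTOFF = timedelta(minutes=15)


def processing_mode(consumer: str) -> str:
    """Return 'streaming' or 'batch' for a named consumer."""
    return "streaming" if FRESHNESS[consumer] <= STREAMING_CUTOFF else "batch"


streaming = sorted(c for c in FRESHNESS if processing_mode(c) == "streaming")
batch = sorted(c for c in FRESHNESS if processing_mode(c) == "batch")
```

In an interview the table matters more than the code: each row is a claim you should be able to defend with the cost of being late for that consumer.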

---

### Break down the requirements

#### Step 1: Design a single ingestion layer

Kafka for all events. Schema includes event_type, player_id, session_id, timestamp, game_title. Partition strategy must handle high-cardinality player_id.
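
A minimal sketch of the event envelope and a key-based partition assignment follows. The field types, the partition count of 64, and the `partition_for` helper are assumptions for illustration; the hash here is a `hashlib` stand-in and is not Kafka's own partitioner.

```python
import hashlib
from dataclasses import dataclass

NUM_PARTITIONS = 64  # hypothetical partition count


@dataclass(frozen=True)
class PlayerEvent:
    """Common envelope with the five fields named in the step above."""
    event_type: str
    player_id: str
    session_id: str
    timestamp: float  # assumed to be epoch seconds
    game_title: str


def partition_for(player_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a player_id to a partition with a stable hash.

    A stable hash keeps every event for one player on one partition, which
    preserves per-player ordering for the stateful consumers downstream.
    """
    digest = hashlib.sha256(player_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


event = PlayerEvent("trophy_unlock", "player-42", "sess-1", 1700000000.0, "kart_racer")
partition = partition_for(event.player_id)
```

Keying by `player_id` spreads load well because the key space is large, but a small number of unusually active players can still create hot partitions, which is worth calling out when you discuss the partition strategy.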

#### Step 2: Build the streaming path

Anti-cheat: sliding window anomaly detection (kill rate > mean + 3 SD). Leaderboards: 5-minute refresh. Matchmaking: 1-minute pool sizing. All from the same Kafka topic.
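
The anti-cheat rule (kill rate above mean + 3 SD) can be sketched as a per-player sliding window over kill timestamps. This is a single-process illustration, not a stream-processor job: the 30-second window, the `KillRateMonitor` class, and the choice to compute the baseline over players with at least one kill in the window are all assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev

WINDOW_SECONDS = 30.0  # assumed window, chosen to match the sub-30s target


class KillRateMonitor:
    """Tracks kills per player in a sliding window and flags outliers."""

    def __init__(self, window_seconds: float = WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.kills = defaultdict(deque)  # player_id -> kill timestamps

    def _evict(self, window: deque, now: float) -> None:
        # Drop timestamps that have slid out of the window.
        while window and window[0] <= now - self.window_seconds:
            window.popleft()

    def record_kill(self, player_id: str, ts: float) -> None:
        window = self.kills[player_id]
        window.append(ts)
        self._evict(window, ts)

    def flagged(self, now: float) -> list:
        """Return players whose windowed kill count exceeds mean + 3 SD."""
        for window in self.kills.values():
            self._evict(window, now)
        counts = {p: len(w) for p, w in self.kills.items() if w}
        if len(counts) < 2:
            return []  # no meaningful baseline yet
        threshold = mean(counts.values()) + 3 * pstdev(counts.values())
        return sorted(p for p, c in counts.items() if c > threshold)
```

Because the outlier is included in its own baseline, this rule needs a reasonably large population before anyone can exceed 3 SD; a production version would more likely compare against a baseline computed separately, for example from the batch path.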

#### Step 3: Build the batch path

DAU/MAU: T+1, partitioned by date, no full table scans. Purchase reporting: exact counts for finance. Recommendation model training: weekly batch.
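
The "partitioned by date, no full table scans" point can be illustrated with an in-memory stand-in for a date-partitioned table: each query touches only the partitions it needs. The `partitions` mapping, the sample data, and the 30-day trailing definition of MAU are assumptions made for this sketch.

```python
from datetime import date, timedelta


def dau(partitions: dict, day: date) -> int:
    """Distinct players in a single date partition."""
    return len(set(partitions.get(day, ())))


def mau(partitions: dict, as_of: date, window_days: int = 30) -> int:
    """Distinct players over the trailing window, reading only those partitions."""
    players = set()
    for offset in range(window_days):
        players.update(partitions.get(as_of - timedelta(days=offset), ()))
    return len(players)


# Made-up sample: partition key (event date) -> player_ids seen that day.
partitions = {
    date(2024, 3, 1): ["p1", "p2", "p1"],
    date(2024, 3, 2): ["p2", "p3"],
    date(2024, 1, 15): ["p9"],
}

dau_mar_2 = dau(partitions, date(2024, 3, 2))
mau_mar_2 = mau(partitions, date(2024, 3, 2))
```

The same shape carries over to a warehouse query: filter on the partition column first so the engine prunes every date outside the window.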

#### Step 4: Handle anti-cheat detection

Stateful stream processing: detect impossible achievement unlocks (trophy in 3 seconds when minimum is 30 seconds). Session stitching for crash-reconnect scenarios.
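
A rough sketch of the impossible-unlock check with simple session stitching is shown below. The `TrophyUnlockChecker` class, the `speedrun_gold` trophy id, and the 120-second reconnect gap are hypothetical; only the 30-second minimum and the 3-second unlock come from the example above.

```python
class TrophyUnlockChecker:
    """Flags trophy unlocks that happen faster than a configured minimum."""

    def __init__(self, min_unlock_seconds: dict, reconnect_gap: float = 120.0):
        self.min_unlock_seconds = min_unlock_seconds
        self.reconnect_gap = reconnect_gap  # assumed stitching threshold
        self.start = {}      # player_id -> stitched session start
        self.last_seen = {}  # player_id -> timestamp of the last event

    def on_session_start(self, player_id: str, ts: float) -> None:
        last = self.last_seen.get(player_id)
        # A start shortly after the last event is treated as a reconnect,
        # so the original session start is kept (session stitching).
        if last is None or ts - last > self.reconnect_gap:
            self.start[player_id] = ts
        self.last_seen[player_id] = ts

    def on_trophy_unlock(self, player_id: str, trophy_id: str, ts: float) -> bool:
        """Return True when the unlock is suspiciously fast."""
        self.last_seen[player_id] = ts
        start = self.start.get(player_id)
        minimum = self.min_unlock_seconds.get(trophy_id)
        if start is None or minimum is None:
            return False  # not enough state to judge
        return ts - start < minimum


checker = TrophyUnlockChecker({"speedrun_gold": 30.0})

checker.on_session_start("fast", 0.0)
fast_flag = checker.on_trophy_unlock("fast", "speedrun_gold", 3.0)

checker.on_session_start("reconnected", 0.0)
checker.on_session_start("reconnected", 40.0)  # crash, then reconnect
stitched_flag = checker.on_trophy_unlock("reconnected", "speedrun_gold", 43.0)
```

Without stitching, the reconnected player would look like a 3-second unlock and be flagged incorrectly, which is why the two concerns belong in the same stateful operator.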

#### Step 5: Deduplicate purchase events

Exactly-once for financial reporting. Dedup on PSN Commerce transaction_id. A duplicated in-game purchase inflates revenue.
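
Deduplication on `transaction_id` can be sketched as an idempotent write: applying the same purchase twice leaves revenue unchanged. The `PurchaseLedger` class and the sample amounts are hypothetical, and an in-memory dict stands in for whatever keyed store the real pipeline would use.

```python
from decimal import Decimal


class PurchaseLedger:
    """Keeps one row per transaction_id so replays cannot inflate revenue."""

    def __init__(self):
        self._by_txn = {}  # transaction_id -> amount

    def apply(self, transaction_id: str, amount: Decimal) -> bool:
        """Record a purchase. Returns False if it was already recorded."""
        if transaction_id in self._by_txn:
            return False
        self._by_txn[transaction_id] = amount
        return True

    def revenue(self) -> Decimal:
        return sum(self._by_txn.values(), Decimal("0"))


ledger = PurchaseLedger()
first = ledger.apply("txn-1", Decimal("9.99"))
replay = ledger.apply("txn-1", Decimal("9.99"))  # duplicate delivery
ledger.apply("txn-2", Decimal("4.99"))
total = ledger.revenue()
```

Strictly speaking this gives at-least-once delivery with an idempotent sink, which is a common way to get exactly-once results for reporting; `Decimal` is used so that money is never summed as floats.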

---

### The solution

> **Interviewers Watch For**
>
> The strongest signal is explicit per-consumer justification:
> 1. **Name which consumers need streaming and why**: not 'everything should stream'
> 2. **Cost-latency trade-off stated explicitly**: streaming costs more, only 3 consumers benefit
> 3. **Anti-cheat as stateful stream processing**: sliding window pattern detection
> 4. **Exactly-once for purchase events**: financial data cannot tolerate duplicates

> **Cost-Latency Trade-off**
>
> 3B events/day from 200M active players. Peak: 8M concurrent sessions on Friday evenings. Kill events are 40% of volume. The cost of streaming the full event stream is significant; only the anti-cheat, leaderboard, and matchmaking consumers justify it.
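
Taking the figures in the callout above at face value, a quick back-of-the-envelope calculation helps size the stream. The variable names are illustrative, and the result is an average: peak load on a Friday evening would be higher.

```python
EVENTS_PER_DAY = 3_000_000_000
ACTIVE_PLAYERS = 200_000_000
SECONDS_PER_DAY = 86_400
KILL_SHARE_PERCENT = 40

# Average ingest rate across the whole day.
avg_events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

# Kill events alone, which are the main input to the anti-cheat window.
kill_events_per_day = EVENTS_PER_DAY * KILL_SHARE_PERCENT // 100

# Average events generated per active player per day.
events_per_player = EVENTS_PER_DAY / ACTIVE_PLAYERS

print(f"{avg_events_per_second:,.0f} events/s on average")
print(f"{kill_events_per_day:,} kill events/day")
print(f"{events_per_player:.0f} events per player per day")
```

That works out to roughly 35 thousand events per second on average, 1.2 billion kill events per day, and 15 events per player per day.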

> **Common Pitfall**
>
> Streaming everything because 'real-time is always better.' For DAU/MAU metrics, finance purchase reporting, and recommendation model training, batch is cheaper, simpler, and sufficient. Over-streaming wastes compute and money without providing additional business value.

---

## Common follow-up questions

- A major tournament launches in 48 hours with 2M concurrent players (25x normal peak for that title). How does the pipeline handle it? _(Tests capacity planning: Kafka partition scaling, streaming job auto-scaling, and isolation from non-tournament traffic.)_
- 500+ game titles each have custom event taxonomies. How do you handle the schema? _(Tests common schema with game-specific extension fields, managed by a schema-contract layer.)_
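
For the schema follow-up, one possible shape is a common envelope plus a per-title `ext` object validated against a contract registry. Everything named below (`EXTENSION_CONTRACTS`, the `ext` key, the `kart_racer` title and its fields) is hypothetical and only meant to show the idea.

```python
# Fields every event must carry, regardless of game title.
COMMON_FIELDS = {
    "event_type": str,
    "player_id": str,
    "session_id": str,
    "timestamp": float,
    "game_title": str,
}

# Hypothetical contract registry: game_title -> allowed extension fields.
EXTENSION_CONTRACTS = {
    "kart_racer": {"lap_time_ms": int, "track": str},
}


def validate(event: dict) -> list:
    """Return a list of contract violations (empty means the event is valid)."""
    errors = []
    for name, expected in COMMON_FIELDS.items():
        if name not in event:
            errors.append(f"missing common field: {name}")
        elif not isinstance(event[name], expected):
            errors.append(f"bad type for common field: {name}")
    contract = EXTENSION_CONTRACTS.get(event.get("game_title"), {})
    for name, value in event.get("ext", {}).items():
        if name not in contract:
            errors.append(f"unknown extension field: {name}")
        elif not isinstance(value, contract[name]):
            errors.append(f"bad type for extension field: {name}")
    return errors
```

Keeping title-specific fields under one key means shared consumers such as DAU/MAU can ignore them entirely, while each title's contract can evolve without touching the common envelope.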

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/eight_teams_eight_latencies)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.