# Multiplayer Game Match History

> Millions of matches. The leaderboard refreshes in fifteen minutes.

Canonical URL: <https://datadriven.io/problems/multiplayer_game_match_history>

Domain: Data Modeling · Difficulty: medium · Seniority: L5

## Problem

We run a multiplayer online game. We need to track every match played, which players participated, their scores, and derive a ranked leaderboard. Product wants to query top players by region, average score per player, and match count over the last 30 days. Design the data model.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes whether a candidate introduces a **bridge table** for a many-to-many relationship and understands when to materialize a pre-aggregated leaderboard. Storing win_rate rather than its components is the subtle trap: non-additive ratios cannot be rolled up.

> **Trick to Solving**
>
> When a prompt says “derive a ranked leaderboard,” the trick is to store additive tie-breakers, not pre-computed rates. Before drawing tables, a strong candidate asks: does casual count toward the leaderboard, and what happens at the end of a season?
> 
> 1. Bridge players to matches through fact_player_match_results
> 2. Add is_ranked flag on fact_matches
> 3. Pre-aggregate leaderboard with totals, not win_rate
> 4. Declare refresh schedule for agg_leaderboard

---

### Break down the requirements

#### Step 1: Model players-to-matches as a bridge

`fact_player_match_results` has one row per player per match with score, kills, and outcome. It is the many-to-many bridge and the source of truth for per-player stats.

#### Step 2: Filter casual from ranked

`fact_matches.is_ranked` is the single switch that controls leaderboard eligibility. Without it, casual pub-stomping inflates ranks.

#### Step 3: Store additive leaderboard columns

`agg_leaderboard` keeps `total_wins`, `total_matches`, `total_score`. Win rate is computed at read time so partial aggregates roll up cleanly.

#### Step 4: Declare the refresh schedule

`agg_leaderboard` is a table, not a view, and is refreshed on a fixed cadence (every 15 minutes, for example). This is the trade-off between freshness and query cost.

---

### The solution

Below is one defensible model. The bridge table between players and matches and the additive pre-aggregate are the two load-bearing decisions.

> **Why this works**
>
> Additive aggregates let you combine leaderboards across regions, seasons, or time windows without re-scanning the fact table. Storing win_rate directly would lock you into one grain and break every roll-up.

> **Interviewers watch for**
>
> A strong candidate names “additive vs non-additive measures” out loud. They also declare how often agg_leaderboard is refreshed and what happens when a match result is corrected.

> **Common pitfall**
>
> Storing `win_rate` on `agg_leaderboard`. Combining two partial aggregates (for example, two regions) requires the underlying totals, so a stored rate has to be recomputed from scratch every time. Keeping additive columns makes roll-ups trivial.

---

### The analysis pattern

**Top 10 ranked players by region**

```sql
SELECT
    p.display_name,
    r.name AS region,
    l.total_wins,
    l.total_matches,
    l.total_wins * 1.0 / NULLIF(l.total_matches, 0) AS win_rate
FROM agg_leaderboard l
JOIN dim_players p ON p.player_id = l.player_id
JOIN dim_regions r ON r.region_id = l.region_id
WHERE l.total_matches >= 50
ORDER BY win_rate DESC, l.total_score DESC
LIMIT 10
```

---

### Trade-offs and alternatives

**Pre-aggregated leaderboard table**

agg_leaderboard refreshed on a schedule.

* Cheap top-N queries
* Freshness bounded by refresh cadence
* Needs a reconciliation job for late corrections

**Live query over fact tables**

Every leaderboard read scans fact_player_match_results.

* Always fresh
* Expensive on hot paths
* Risk of timeout during events

---

## Common follow-up questions

- A match is later flagged as cheating and wiped. How does agg_leaderboard stay consistent? _(Tests whether corrections trigger a targeted rebuild or a full refresh.)_
- How would you model a seasonal reset without losing lifetime stats? _(Tests whether the candidate introduces a season_id on agg_leaderboard or partitions by season.)_
- At 2M matches per day, how do you keep agg_leaderboard refreshes under 60 seconds? _(Tests incremental aggregation patterns, CDC, or streaming rollups.)_
- A player disputes their win count. How do you reconstruct it from source? _(Tests whether fact_player_match_results is append-only and independently auditable.)_
- How would the schema change if matches had variable team sizes (1v1 to 5v5)? _(Tests whether team_size on dim_game_modes plus team column on the bridge is sufficient.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/multiplayer_game_match_history)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.