# The League With Too Many Loyalties

> A player can belong to many teams. The schema must agree.

Canonical URL: <https://datadriven.io/problems/the_league_with_too_many_loyalties>

Domain: Data Modeling · Difficulty: hard · Seniority: L5

## Problem

Design a data model for a sports tournament platform. The platform tracks multiple leagues, each with multiple teams. Players belong to teams, but can also represent national teams in separate competitions. Each match has two teams, takes place at a stadium, and produces per-player and per-team stats. Analytics need cumulative player scores across all matches in a tournament.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes whether you can distinguish a many-to-many relationship with temporal validity from a simple foreign key. The signal is recognizing that player-to-team needs an effective-dated junction so historical match stats attribute to the team the player represented on that kickoff date, not the team they play for today.

> **Trick to Solving**
>
> Before drawing any tables, a strong candidate asks: "can a player change teams mid-tournament, and do we need stats attributed to the team they played for at the time?" If yes, the answer is an effective-dated junction, not a direct FK. A second tell: a match always has exactly two teams in fixed roles, which makes two FKs on `matches` cleaner than a junction.
> 
> 1. Spot the temporal M:M between players and teams
> 2. Model `player_team_memberships` with `start_date` and `end_date`
> 3. Model the two-team relationship on `matches` as two FKs, not a junction
> 4. Anchor the fact grain at one row per player per match

---

### Break down the requirements

#### Step 1: Declare the fact grain

`fact_player_match_stats` sits at one row per player per match. Cumulative player scores in a tournament become a simple sum with a join to `matches`.

#### Step 2: Model player-team as a temporal junction

`player_team_memberships` captures transfers with `start_date` and `end_date`. Historical attribution joins the fact to the membership window that covers the kickoff timestamp.

#### Step 3: Use two FKs on matches, not a junction

`home_team_id` and `away_team_id` both point to `teams`. A junction would force every "who played whom" query into an awkward self-join on the bridge.

#### Step 4: Put league and tournament on separate dimensions

A tournament can span multiple leagues. A league belongs to a tournament. Keep them as separate tables so filters like "all matches in this competition" remain expressible.

#### Step 5: Put stadium on its own dimension

Stadiums are shared across teams and change over time. A dimension keeps capacity and city attributes reusable.

---

### The solution

Below is one defensible model. The conceptual anchor is the temporal junction: player-to-team is many-to-many over time.

> **Why this design holds up**
>
> The membership table lets you attribute every historical stat to the team the player was on at the moment of the match. That single decision unlocks correct career arcs, transfer windows, and tournament leaderboards without rewriting fact rows when players move.

> **What strong candidates do**
>
> They surface the temporal M:M in the first minute. They resist the urge to create a `team_matches` junction for a fixed-arity relationship. They recognize that cumulative tournament scores need a bounded join on the membership window.

> **Red flags to avoid**
>
> A single `team_id` FK on `players` loses transfer history. A junction table for matches turns simple queries into triple joins. Attaching stats to the player without also carrying `team_id` on the fact row forces every attribution query to reconstruct team membership from the junction.

---

### The analysis pattern

**Tournament scoring leaders with transfer-correct attribution**

```sql
SELECT
    p.full_name,
    t.name AS tournament,
    SUM(s.goals) AS goals,
    SUM(s.assists) AS assists
FROM fact_player_match_stats s
JOIN players p ON p.player_id = s.player_id
JOIN matches m ON m.match_id = s.match_id
JOIN tournaments t ON t.tournament_id = m.tournament_id
JOIN player_team_memberships ptm
  ON ptm.player_id = s.player_id
 AND ptm.team_id = s.team_id
 AND m.kickoff_ts >= ptm.start_date
 AND (ptm.end_date IS NULL OR m.kickoff_ts < ptm.end_date)
WHERE t.tournament_id = 7
GROUP BY p.full_name, t.name
```

---

### Trade-offs and alternatives

**Temporal junction with two-FK matches**

Transfer history is preserved. Attribution queries require an as-of join. Schema is compact. Reads are slightly heavier when reconstructing membership.

**Current-team FK with SCD Type 2 on players**

Player dimension tracks team as a changing attribute. Works when players belong to one team at a time and no national-team overlay is needed. Breaks down when a player represents a club and a national team concurrently.

---

## Common follow-up questions

- How would you extend the model to let a player represent both a club and a national team in different competitions on overlapping dates? _(Tests the ability to allow concurrent memberships and scope them by tournament.)_
- How would you handle a mid-match substitution where a player only plays 20 minutes? _(Tests whether `fact_player_match_stats` captures minutes and substitution events.)_
- If a transfer window backdates a transfer by two weeks, which rows change and which stay immutable? _(Tests late-arriving dimension updates and idempotent membership writes.)_
- How would you partition `fact_player_match_stats` at 40 leagues and 20 seasons of history? _(Tests partition key choice (kickoff_ts vs tournament_id) and the impact on leaderboard queries.)_
- What changes if a team is dissolved and replaced mid-season? _(Tests whether teams need a lifecycle status or SCD treatment.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_league_with_too_many_loyalties)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.