# The Change Tracker

> Before and after snapshots. The delta is in there.

Canonical URL: <https://datadriven.io/problems/the_change_tracker>

Domain: Python · Difficulty: medium · Seniority: L3

## Problem

Given old_records and new_records (each a list of dicts with an 'id' key), return a dict with three sorted lists of ids: 'added' (in new but not old), 'removed' (in old but not new), 'modified' (in both, where any non-id field differs between the matched records). Sort each id list ascending.

## Worked solution and explanation

### Why this problem exists in real interviews

Detecting added, removed, and modified records between two snapshots is **SCD (Slowly Changing Dimension) Type 2** logic in Python form. It tests set operations and record comparison.

---

### Break down the requirements

#### Step 1: Index both record lists by 'id'

Build dicts keyed by the id field for O(1) lookup.

#### Step 2: Compute added, removed, and modified sets

Added: ids in new but not old. Removed: ids in old but not new. Modified: ids in both where any non-id field differs.

#### Step 3: Return sorted id lists

Each result list should be sorted for deterministic output.

---

### The solution

**Set-based change detection with sorted output**

```python
def detect_changes(old_records, new_records):
    old_by_id = {}
    for r in old_records:
        old_by_id[r['id']] = r
    new_by_id = {}
    for r in new_records:
        new_by_id[r['id']] = r
    old_ids = set(old_by_id.keys())
    new_ids = set(new_by_id.keys())
    added = sorted(new_ids - old_ids)
    removed = sorted(old_ids - new_ids)
    modified = []
    for rid in sorted(old_ids & new_ids):
        if old_by_id[rid] != new_by_id[rid]:
            modified.append(rid)
    result = {
        'added': added,
        'removed': removed,
        'modified': modified
    }
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n + m) for indexing plus O(k log k) for sorting change sets.
> 
> **Space:** O(n + m) for the index dicts.

> **Interviewers Watch For**
>
> Using set operations for clean categorization. Also, comparing full records (not just a subset of fields) for the modified check.

> **Common Pitfall**
>
> Only comparing one field for modifications. The prompt says 'any other field differs,' so you must compare the entire record (excluding the id key, though including it does not matter since it is the same).

---

## Common follow-up questions

- What if you needed to report which fields changed for each modification? _(Tests iterating over keys and collecting field-level diffs.)_
- How would you handle records with nested dict fields? _(Tests deep comparison using a recursive equals function or json serialization.)_
- What is SCD Type 1 vs. Type 2 vs. Type 3? _(Tests data warehousing knowledge: Type 1 overwrites, Type 2 adds versioned rows, Type 3 adds columns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_change_tracker)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.