# The Schema Diff

> Two versions of the same config - what changed between them?

Canonical URL: <https://datadriven.io/problems/the_schema_diff>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

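Given dicts `d1` and `d2`, return a dict with three keys:

- `'added'`: a dict of the keys that appear only in `d2`, mapped to their `d2` values.
- `'removed'`: a dict of the keys that appear only in `d1`, mapped to their `d1` values.
- `'changed'`: a dict of the keys that appear in both but whose values differ, each mapped to `[d1_value, d2_value]`.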

## Worked solution and explanation

### Why this problem exists in real interviews

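This tests **set operations on dictionary keys** combined with value comparison. It is a direct analog of schema drift detection: spotting fields that were added, dropped, or retyped between two versions of a schema or config, which is a routine data engineering task.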

---

### Break down the requirements

#### Step 1: Compute added keys

Collect the keys that are in d2 but not in d1, and map each one to its d2 value.
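A minimal sketch of Step 1, using a hypothetical helper `added_keys` and made-up sample dicts. Dict key views support set operators directly, so `d2.keys() - d1.keys()` yields the added keys without building explicit sets, and a comprehension pairs each key with its d2 value:

```python
def added_keys(d1, d2):
    # Keys present only in d2, each paired with its d2 value.
    # dict.keys() views support set operators such as `-`.
    return {key: d2[key] for key in d2.keys() - d1.keys()}


old = {'id': 'int', 'name': 'str'}
new = {'id': 'int', 'name': 'str', 'email': 'str'}
print(added_keys(old, new))  # {'email': 'str'}
```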

#### Step 2: Compute removed keys

Collect the keys that are in d1 but not in d2, and map each one to its d1 value.
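Step 2 mirrors Step 1 with the roles of the dicts swapped. One design choice worth noting: iterating a set gives no guaranteed order, so if the report should follow d1's insertion order (handy for stable, readable output), a membership test over `d1.items()` works instead. A sketch, with `removed_keys` as a hypothetical helper name and made-up sample data:

```python
def removed_keys(d1, d2):
    # Keys present only in d1, paired with their d1 values.
    # Iterating d1.items() preserves d1's insertion order;
    # `key not in d2` is an average O(1) hash lookup.
    return {key: value for key, value in d1.items() if key not in d2}


old = {'id': 'int', 'legacy_flag': 'bool', 'name': 'str', 'fax': 'str'}
new = {'id': 'int', 'name': 'str'}
print(removed_keys(old, new))  # {'legacy_flag': 'bool', 'fax': 'str'}
```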

#### Step 3: Compute changed keys

Find the keys present in both dicts whose values differ, and map each one to `[d1_value, d2_value]` so both the old and new values are kept.
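A sketch of Step 3 that follows the problem's `[d1_value, d2_value]` output format (`changed_keys` and the sample dicts are illustrative). The intersection `d1.keys() & d2.keys()` restricts the loop to shared keys, and the `!=` filter drops keys whose values are equal:

```python
def changed_keys(d1, d2):
    # Shared keys whose values differ, mapped to [old_value, new_value].
    return {
        key: [d1[key], d2[key]]
        for key in d1.keys() & d2.keys()
        if d1[key] != d2[key]
    }


old = {'id': 'int', 'price': 'float', 'name': 'str'}
new = {'id': 'bigint', 'price': 'float', 'name': 'str'}
print(changed_keys(old, new))  # {'id': ['int', 'bigint']}
```

Because the check relies on Python's `==`, values such as `1`, `1.0`, and `True` compare equal and are not reported as changed; if a type change alone must count as a change, the comparison has to check types explicitly.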

---

### The solution

**Set operations for key diff with value comparison**

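```python
def schema_diff(d1, d2):
    # Build key sets once; difference and intersection are linear on average.
    keys1 = set(d1)
    keys2 = set(d2)
    # Keys only in d2, mapped to their d2 values.
    added = {key: d2[key] for key in keys2 - keys1}
    # Keys only in d1, mapped to their d1 values.
    removed = {key: d1[key] for key in keys1 - keys2}
    # Keys in both whose values differ, mapped to [d1_value, d2_value].
    changed = {}
    for key in keys1 & keys2:
        if d1[key] != d2[key]:
            changed[key] = [d1[key], d2[key]]
    return {
        'added': added,
        'removed': removed,
        'changed': changed,
    }
```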

> **Time and Space Complexity**
>
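> **Time:** O(n + m) on average, where n and m are the sizes of the two dicts. This relies on hash-based set and dict operations and assumes each value comparison takes constant time.
>
> **Space:** O(n + m) for the key sets and the result dicts.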

> **Interviewers Watch For**
>
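> Do you use set operations (`-`, `&`) on the key sets instead of nested loops? This is the idiomatic Python approach: it reads more clearly and runs in linear average time, whereas checking every key of one dict against every key of the other is O(n·m).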

> **Common Pitfall**
>
> Treating every shared key as changed. Keys that exist in both dicts with equal values must be excluded from the 'changed' category.

---

## Common follow-up questions

- What if dict values were nested dicts? _(Tests recursive diff for deep comparison.)_
- How would you apply this diff to transform d1 into d2? _(Tests implementing a patch function from the diff output.)_
- What if the dicts represented database schemas with column types? _(Tests the schema differ variant with typed column comparison.)_
- How would you track diffs over time for an audit log? _(Tests versioned diff storage.)_
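For the patch follow-up, a minimal sketch of applying a diff, assuming the diff has exactly the shape described in the problem (`added` and `removed` as dicts, `changed` as `[d1_value, d2_value]` pairs). `apply_diff` is a hypothetical name. It copies d1, drops removed keys, overwrites changed keys with their new values, and inserts added keys:

```python
def apply_diff(d1, diff):
    # Work on a shallow copy so the input dict is not mutated.
    result = dict(d1)
    for key in diff['removed']:
        # pop with a default tolerates keys that are already missing.
        result.pop(key, None)
    for key, (_old, new) in diff['changed'].items():
        result[key] = new
    result.update(diff['added'])
    return result


d1 = {'id': 'int', 'name': 'str', 'fax': 'str'}
d2 = {'id': 'bigint', 'name': 'str', 'email': 'str'}
diff = {
    'added': {'email': 'str'},
    'removed': {'fax': 'str'},
    'changed': {'id': ['int', 'bigint']},
}
print(apply_diff(d1, diff) == d2)  # True
```

A stricter variant might verify that each removed or changed key still holds the recorded old value before patching, to catch a diff being applied to the wrong base version.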

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_schema_diff)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.