# The Word Mismatch

> Some text does not match.

Canonical URL: <https://datadriven.io/problems/the_word_mismatch>

Domain: Python · Difficulty: easy · Seniority: L3

## Problem

Given two sentences as strings, split each on whitespace into words. Return the list of words that appear in exactly one sentence but not the other (case-sensitive). Output the words from sentence_a first (in order of appearance), then sentence_b's words (in order of appearance).

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **set-based difference operations** on text. Finding words that appear in one version but not the other is a basic diff operation that probes set construction and symmetric difference.

---

### Break down the requirements

#### Step 1: Split both strings into word sets

Call `split()` with no arguments on each string, which splits on any run of whitespace, then build a set from each resulting word list.
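
A minimal sketch of this step, using a made-up sentence that is not taken from the problem's test cases. It shows that `split()` without arguments treats spaces, tabs, and newlines alike, and that `set()` collapses repeated words.

```python
# Illustrative input only; this sentence is an assumption, not a test case.
sentence = "the  quick\tbrown fox\nthe end"

words = sentence.split()   # no argument: split on any run of whitespace
word_set = set(words)      # the repeated "the" collapses to one entry

print(words)          # ['the', 'quick', 'brown', 'fox', 'the', 'end']
print(len(word_set))  # 5
```

The list keeps order and duplicates; the set gives fast membership checks but has no defined order.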

#### Step 2: Compute the symmetric difference

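The symmetric difference contains the words in version A but not B, plus the words in B but not A. On Python sets this is the `^` operator (equivalently, the `symmetric_difference()` method).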
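
As a quick check of that definition, the sketch below compares `^` with the spelled-out form on two small example sets. The words are placeholders chosen for illustration.

```python
# Example sets are hypothetical; any two word sets behave the same way.
a = {"the", "cat", "sat"}
b = {"the", "dog", "sat"}

only_in_one = a ^ b            # symmetric difference
manual = (a - b) | (b - a)     # the same result, spelled out

print(only_in_one == manual)   # True
print(sorted(only_in_one))     # ['cat', 'dog']
```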

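#### Step 3: Return the result in order of appearance

The prompt asks for sentence A's words first, then sentence B's, each in order of appearance. Sets are unordered, so walk the original word lists and keep each word that is in the symmetric difference. Sorting would also be deterministic, but it would not produce the order the prompt requires.

---

### The solution

**Set symmetric difference on word splits, emitted in order of appearance**

```python
def find_diff_words(old_text: str, new_text: str) -> list:
    old_words = old_text.split()
    new_words = new_text.split()
    # Words present in exactly one of the two texts.
    diff = set(old_words) ^ set(new_words)
    result = []
    # Walk old_text's words first, then new_text's, in order of appearance.
    for word in old_words + new_words:
        if word in diff:
            result.append(word)
            diff.discard(word)  # assumption: report a repeated word only once
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n + m) where n and m are the word counts. Splitting, building the sets, and the final pass are each linear on average.
>
> **Space:** O(n + m) for the word lists and the two word sets.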

> **Interviewers Watch For**
>
> Using `^` (symmetric difference) instead of computing `(a - b) | (b - a)` manually. The operator is more concise and equally clear.

> **Common Pitfall**
>
> The comparison is case-sensitive per the prompt. Converting to lowercase would miss intentional case changes between versions.
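
To make the pitfall concrete, here is a small sketch with two made-up texts that differ only in capitalization. The variable names and inputs are illustrative assumptions.

```python
# Hypothetical inputs that differ only by letter case.
old_text = "Deploy the API today"
new_text = "deploy the api today"

case_sensitive = set(old_text.split()) ^ set(new_text.split())
case_folded = set(old_text.lower().split()) ^ set(new_text.lower().split())

print(sorted(case_sensitive))  # ['API', 'Deploy', 'api', 'deploy']
print(sorted(case_folded))     # []
```

Lowercasing first reports no difference at all, which hides exactly the changes the prompt asks you to surface.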

---

## Common follow-up questions

- What if you needed to know which words were added vs. removed? _(Tests using `new - old` for additions and `old - new` for removals separately.)_
- What if word frequency mattered (a word appearing 3 times vs. 2 times)? _(Tests using Counter objects and comparing counts.)_
- How would you generate a unified diff of the two texts? _(Tests knowledge of the `difflib` module for line-level diffs.)_
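
The three follow-ups above could be approached roughly as sketched below. The function names (`added_and_removed`, `count_changes`, `unified_line_diff`) and the `"old"`/`"new"` file labels are hypothetical, chosen for illustration; only `collections.Counter` and `difflib.unified_diff` come from the standard library.

```python
from collections import Counter
import difflib


def added_and_removed(old_text: str, new_text: str) -> tuple[set[str], set[str]]:
    """Return (added, removed) word sets."""
    old_words, new_words = set(old_text.split()), set(new_text.split())
    return new_words - old_words, old_words - new_words


def count_changes(old_text: str, new_text: str) -> dict[str, int]:
    """Map each word whose frequency changed to (new count - old count)."""
    old_counts, new_counts = Counter(old_text.split()), Counter(new_text.split())
    return {
        word: new_counts[word] - old_counts[word]
        for word in old_counts.keys() | new_counts.keys()
        if new_counts[word] != old_counts[word]  # a missing word counts as 0
    }


def unified_line_diff(old_text: str, new_text: str) -> list[str]:
    """Line-level unified diff; the file labels are placeholders."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="old", tofile="new", lineterm="",
    ))


old, new = "a b b c", "a b c c d"
print(added_and_removed(old, new))  # ({'d'}, set())
print(count_changes(old, new))      # b: -1, c: 1, d: 1 (key order may vary)
```

Plain sets answer "added vs. removed" but ignore repetition; `Counter` is what picks up that `b` went from two occurrences to one.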

---

Source: DataDriven (https://datadriven.io).