# Prefix Based Word Replacement

> Every word trimmed to its root.

Canonical URL: <https://datadriven.io/problems/prefix_based_word_replacement>

Domain: Python · Difficulty: medium · Seniority: L5

## Problem

Given a list of root words and a whitespace-separated sentence, for each word in the sentence, if any root is a prefix of it, replace the word with the shortest such root. Return the transformed sentence with the same whitespace separations.

## Worked solution and explanation

### Why this problem exists in real interviews

Replacing words by their root prefix tests **string matching** and whether you can optimize the lookup using a set or trie rather than brute-force scanning all roots for each word.

> **Trick to Solving**
>
> Sort roots by length ascending. For each word, check if any root (shortest first) is a prefix. The first match is the shortest root. Alternatively, use a set of roots and check progressively longer prefixes of each word.

---

### Break down the requirements

#### Step 1: Build a set of roots for O(1) membership testing

Convert the root list to a set.

#### Step 2: For each word, find the shortest matching root prefix

Check prefixes of increasing length. The first match in the root set is the shortest root.

#### Step 3: Replace the word with the root or keep it unchanged

If a root prefix is found, use it. Otherwise, keep the original word.

---

### The solution

**Prefix scanning with root set lookup**

```python
def replace_words(roots, sentence):
    root_set = set(roots)
    words = sentence.split()
    result_words = []
    for word in words:
        replacement = word
        for end in range(1, len(word) + 1):
            prefix = word[:end]
            if prefix in root_set:
                replacement = prefix
                break
        result_words.append(replacement)
    result = ' '.join(result_words)
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(w * L) where w is the number of words and L is the average word length. Each word checks at most L prefixes, each with O(1) set lookup.
> 
> **Space:** O(r) for the root set where r is the number of roots.

> **Interviewers Watch For**
>
> Using a set for O(1) root lookups rather than iterating the root list for each word. Also, scanning prefixes shortest-first to naturally find the shortest match.

> **Common Pitfall**
>
> Iterating roots instead of word prefixes. If you check each root against each word, the time complexity depends on root count rather than word length, which can be worse for many long roots.

---

## Common follow-up questions

- How would a trie improve performance? _(Tests building a trie from roots so each word is matched in O(L) time with a single traversal.)_
- What if multiple roots match and you need the longest, not shortest? _(Tests removing the `break` and tracking the longest match instead.)_
- What if the sentence has punctuation attached to words? _(Tests stripping punctuation before matching and reattaching it after.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/prefix_based_word_replacement)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.