# The Word Inventory

> Every word, twice over.

Canonical URL: <https://datadriven.io/problems/the_word_inventory>

Domain: Python · Difficulty: easy · Seniority: L6

## Problem

Given a list of words, return a dict with two keys. 'counts' maps each word to its frequency. 'unique' is the sorted list of words that appear exactly once.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **frequency counting** with a **secondary filter** for singletons. It combines dict accumulation with a filtering pass, a pattern common in data quality and analytics pipelines.

---

### Break down the requirements

#### Step 1: Count word frequencies

Build a dict mapping each word to its occurrence count.

#### Step 2: Extract words appearing exactly once

Filter the frequency dict to find singletons and sort them alphabetically.

#### Step 3: Return both the counts dict and the unique list

Package both results in a dict with `'counts'` and `'unique'` keys.

---

### The solution

**Frequency dict with singleton extraction**

```python
def word_inventory(words: list) -> dict:
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    unique = []
    for word in sorted(counts.keys()):
        if counts[word] == 1:
            unique.append(word)
    result = {"counts": counts, "unique": unique}
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n + k log k) where n is the number of words and k is the number of unique words (for sorting).
> 
> **Space:** O(k) for the counts dict and unique list.

> **Interviewers Watch For**
>
> Sorting the unique list alphabetically. The problem specifies sorted output, which candidates sometimes miss.

> **Common Pitfall**
>
> Iterating the original word list for the unique filter instead of the counts dict. This could add duplicate singletons if a word appears at multiple positions.

---

## Common follow-up questions

- What if you also needed words appearing exactly twice? _(Tests parameterizing the frequency filter.)_
- How would you make this case-insensitive? _(Tests lowercasing words before counting.)_
- How would you handle a stream of words that does not fit in memory? _(Tests approximate frequency counting with Count-Min Sketch or similar probabilistic structures.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_word_inventory)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.