# The Word Counter

> How many times does each word show up in a file?

Canonical URL: <https://datadriven.io/problems/the_word_counter>

Domain: Python · Difficulty: easy · Seniority: L3

## Problem

Given a string, split on whitespace, lowercase each word, strip attached ASCII punctuation, and return a dict mapping each resulting word to its count.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **text processing fundamentals**: splitting, case normalization, punctuation stripping, and frequency counting. It is a complete mini-pipeline that mirrors real NLP preprocessing.

---

### Break down the requirements

#### Step 1: Split the text on whitespace

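Call `.split()` with no arguments. It treats any run of whitespace (spaces, tabs, newlines) as a single separator and never produces empty strings for leading or trailing whitespace.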
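
As a quick illustration of why the no-argument form is the safer choice, here is a minimal sketch comparing it with splitting on a single space. The sample string is made up for this example.

```python
sample = "the  quick\nbrown\tfox "

# No-argument split: any run of whitespace is one separator,
# and trailing whitespace produces no empty token.
tokens = sample.split()
print(tokens)  # ['the', 'quick', 'brown', 'fox']

# Splitting on a single space keeps empty strings and leaves
# the newline and tab embedded inside one token.
naive = sample.split(" ")
print(naive)  # ['the', '', 'quick\nbrown\tfox', '']
```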

#### Step 2: Normalize each word: lowercase and strip punctuation

Convert each token to lowercase and strip ASCII punctuation from its edges. Characters inside the word, such as the apostrophe in `don't`, are left alone.
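
A minimal sketch of this step in isolation. The helper name `normalize` is hypothetical, and using the standard-library constant `string.punctuation` as the character set is an assumption; the solution may use a narrower, hand-written set.

```python
import string

def normalize(token: str) -> str:
    """Lowercase a token and strip ASCII punctuation from both ends."""
    # str.strip(chars) removes any of the given characters from the
    # edges only; it never touches characters inside the token.
    return token.lower().strip(string.punctuation)

print(normalize('"Hello,'))  # hello
print(normalize("don't"))    # don't
print(repr(normalize("--"))) # ''
```

Note that a token made only of punctuation collapses to the empty string, which the counting step has to skip.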

#### Step 3: Count occurrences in a dict

Skip empty strings that result from stripping and accumulate counts.
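
The counting step on its own might look like the sketch below. The function name `tally` is hypothetical and the input is assumed to be a list of already normalized tokens.

```python
def tally(words: list[str]) -> dict[str, int]:
    """Count non-empty tokens in a plain dict."""
    freq: dict[str, int] = {}
    for word in words:
        if not word:
            continue  # skip tokens that were pure punctuation
        # dict.get supplies 0 the first time a word is seen.
        freq[word] = freq.get(word, 0) + 1
    return freq

print(tally(["the", "cat", "", "the"]))  # {'the': 2, 'cat': 1}
```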

---

### The solution

**Split, normalize, strip punctuation, count**

```python
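def count_words(text: str) -> dict[str, int]:
    # Lowercase once, then split on any run of whitespace.
    words = text.lower().split()
    freq = {}
    for raw_word in words:
        # Strip punctuation from both ends only; interior characters stay.
        word = raw_word.strip(".,!?;:'\"()-")
        if word:  # skip tokens that were pure punctuation
            freq[word] = freq.get(word, 0) + 1
    return freq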
```

> **Time and Space Complexity**
>
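> **Time:** O(n), where n is the length of the text. Lowercasing, splitting, and stripping each touch every character a constant number of times, and each dict update is O(1) on average.
>
> **Space:** O(w) for the result, where w is the number of unique words. The lowercased copy and the token list add O(n) temporary space.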

> **Interviewers Watch For**
>
> Using `str.strip()` with a character set is simpler than regex for edge punctuation. Strong candidates know when to reach for regex and when simpler tools suffice.
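
For comparison, here is a sketch of the same edge stripping done with a regex. The function names are hypothetical, and both versions assume `string.punctuation` as the character set, which is broader than a hand-written list.

```python
import re
import string

# Build a character class from the ASCII punctuation set. re.escape
# guards characters such as ']' and '\' that are special inside a class.
_PUNCT = re.escape(string.punctuation)
_EDGES = re.compile(rf"^[{_PUNCT}]+|[{_PUNCT}]+$")

def strip_with_regex(token: str) -> str:
    # Remove a leading run and a trailing run of punctuation.
    return _EDGES.sub("", token)

def strip_with_str(token: str) -> str:
    # The simpler tool: one call, no pattern to read or maintain.
    return token.strip(string.punctuation)

for tok in ['"Hello,', "don't", "--", "(a-b)!"]:
    print(repr(strip_with_regex(tok)), repr(strip_with_str(tok)))
```

Both functions give the same result on these tokens; the `str.strip()` version is shorter and has no escaping to get wrong.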

> **Common Pitfall**
>
> Forgetting to handle empty strings after stripping. A token made only of punctuation, such as `'--'`, becomes `''` after stripping and should not be counted.

---

## Common follow-up questions

- How would you handle contractions like "don't"? _(Tests whether the apostrophe should be stripped or kept.)_
- What if you needed to count bigrams instead of unigrams? _(Tests sliding a window of size 2 over the word list.)_
- How would you process this in a MapReduce framework? _(Tests the classic word count MapReduce pattern: map emits (word, 1), reduce sums.)_
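
The last two follow-ups can be sketched in a few lines. This is an illustration under stated assumptions, not a reference answer: `tokenize`, `count_bigrams`, `map_phase`, and `reduce_phase` are hypothetical names, `string.punctuation` is assumed as the punctuation set, and the map and reduce phases run locally in one process rather than on a real MapReduce framework.

```python
from collections import Counter
from itertools import chain
import string

def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, strip edge punctuation, drop empties."""
    cleaned = (raw.strip(string.punctuation) for raw in text.lower().split())
    return [word for word in cleaned if word]

def count_bigrams(text: str) -> Counter:
    words = tokenize(text)
    # zip(words, words[1:]) slides a window of size 2 over the word list.
    return Counter(zip(words, words[1:]))

def map_phase(line: str) -> list[tuple[str, int]]:
    # Map: emit (word, 1) for every token in one input line.
    return [(word, 1) for word in tokenize(line)]

def reduce_phase(pairs) -> dict[str, int]:
    # Reduce: sum the emitted counts per word.
    totals: dict[str, int] = {}
    for word, count in pairs:
        totals[word] = totals.get(word, 0) + count
    return totals

lines = ["The cat sat.", "The cat ran!"]
bigrams = count_bigrams("the cat sat, the cat ran")
totals = reduce_phase(chain.from_iterable(map_phase(line) for line in lines))
print(bigrams[("the", "cat")])  # 2
print(totals)  # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

Because `str.strip()` only touches the edges of a token, a contraction like "don't" keeps its apostrophe under this tokenizer; whether that is the desired behavior is the point of the first follow-up.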

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_word_counter)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.