# Character Occurrence Map

> Character frequency as a map.

Canonical URL: <https://datadriven.io/problems/character_occurrence_map>

Domain: Python · Difficulty: easy · Seniority: L5

## Problem

Given a string, return a list of `[character, count]` pairs, one for each distinct alphabetic character. Counting is case-insensitive: lowercase every character before counting. Sort the pairs alphabetically by character.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **character frequency counting**, **case normalization**, and **sorted output construction**. It is a standard warm-up that reveals whether you handle filtering (alphabetic only) and formatting (list-of-pairs) cleanly.

---

### Break down the requirements

#### Step 1: Normalize to lowercase and filter alphabetic characters

Convert each character to lowercase and skip non-alphabetic characters. This ensures consistent counting regardless of input casing.
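
As a rough illustration of this step in isolation, the sketch below yields only the lowercased letters of a string. The helper name `normalized_letters` is hypothetical and not part of the reference solution.

```python
def normalized_letters(s):
    """Yield the lowercase form of each alphabetic character in s."""
    for char in s:
        if char.isalpha():  # drops digits, spaces, and punctuation
            yield char.lower()


print(list(normalized_letters("Hi, Bob 42!")))  # ['h', 'i', 'b', 'o', 'b']
```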

#### Step 2: Count occurrences using a dictionary

Build a frequency map where keys are lowercase letters and values are counts.
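
One possible way to build the map is with `dict.get` and a default of zero. The helper name `count_letters` is an assumption made for this sketch; it expects letters that have already been lowercased and filtered.

```python
def count_letters(letters):
    """Build a {letter: count} dict from an iterable of lowercase letters."""
    counts = {}
    for letter in letters:
        # get() returns 0 the first time a letter is seen.
        counts[letter] = counts.get(letter, 0) + 1
    return counts


print(count_letters(["h", "i", "b", "o", "b"]))  # {'h': 1, 'i': 1, 'b': 2, 'o': 1}
```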

#### Step 3: Sort alphabetically and format as pairs

Sort the dict items by key and return each as a `[character, count]` pair.
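
The sort-and-format step could look like the sketch below. The name `to_sorted_pairs` is hypothetical. Because dict keys are unique, sorting the `(key, value)` tuples orders them by letter alone.

```python
def to_sorted_pairs(counts):
    """Turn a {letter: count} dict into [letter, count] pairs sorted by letter."""
    return [[letter, count] for letter, count in sorted(counts.items())]


print(to_sorted_pairs({"h": 1, "i": 1, "b": 2, "o": 1}))
# [['b', 2], ['h', 1], ['i', 1], ['o', 1]]
```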

---

### The solution

**Frequency counting with alphabetic filter and sorted output**

```python
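def letter_frequency(s):
    """Return sorted [letter, count] pairs for the alphabetic characters in s."""
    counts = {}
    for char in s:
        # Skip digits, whitespace, and punctuation.
        if char.isalpha():
            # Fold case so 'A' and 'a' share one key.
            lower_char = char.lower()
            if lower_char not in counts:
                counts[lower_char] = 0
            counts[lower_char] += 1
    # Emit pairs in alphabetical order of the letter.
    result = []
    for key in sorted(counts):
        result.append([key, counts[key]])
    return result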
```
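
For comparison, a more compact variant is sketched below using `collections.Counter` from the standard library. This is an alternative, not the reference solution, and the name `letter_frequency_counter` is made up for the example.

```python
from collections import Counter


def letter_frequency_counter(s):
    """Same contract as the loop-based version, using Counter for the tally."""
    counts = Counter(char.lower() for char in s if char.isalpha())
    return [[letter, count] for letter, count in sorted(counts.items())]


print(letter_frequency_counter("Hello, World!"))
# [['d', 1], ['e', 1], ['h', 1], ['l', 3], ['o', 2], ['r', 1], ['w', 1]]
```

The explicit dict loop shows the mechanics an interviewer may want to see, while `Counter` is usually the shorter choice once the logic is understood.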

> **Time and Space Complexity**
>
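> **Time:** O(n + k log k), where n is the string length and k is the number of distinct letters. For English text k is at most 26, so the sort costs a bounded constant and the total is effectively O(n). Note that `str.isalpha()` also accepts non-ASCII letters, so k can exceed 26 on such input.
>
> **Space:** O(k) for the frequency dict and the output list, at most 26 entries each for English text.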

> **Interviewers Watch For**
>
> Whether you filter non-alphabetic characters before counting rather than after. Filtering first keeps the dict clean and avoids a second pass to remove non-alpha keys.

> **Common Pitfall**
>
> Forgetting to lowercase before counting. `'A'` and `'a'` must map to the same key, and the output specifies lowercase characters.
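
To make the pitfall concrete, the sketch below deliberately omits the lowercasing step. The function name `count_without_lowercasing` is hypothetical and exists only to show the bug.

```python
def count_without_lowercasing(s):
    """Buggy on purpose: counts letters without folding case."""
    counts = {}
    for char in s:
        if char.isalpha():
            counts[char] = counts.get(char, 0) + 1  # 'A' and 'a' stay separate
    return [[key, counts[key]] for key in sorted(counts)]


print(count_without_lowercasing("Aa"))  # [['A', 1], ['a', 1]]
```

The problem statement expects `[['a', 2]]` for this input, which is what adding `char.lower()` before the dict update produces.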

---

## Common follow-up questions

- What if the output should be sorted by frequency descending instead? _(Tests whether you can change the sort key to use counts with a tiebreaker on character.)_
- How would you handle Unicode characters beyond ASCII? _(Tests awareness of `str.isalpha()` behavior with non-Latin scripts.)_
- What if the input is a very large file read line by line? _(Tests streaming character counting without loading the entire string into memory.)_
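
The follow-ups above could be approached along the lines of the sketch below. All three helper names (`sort_by_frequency`, `is_ascii_letter`, `count_letters_streaming`) are hypothetical, and restricting to ASCII is only one possible answer to the Unicode question.

```python
import io
import string


def sort_by_frequency(pairs):
    """Order [letter, count] pairs by count descending, then letter ascending."""
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


def is_ascii_letter(char):
    """Stricter filter than str.isalpha(), which also accepts non-Latin letters.

    Assumes char is a single character.
    """
    return char in string.ascii_letters


def count_letters_streaming(lines):
    """Accumulate counts one line at a time; lines can be an open text file."""
    counts = {}
    for line in lines:
        for char in line:
            if char.isalpha():
                key = char.lower()
                counts[key] = counts.get(key, 0) + 1
    return [[key, counts[key]] for key in sorted(counts)]


print(sort_by_frequency([["a", 1], ["b", 2], ["c", 2]]))  # [['b', 2], ['c', 2], ['a', 1]]
# "\u00e9" is the letter e with an acute accent.
print("\u00e9".isalpha(), is_ascii_letter("\u00e9"))  # True False
# io.StringIO stands in for a file object here.
print(count_letters_streaming(io.StringIO("Ab\nba\n")))  # [['a', 2], ['b', 2]]
```

In the streaming version only the current line and the frequency dict are held in memory, so memory use depends on the longest line and the number of distinct letters rather than on the file size.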

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.