# Group By

> Same key, different rows. Bring them together.

Canonical URL: <https://datadriven.io/problems/group_by>

Domain: Python · Difficulty: medium · Seniority: L3

## Problem

Given a list of strings, return a dict mapping each distinct first character to the list of strings starting with that character, preserving input order within each group.

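To make the requirement concrete, here is a small illustrative input and the result it should produce. The sample words are an assumption chosen for this sketch, not part of the original problem statement.

```python
# Illustrative input (hypothetical, chosen for this example).
words = ["apple", "bob", "avocado", "cat", "banana"]

# Each group keeps its words in the order they appeared in the input:
# "apple" precedes "avocado", and "bob" precedes "banana".
expected = {
    "a": ["apple", "avocado"],
    "b": ["bob", "banana"],
    "c": ["cat"],
}
```
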
## Worked solution and explanation

### Why this problem exists in real interviews

Grouping words by their first character is a **hash-based partitioning** problem. It tests dict accumulation and whether you correctly handle the first-occurrence case for each key.

---

### Break down the requirements

#### Step 1: Extract the grouping key

The first character of each word is the partition key.

#### Step 2: Accumulate words per key in order

Append each word to the list for its starting letter, preserving input order within each group.

---

### The solution

**First-character grouping with dict accumulation**

```python
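def group_by_key(words):
    """Group words by first character, keeping input order within each group."""
    groups = {}
    for word in words:
        key = word[0]  # assumes word is non-empty
        # First occurrence of this key: start a new group.
        if key not in groups:
            groups[key] = []
        groups[key].append(word)
    return groups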
```

> **Time and Space Complexity**
>
> **Time:** O(n) where n is the number of words. Each word is processed once, and each dict lookup and list append is O(1) on average.
>
> **Space:** O(n) since every word appears in exactly one group.

> **Interviewers Watch For**
>
> Whether you use `setdefault` or a manual check. Both are correct; the manual check is more explicit in interviews.

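For comparison, here is a minimal sketch of the `setdefault` variant. A `collections.defaultdict` version is included as a third option that the text above does not mention. The function names are hypothetical, and both assume every word is non-empty.

```python
from collections import defaultdict


def group_with_setdefault(words):
    groups = {}
    for word in words:
        # setdefault returns the existing list for this key,
        # or inserts a new empty list and returns that.
        groups.setdefault(word[0], []).append(word)
    return groups


def group_with_defaultdict(words):
    groups = defaultdict(list)  # missing keys start as an empty list
    for word in words:
        groups[word[0]].append(word)
    # Convert back to a plain dict so later lookups of missing keys
    # raise KeyError instead of silently creating empty groups.
    return dict(groups)
```
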
> **Common Pitfall**
>
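> Accessing `word[0]` on an empty string raises an `IndexError`. If empty strings are possible, decide how to handle them (for example, skip them or group them under a sentinel key) and check `if not word` before indexing.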

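One way to add that guard is sketched below. Skipping empty strings is an assumption made for this example, since the problem statement does not say how they should be handled, and the name `group_by_key_safe` is hypothetical.

```python
def group_by_key_safe(words):
    groups = {}
    for word in words:
        # Assumption: empty strings have no first character, so skip them.
        if not word:
            continue
        key = word[0]
        if key not in groups:
            groups[key] = []
        groups[key].append(word)
    return groups
```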
---

## Common follow-up questions

- What if grouping should be case-insensitive? _(Tests normalizing the key with `.lower()` while preserving the original word.)_
- How would you group by an arbitrary key function? _(Tests accepting a callable: `group_by(items, key_fn)`.)_
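- What if the output groups should be sorted by key? _(Tests wrapping the result in `dict(sorted(groups.items()))`, which works because dicts preserve insertion order in Python 3.7+. An `OrderedDict` only preserves insertion order too; it does not sort by itself.)_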

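The three follow-ups above can be sketched together. The `group_by(items, key_fn)` signature comes from the second question; the sample words and variable names are assumptions made for illustration.

```python
def group_by(items, key_fn):
    """Group items by key_fn(item), keeping input order within each group."""
    groups = {}
    for item in items:
        groups.setdefault(key_fn(item), []).append(item)
    return groups


words = ["bob", "Apple", "avocado", "Banana"]  # illustrative input

# Case-insensitive grouping: normalize only the key, keep the original word.
by_letter = group_by(words, lambda w: w[0].lower())

# Sorted by key: re-insert the groups in sorted key order.
# This relies on dicts preserving insertion order (Python 3.7+).
sorted_groups = dict(sorted(by_letter.items()))
```

Here `by_letter` has its keys in first-seen order (`"b"`, then `"a"`), while `sorted_groups` has them in alphabetical order. The lists inside each group are unchanged by the sort.
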
## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/group_by)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.