# The Character Clans

> Words sharing the same letters belong to the same clan.

Canonical URL: <https://datadriven.io/problems/the_character_clans>

Domain: Python · Difficulty: medium · Seniority: L3

## Problem

Given a list of strings, group strings that have the same set of distinct characters (not the same multiset). Return a list of groups, where groups are ordered by the first appearance of any member, and within each group members appear in input order.

## Worked solution and explanation

### Why this problem exists in real interviews

Grouping words by their unique character set tests **frozenset as a dict key** and whether you can normalize words to their distinct character signature for grouping.

---

### Break down the requirements

#### Step 1: Compute the character set for each word

Convert each word to a frozenset of its characters. This normalizes 'abc' and 'cba' to the same key.

#### Step 2: Group words by their character set

Use the frozenset as a dict key and accumulate words sharing the same set.

#### Step 3: Return the groups as a list of lists

Extract the values from the grouping dict.

---

### The solution

**Frozenset-keyed grouping by character signature**

```python
def group_by_chars(words):
    groups = {}
    for word in words:
        key = frozenset(word)
        if key not in groups:
            groups[key] = []
        groups[key].append(word)
    result = []
    for key in groups:
        result.append(groups[key])
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n * L) where n is the number of words and L is the average word length (for creating frozensets).
> 
> **Space:** O(n * L) for storing all words in the result groups.

> **Interviewers Watch For**
>
> Using `frozenset` as a hashable dict key. Regular `set` is not hashable and cannot be used as a dict key. This demonstrates knowledge of Python's type system.

> **Common Pitfall**
>
> Using `sorted(word)` as a string key instead of `frozenset`. While this works for grouping by character set (since sorted unique characters would match), it does not correctly handle words with different character frequencies. Actually, for character sets specifically, `frozenset` is the exact right tool.

---

## Common follow-up questions

- What if the grouping should be case-insensitive? _(Tests lowercasing each word before computing the frozenset.)_
- How is this different from grouping anagrams? _(Tests that anagrams have the same character multiset (with counts), while this problem only cares about the character set (without counts).)_
- What if you needed to group by character frequency instead? _(Tests using a tuple of sorted (char, count) pairs as the key.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_character_clans)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.