# The Sneaky Twins

> They look different but they are the same inside.

Canonical URL: <https://datadriven.io/problems/the_sneaky_twins>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given strings s and p containing lowercase letters, return all starting indices in s where a substring of length len(p) is an anagram of p. Return the sorted list of indices.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests the **sliding window** technique combined with **frequency counting**. Finding all anagram start positions in a string is a classic window problem that probes whether a candidate can maintain and compare character frequency maps efficiently.

> **Trick to Solving**
>
> Instead of sorting substrings (O(n*m log m)), maintain a frequency count of the current window and compare it to the pattern frequency. Slide the window by adding the new character and removing the old one, keeping comparisons O(1) per step.

---

### Break down the requirements

#### Step 1: Build the target frequency map from pattern p

Count character frequencies in p. This is the signature you are looking for.

#### Step 2: Initialize a window of size len(p) over s

Build the frequency map for the first `len(p)` characters of s.

#### Step 3: Slide the window and compare

For each new position, add the entering character and remove the leaving character. If the window frequency matches p's frequency, record the start index.

---

### The solution

**Sliding window with frequency map comparison**

```python
def find_anagrams(s: str, p: str) -> list:
    if len(p) > len(s):
        return []
    p_freq = {}
    for ch in p:
        p_freq[ch] = p_freq.get(ch, 0) + 1
    w_freq = {}
    for i in range(len(p)):
        ch = s[i]
        w_freq[ch] = w_freq.get(ch, 0) + 1
    result = []
    if w_freq == p_freq:
        result.append(0)
    for i in range(len(p), len(s)):
        add_ch = s[i]
        w_freq[add_ch] = w_freq.get(add_ch, 0) + 1
        rem_ch = s[i - len(p)]
        w_freq[rem_ch] -= 1
        if w_freq[rem_ch] == 0:
            del w_freq[rem_ch]
        if w_freq == p_freq:
            result.append(i - len(p) + 1)
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n * k) where n is the length of s and k is the number of distinct characters (bounded by 26). Dict comparison is O(k) per step.
> 
> **Space:** O(k) for the two frequency maps.

> **Interviewers Watch For**
>
> Deleting keys with zero count (`del w_freq[rem_ch]`) is essential for dict equality to work. A key with value 0 would cause a false mismatch.

> **Common Pitfall**
>
> Comparing sorted substrings is O(m log m) per window position, giving O(n * m log m) total. The sliding window approach is strictly better.

---

## Common follow-up questions

- How would you optimize the dict comparison to O(1) per step? _(Tests using a `matches` counter that tracks how many characters have the correct frequency.)_
- What if the pattern could contain characters not in the string? _(Tests that the algorithm naturally handles this since frequencies will never match.)_
- How would you extend this to find approximate anagrams allowing one mismatch? _(Tests modifying the match condition to tolerate small differences.)_
- What if the input strings were very long, say 10 million characters? _(Tests awareness that the O(26) dict comparison per step is effectively O(1) and scales linearly.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_sneaky_twins)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.