# The Encoded Signal

> The encoding is hiding multipliers. Decode it.

Canonical URL: <https://datadriven.io/problems/the_encoded_signal>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Decode a telemetry encoding of 26 letter frequencies (a..z). Rules: a single digit 1-9 denotes a letter a-i; two digits followed by '#' (NN#) denote a letter j-z (10#=j, 24#=x, 26#=z). An optional '(n)' immediately after a letter token gives that letter's count n, which may span multiple digits; without it, the count is 1. Return a 26-element integer list of counts for each letter. (The test pattern gives '1(2)2(3)324#26#(5)': a=2, b=3, c=1, x=1, z=5, and every other letter 0.)

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **stateful string parsing** with multiple encoding formats in a single stream. It probes careful character-by-character processing and the ability to handle conditional branching based on lookahead patterns.

---

### Break down the requirements

#### Step 1: Identify the character mapping scheme

Single digits 1-9 map to a-i. Two-digit numbers followed by `#` map to j(10)-z(26). A parenthesized count after a token gives that letter's number of occurrences; with no parentheses the count is 1.
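
A minimal sketch of the mapping, assuming the 1-based letter numbering implied by the examples (`24#` = x, `26#` = z). The helper name `letter_index` is hypothetical and not part of the reference solution.

```python
import string


def letter_index(number: int) -> int:
    """Convert a 1-based letter number (1..26) to a 0-based list index."""
    if not 1 <= number <= 26:
        raise ValueError(f"letter number out of range: {number}")
    return number - 1


# 1 and 9 are written as single digits; 10, 24 and 26 are written as NN#.
for number in (1, 9, 10, 24, 26):
    print(number, string.ascii_lowercase[letter_index(number)])
```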

#### Step 2: Parse the encoded string with index tracking

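Walk the string with an index. At each position, look two characters ahead: if that character is `#`, read a two-digit letter number and advance three characters; otherwise read a single digit. Then check whether the next character is `(` and, if so, read the repetition count up to the closing `)`.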
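
The same token boundaries can be sketched with a regular expression instead of manual indexing. This is an alternative illustration, not the reference solution; `TOKEN` and `tokenize` are hypothetical names, and the sketch assumes well-formed input (characters that match nothing are silently skipped by `finditer`).

```python
import re

# Two digits + '#' is tried first; a lone digit is the fallback.
# An optional "(n)" group captures a multi-digit count.
TOKEN = re.compile(r"(?:(\d{2})#|(\d))(?:\((\d+)\))?")


def tokenize(encoded: str) -> list[tuple[int, int]]:
    """Return (letter_number, count) pairs in the order they appear."""
    pairs = []
    for match in TOKEN.finditer(encoded):
        two_digit, one_digit, count = match.groups()
        number = int(two_digit or one_digit)
        pairs.append((number, int(count) if count else 1))
    return pairs


print(tokenize("1(2)2(3)324#26#(5)"))
# [(1, 2), (2, 3), (3, 1), (24, 1), (26, 5)]
```

Listing the two-digit alternative first plays the same role as checking for `#` before falling back to a single digit.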

#### Step 3: Accumulate into a 26-element frequency list

Map each decoded character to its index (a=0, z=25) and add the repetition count.
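
As a rough sketch of the accumulation step on its own, assuming the tokens have already been parsed into `(letter_number, count)` pairs and that a letter may appear in more than one token. The name `accumulate` is hypothetical.

```python
def accumulate(pairs: list[tuple[int, int]]) -> list[int]:
    """Sum counts per letter; letter numbers are 1-based (1 = a, 26 = z)."""
    freq = [0] * 26
    for number, count in pairs:
        freq[number - 1] += count  # add, so a repeated letter is not overwritten
    return freq


# 'a' appears in two separate pairs here, so its counts are summed.
freq = accumulate([(1, 2), (24, 1), (1, 3)])
print(freq[0], freq[23])  # 5 1
```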

---

### The solution

**Index-tracking parser with lookahead**

```python
def decode_frequency(encoded: str) -> list[int]:
    freq = [0] * 26
    i = 0
    n = len(encoded)
    while i < n:
        # Lookahead: two digits followed by '#' encode j-z (10-26).
        if i + 2 < n and encoded[i + 2] == '#':
            char_idx = int(encoded[i:i + 2]) - 1
            i += 3
        else:
            # A single digit encodes a-i (1-9).
            char_idx = int(encoded[i]) - 1
            i += 1
        # An optional "(n)" gives a multi-digit count; the default is 1.
        count = 1
        if i < n and encoded[i] == '(':
            j = encoded.index(')', i)  # raises ValueError if ')' is missing
            count = int(encoded[i + 1:j])
            i = j + 1
        freq[char_idx] += count
    return freq
```

> **Time and Space Complexity**
>
> **Time:** O(n), where n is the length of the encoded string. The index only moves forward, so each character is examined a constant number of times.
>
> **Space:** O(1), since the output is always a fixed 26-element list.

> **Interviewers Watch For**
>
> Whether you handle the lookahead for `#` correctly. The two-digit check must come before the single-digit check to avoid consuming the first digit alone.
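
A small illustration of why the order of the checks matters. `first_token` is a hypothetical helper that reads only the first token of a string under either ordering.

```python
def first_token(encoded: str, two_digit_first: bool) -> int:
    """Return the letter number of the first token under either check order."""
    has_hash = len(encoded) > 2 and encoded[2] == "#"
    if two_digit_first and has_hash:
        return int(encoded[:2])  # "24#" is read as 24 (x)
    return int(encoded[0])       # "24#" is read as 2 (b); '4' and '#' are left over


print(first_token("24#", two_digit_first=True))   # 24
print(first_token("24#", two_digit_first=False))  # 2
```

With the wrong order, the leftover `4` would be decoded as another letter and the parser would then fail on `#`.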

> **Common Pitfall**
>
> Forgetting to parse multi-digit repetition counts inside parentheses. Counts like `(12)` mean 12 repetitions, not 1 and 2.
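
One way to read the whole count is to locate the closing parenthesis and convert everything in between. This is a sketch with a hypothetical helper name, `read_count`, assuming `open_paren` is the index of a `(` in the string.

```python
def read_count(encoded: str, open_paren: int) -> tuple[int, int]:
    """Parse "(n)" starting at open_paren; return (count, index after ')')."""
    close_paren = encoded.index(")", open_paren)  # ValueError if ')' is missing
    count = int(encoded[open_paren + 1:close_paren])
    return count, close_paren + 1


print(read_count("1(12)", 1))  # (12, 5)
```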

---

## Common follow-up questions

- What if the encoding had no parentheses and always used explicit counts? _(Tests simplifying the parser to just digit-character pairs.)_
- How would you encode a frequency list back into this format? _(Tests the reverse operation with the same encoding rules.)_
- What if invalid input could appear? _(Tests defensive parsing with error detection and reporting.)_
- How would you handle this for Unicode beyond a-z? _(Tests extending the character set and mapping scheme.)_
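
For the encoding follow-up, one possible sketch is shown below. It assumes letters are emitted in a-z order, zero counts are omitted, and a count of 1 is written without parentheses. These choices reproduce the test pattern but are not stated by the problem, and `encode_frequency` is a hypothetical name.

```python
def encode_frequency(freq: list[int]) -> str:
    """Encode 26 letter counts; zero counts are skipped, count 1 omits "(n)"."""
    if len(freq) != 26:
        raise ValueError("expected exactly 26 counts")
    parts = []
    for index, count in enumerate(freq):
        if count < 0:
            raise ValueError(f"negative count at index {index}")
        if count == 0:
            continue
        number = index + 1  # back to the 1-based letter number
        token = str(number) if number <= 9 else f"{number}#"
        if count > 1:
            token += f"({count})"
        parts.append(token)
    return "".join(parts)


# Rebuild the frequency list from the problem statement's test pattern.
freq = [0] * 26
freq[0], freq[1], freq[2], freq[23], freq[25] = 2, 3, 1, 1, 5
print(encode_frequency(freq))  # 1(2)2(3)324#26#(5)
```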

---

Source: DataDriven (https://datadriven.io).