# The Code Expander

> Compressed messages need a decoder to come alive.

Canonical URL: <https://datadriven.io/problems/the_code_expander>

Domain: Python · Difficulty: easy · Seniority: L3

## Problem

Given an RLE-encoded string where each run is a decimal count followed by a character, return the decoded string. Counts can be multi-digit.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **string parsing with state tracking**: reading digits and characters in sequence to reconstruct the original data. It probes whether a candidate can handle multi-digit numbers and cleanly separate the parsing of counts from characters.

---

### Break down the requirements

#### Step 1: Scan the string character by character

Accumulate digit characters into a number buffer. When you hit a non-digit, that is the character to repeat.

#### Step 2: Handle multi-digit counts

Numbers like `12a` mean 12 repetitions. You must collect all consecutive digits before treating the next character as the letter.

#### Step 3: Append the repeated character to the result

Convert the accumulated digits to an integer, repeat the character that many times, and reset the digit buffer.

---

### The solution

**State-tracking parser for run-length decoding**

```python
def decode(encoded: str) -> str:
    result = ""
    num_buffer = ""
    for ch in encoded:
        if ch.isdigit():
            num_buffer += ch
        else:
            count = int(num_buffer)
            for _ in range(count):
                result += ch
            num_buffer = ""
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n + m) where n is the encoded string length and m is the decoded output length. Each character in both is processed once.
> 
> **Space:** O(m) for the result string.

> **Interviewers Watch For**
>
> Whether you handle multi-digit numbers correctly. Candidates who assume single-digit counts will fail on inputs like `12a3b`.

> **Common Pitfall**
>
> Forgetting to reset the number buffer after processing each character. This causes digits from the previous run to bleed into the next.

---

## Common follow-up questions

- What if the count is missing (implying 1)? _(Tests adding a default: if `num_buffer` is empty when a letter is found, treat count as 1.)_
- How would you encode a string back into this format? _(Tests the inverse: tracking consecutive runs and writing count+char pairs.)_
- What if the input contains invalid format like consecutive letters? _(Tests error handling and input validation.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_code_expander)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.