# Run Length Encoding

> AAABBB becomes 3A3B. Compress it.

Canonical URL: <https://datadriven.io/problems/run_length_encoding>

Domain: Python · Difficulty: easy · Seniority: L4

## Problem

Given a string, return its run-length encoding: each run of identical characters is replaced by the character followed by its run length (as a decimal integer, even if the run is 1).

## Worked solution and explanation

### Why this problem exists in real interviews

Run-length encoding tests **consecutive-element grouping** and state tracking across a string scan. It is a practical compression technique that reveals whether you handle the final run correctly.

---

### Break down the requirements

#### Step 1: Track the current character and its run count

Start with the first character and count 1.

#### Step 2: When the character changes, emit the run

Append the character and its count to the result, then reset the tracker.

#### Step 3: Emit the final run after the loop

The last group is not followed by a character change, so you must handle it after the loop ends.

---

### The solution

**Single-pass run tracking with count emission**

```python
def run_length_encode(s):
    if not s:
        return ''
    result = ''
    current = s[0]
    count = 1
    for i in range(1, len(s)):
        if s[i] == current:
            count += 1
        else:
            result += current + str(count)
            current = s[i]
            count = 1
    result += current + str(count)
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n) where n is the string length. Single pass.
> 
> **Space:** O(n) for the result string in the worst case (no compression when all characters differ).

> **Interviewers Watch For**
>
> Remembering to emit the last run after the loop. This is the most common bug in RLE implementations.

> **Common Pitfall**
>
> Forgetting the final `result += current + str(count)` after the loop. The last character group is never emitted inside the loop because there is no subsequent character change to trigger it.

---

## Common follow-up questions

- How would you implement the decoder? _(Tests parsing character-count pairs and expanding each character by its count.)_
- What if counts should be omitted when they are 1? _(Tests adding a conditional: only append the count if it exceeds 1.)_
- What if the input contains digits? _(Tests that the encoding becomes ambiguous unless you add a delimiter or escape sequence.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/run_length_encoding)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.