# The Character Encoder

> Squeeze a string down to its tightest form.

Canonical URL: <https://datadriven.io/problems/the_character_encoder>

Domain: Python · Difficulty: easy · Seniority: L3

## Problem

Given a string, return its run-length encoding: each run of identical characters becomes that character followed by its run count (always include the count, even for runs of length 1). For example, `'aaabbc'` encodes to `'a3b2c1'`.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes whether a candidate can traverse a string while tracking state across consecutive identical characters. It tests **loop control**, **string building**, and the ability to handle boundary conditions at the end of the input without off-by-one errors.

---

### Break down the requirements

#### Step 1: Initialize tracking variables

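Start the first run at the first character with a count of 1. The count tracks the length of the current run as you scan left to right; the run's character does not need its own variable, because it is always the character just before the current position.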

#### Step 2: Walk through the string comparing neighbors

For each subsequent character, compare it with the previous one. If they match, increment the run count; otherwise flush the current run (character plus count) to the result and reset the count to 1.

#### Step 3: Flush the final run after the loop

The last group never encounters a different character to trigger a flush, so you must append it explicitly after the loop ends.

---

### The solution

**Single-pass run-length encoding**

```python
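def encode(s: str) -> str:
    # An empty string has no runs, and s[-1] below would raise IndexError.
    if not s:
        return ""
    parts = []  # collect the pieces and join once at the end
    count = 1  # length of the run that ends at s[i - 1]
    for i in range(1, len(s)):
        if s[i] == s[i - 1]:
            count += 1
        else:
            # The run ended at s[i - 1]: flush it and start a new one.
            parts.append(s[i - 1] + str(count))
            count = 1
    # The final run has no following character to trigger a flush.
    parts.append(s[-1] + str(count))
    return "".join(parts)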
```

> **Time and Space Complexity**
>
> **Time:** O(n) where n is the length of the string. Each character is visited exactly once.
> 
> **Space:** O(n) for the result string. In the worst case (no two adjacent characters are equal) every character becomes a character plus a `1`, so the output is 2n characters long.

> **Interviewers Watch For**
>
> Whether you handle the final run correctly. Many candidates forget to flush the last group after the loop exits, producing truncated output.

> **Common Pitfall**
>
> Returning an empty string for single-character input. A string like `'a'` should encode to `'a1'`, not `''`.
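
One way to check these edge cases is to compare against an independent implementation. The sketch below is not the solution above; it swaps in `itertools.groupby` from the standard library, and the name `encode_groupby` is made up for illustration. It assumes the same output format: character followed by count, with no separator.

```python
from itertools import groupby


def encode_groupby(s: str) -> str:
    # groupby yields (character, iterator over that run) for each run of
    # consecutive equal characters; counting the iterator gives the run length.
    return "".join(f"{ch}{sum(1 for _ in run)}" for ch, run in groupby(s))


# Edge cases discussed above.
assert encode_groupby("") == ""            # empty input has no runs
assert encode_groupby("a") == "a1"         # single character still gets a count
assert encode_groupby("aaabbc") == "a3b2c1"  # final run "c" is not dropped
assert encode_groupby("abc") == "a1b1c1"   # no duplicates: output doubles in length
```

Because `groupby` handles the final group itself, there is no explicit flush step to forget, which makes it a useful cross-check for a hand-written loop.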

---

## Common follow-up questions

- What if the encoded string is longer than the original? _(Tests awareness that RLE can expand data when runs are short; a real compressor would fall back to the original.)_
- How would you decode this encoding back to the original string? _(Tests the inverse operation: parsing digits then repeating characters.)_
- What changes if the input can contain digits? _(Tests delimiter design: digits in the payload make the format ambiguous without an escape or separator scheme.)_
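
For the decoding follow-up, a minimal sketch might look like the following. It assumes the format from the problem statement (one character, then its decimal count) and, importantly, that the original string contains no digits. The function name `decode` and the choice to raise `ValueError` on a missing count are illustrative assumptions, not part of the original problem.

```python
def decode(encoded: str) -> str:
    # Assumption: payload characters are never ASCII digits, so every digit
    # belongs to a count.
    parts = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]  # the run's character
        i += 1
        start = i
        # Consume all digits of the count (counts can exceed 9).
        while i < len(encoded) and "0" <= encoded[i] <= "9":
            i += 1
        if start == i:
            raise ValueError(f"missing count after {ch!r}")
        parts.append(ch * int(encoded[start:i]))
    return "".join(parts)


assert decode("a3b2c1") == "aaabbc"
assert decode("a12") == "a" * 12
assert decode("") == ""
```

The digit assumption is exactly what the third follow-up probes: `'1112'` encodes to `'1321'`, which this decoder would read as 321 copies of `'1'`.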

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.