# The String Shrinker

> Compress the string. Shorter wins.

Canonical URL: <https://datadriven.io/problems/the_string_shrinker>

Domain: Python · Difficulty: easy · Seniority: L4

## Problem

Given a string, return its run-length encoding: each run of identical characters becomes that character followed by the count, except runs of length 1 which omit the count. For example 'aabcccccaaa' -> 'a2bc5a3'.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **run-length encoding** with a twist: counts of 1 are omitted. It probes string traversal, conditional formatting, and the length comparison logic that decides whether compression actually saves space.

---

### Break down the requirements

#### Step 1: Track character runs

Iterate through the string, counting consecutive occurrences of each character.

#### Step 2: Format runs with optional counts

Append the character followed by the count only if the count exceeds 1. Single characters appear without a count suffix.

#### Step 3: Return the shorter of original or compressed

If compression does not reduce length, return the original.

---

### The solution

**Run-length encoding with count-1 omission**

```python
def compress(s: str) -> str:
    if not s:
        return s
    parts = []
    current = s[0]
    count = 1
    for i in range(1, len(s)):
        if s[i] == current:
            count += 1
        else:
            parts.append(current)
            if count > 1:
                parts.append(str(count))
            current = s[i]
            count = 1
    parts.append(current)
    if count > 1:
        parts.append(str(count))
    compressed = "".join(parts)
    if len(compressed) < len(s):
        result = compressed
    else:
        result = s
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n) for a single pass through the string.
> 
> **Space:** O(n) for the parts list.

> **Interviewers Watch For**
>
> The omission of count-1 is the differentiator from the basic RLE problem. Candidates who always append the count show they did not read the prompt carefully.

> **Common Pitfall**
>
> Forgetting to flush the last run after the loop ends. The final character-count pair must be appended outside the loop.

---

## Common follow-up questions

- How would you decompress a string encoded this way? _(Tests parsing single characters vs. character-followed-by-digits.)_
- What if the string contained digits? _(Tests ambiguity in decoding and potential need for delimiter characters.)_
- When is RLE a poor choice compared to other compression algorithms? _(Tests awareness that RLE only helps with long runs; random data actually expands.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_string_shrinker)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.