# The Squeeze

> aaabbb gets old fast. Shrink it.

Canonical URL: <https://datadriven.io/problems/the_squeeze>

Domain: Python · Difficulty: easy · Seniority: L3

## Problem

Given a string, return its run-length encoded form (char followed by run count, e.g. 'aabcccaaa' -> 'a2b1c3a3'). If the encoded form is not strictly shorter than the original, return the original string unchanged.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **run-length encoding**, a simple compression algorithm that appears in data pipelines and storage systems. It probes loop control, character tracking, and the conditional return logic of comparing compressed vs. original length.

---

### Break down the requirements

#### Step 1: Track the current character and its run length

Initialize with the first character and count 1. Increment count while the next character matches.

#### Step 2: Append character and count when a run ends

When the character changes, write the character followed by its count to the result.

#### Step 3: Compare lengths and return the shorter version

If the compressed string is not shorter than the original, return the original unchanged.

---

### The solution

**Run-length encoding with length comparison**

```python
def compress(s: str) -> str:
    if not s:
        return s
    parts = []
    current = s[0]
    count = 1
    for i in range(1, len(s)):
        if s[i] == current:
            count += 1
        else:
            parts.append(current + str(count))
            current = s[i]
            count = 1
    parts.append(current + str(count))
    compressed = "".join(parts)
    if len(compressed) < len(s):
        result = compressed
    else:
        result = s
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n) for a single pass through the string.
> 
> **Space:** O(n) for the parts list. In the worst case (no compression), the output is longer than the input.

> **Interviewers Watch For**
>
> Handling the final run correctly. After the loop, the last character-count pair must still be appended. Forgetting this is the most common bug.

> **Common Pitfall**
>
> Building the result with string concatenation instead of a list. Repeated `result += char + count` is O(n^2) due to string immutability.

---

## Common follow-up questions

- How would you implement the decompression function? _(Tests parsing character-count pairs back into the expanded string.)_
- What if counts could be multi-digit (e.g., a character repeated 100 times)? _(Tests that the encoding already handles this since `str(count)` produces multi-digit strings.)_
- When does run-length encoding perform poorly? _(Tests understanding that data with few repeated runs actually expands under RLE.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_squeeze)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.