# The Link Shrinker

> Long addresses have aliases - you give them out, you keep the map.

Canonical URL: <https://datadriven.io/problems/the_link_shrinker>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given a URL string and a mutable dict code_map, generate a 6-character lowercase hex short code from the MD5 hash of the URL (first 6 hex chars), store code_map[code] = url, and return the code. If the code collides with an existing different URL, extend to the next 6 hex chars; keep extending until a unique code (for simplicity tests will not collide).

## Worked solution and explanation

### Why this problem exists in real interviews

This probes **hashing fundamentals** and the design of a basic key-value store. Interviewers use URL shorteners to test whether candidates understand deterministic hashing, collision awareness, and mutable state management.

> **Trick to Solving**
>
> The key insight is that MD5 produces a fixed-length hex digest, and you only need the first 6 characters. Recognize that `hashlib.md5(url.encode()).hexdigest()[:6]` gives a deterministic short code for any URL.

---

### Break down the requirements

#### Step 1: Hash the URL with MD5

Use `hashlib.md5` to produce a hex digest. This is deterministic: the same URL always produces the same hash.

#### Step 2: Truncate to 6 characters

Take the first 6 hex characters as the short code. This gives 16^6 (about 16 million) possible codes.

#### Step 3: Store the mapping

Insert the short code as key and the original URL as value into the mutable dictionary. Return the short code.

---

### The solution

**MD5-based short code generation**

```python
import hashlib
def shorten(url, store):
    digest = hashlib.md5(url.encode()).hexdigest()
    short_code = digest[:6]
    store[short_code] = url
    return short_code
```

> **Time and Space Complexity**
>
> **Time:** O(k) where k is the length of the URL string, for the MD5 computation.
> 
> **Space:** O(1) per call beyond the dictionary entry stored.

> **Interviewers Watch For**
>
> Do you mention collision risk? With only 6 hex characters, two different URLs could hash to the same code. Strong candidates discuss collision handling strategies even if the problem does not require implementing one.

> **Common Pitfall**
>
> Forgetting to encode the URL string to bytes before hashing. `hashlib.md5()` requires a bytes-like object, not a string.

---

## Common follow-up questions

- How would you handle hash collisions? _(Tests collision resolution: chaining, open addressing, or appending extra characters.)_
- What if the same URL is shortened twice, should it return the same code? _(Tests idempotency awareness and whether to check the store first.)_
- Why is MD5 acceptable here but not for password hashing? _(Tests understanding of cryptographic vs non-cryptographic hash use cases.)_
- How would you add an expiration time to each short link? _(Tests extending the data model with TTL metadata.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_link_shrinker)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.