# The Hash Stamper

> One input, one irreversible output - the foundation of every secret.

Canonical URL: <https://datadriven.io/problems/the_hash_stamper>

Domain: Python · Difficulty: easy · Seniority: L3

## Problem

Given a seed string and a positive integer length, compute the SHA-256 hex digest of the seed and return its first `length` characters.

## Worked solution and explanation

### Why this problem exists in real interviews

Deterministic token generation comes up in idempotency keys, cache busting, and license stamping. Interviewers use this problem to check three things: whether you know the `hashlib` module, whether you encode strings to bytes correctly before hashing, and whether you understand that a SHA-256 hex digest is exactly 64 characters (so `length` must be bounded).

---

### Break down the requirements

#### Step 1: Encode the seed to bytes

`hashlib` functions require bytes, not `str`. Use `seed.encode('utf-8')` for the canonical encoding. Skipping this raises `TypeError`; doing it explicitly signals that you remember Python 3 split `str` and `bytes` for a reason.
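
To make the str-versus-bytes point concrete, here is a minimal sketch. The seed value is a hypothetical example chosen to contain one non-ASCII character, so the character count and the byte count differ.

```python
import hashlib

# Hypothetical example seed; "\u00ef" is the single character "i with diaeresis".
seed = "na\u00efve-seed"

# Passing a str directly is rejected by hashlib.
try:
    hashlib.sha256(seed)
    raised = False
except TypeError:
    raised = True

# Encoding first produces the bytes object hashlib expects.
encoded = seed.encode("utf-8")
hasher = hashlib.sha256(encoded)

print(raised)                    # True
print(type(encoded).__name__)    # bytes
print(len(seed), len(encoded))   # 10 11 (the non-ASCII character takes two bytes)
```

The byte count, not the character count, is what the hash actually consumes, which is why pinning the encoding matters.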

#### Step 2: Take the SHA-256 hex digest

`hashlib.sha256(...).hexdigest()` returns a 64-character lowercase hex string: 256 bits at 4 bits per hex character. SHA-256 is the right default: widely available, fast, and free of known practical collision attacks, unlike MD5 and SHA-1. For non-security uses such as idempotency keys, MD5 or SHA-1 would also work.
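
As a quick check of the digest's shape, the sketch below hashes the string "abc", which is a standard SHA-256 test input, and inspects the result.

```python
import hashlib

digest = hashlib.sha256("abc".encode("utf-8")).hexdigest()

print(digest)
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(len(digest))               # 64
print(digest == digest.lower())  # True
```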

#### Step 3: Slice the first `length` characters

Python slicing does not raise when the stop index runs past the end of the string, so `digest[:length]` returns the full 64-character digest whenever `length > 64`. The spec says 'positive integer length', so you can skip the `length <= 0` guard, but it is worth naming the 64-character ceiling out loud as a real-world constraint.
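
The slicing behaviour can be seen without hashing anything. In this sketch the 64-character string is a stand-in rather than a real digest, and `checked_prefix` is a hypothetical stricter variant that the problem does not require.

```python
# Stand-in 64-character string, not a real digest.
digest = "0123456789abcdef" * 4

print(len(digest[:8]))    # 8
print(len(digest[:64]))   # 64
print(len(digest[:100]))  # 64 (slicing past the end does not raise)
print(len(digest[:-1]))   # 63 (a negative length silently trims from the end)


def checked_prefix(digest: str, length: int) -> str:
    """Hypothetical variant that rejects lengths outside 1..len(digest)."""
    if not 1 <= length <= len(digest):
        raise ValueError(f"length must be between 1 and {len(digest)}")
    return digest[:length]


print(checked_prefix(digest, 8))  # 01234567
```

Whether to clamp silently or raise is a design choice; the spec here permits the silent version.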

---

### The solution

**Encode, hash, hex, slice**

```python
import hashlib

def generate_password_from_seed(seed: str, length: int) -> str:
    digest = hashlib.sha256(seed.encode('utf-8')).hexdigest()
    return digest[:length]
```
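
A short usage sketch follows, with the function repeated so the snippet runs on its own. The inputs "abc" and the empty string are chosen because their SHA-256 digests are widely published test values.

```python
import hashlib


def generate_password_from_seed(seed: str, length: int) -> str:
    digest = hashlib.sha256(seed.encode("utf-8")).hexdigest()
    return digest[:length]


print(generate_password_from_seed("abc", 8))         # ba7816bf
print(generate_password_from_seed("abc", 16))        # ba7816bf8f01cfea
print(generate_password_from_seed("", 8))            # e3b0c442
print(len(generate_password_from_seed("abc", 100)))  # 64

# Same seed, same output: the function is deterministic.
print(generate_password_from_seed("abc", 8) == generate_password_from_seed("abc", 8))  # True
```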

> **Cost Analysis**
>
> Time is O(n), where n is the byte count of the seed after UTF-8 encoding; SHA-256 is a streaming hash with constant per-byte cost. The hex digest is always exactly 64 characters, so the slice is O(length) and never copies more than 64 characters. Space is O(n) for the encoded bytes plus O(1) for the fixed-size digest.
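
The streaming property mentioned above can be illustrated with `update()`. This is a sketch only: the 5000-byte input and the 64-byte chunk size are arbitrary choices for the example.

```python
import hashlib

# Hypothetical input: 5000 bytes of repeated text.
data = ("seed-" * 1000).encode("utf-8")

# One-shot hashing of the whole input.
one_shot = hashlib.sha256(data).hexdigest()

# Feeding the same bytes in 64-byte chunks yields the same digest.
hasher = hashlib.sha256()
for start in range(0, len(data), 64):
    hasher.update(data[start:start + 64])
streamed = hasher.hexdigest()

print(one_shot == streamed)  # True
print(len(one_shot))         # 64, regardless of input size
```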

> **Interviewers Watch For**
>
> Whether you encode to bytes explicitly, whether you pick a sensible default (SHA-256 over MD5), and whether you note the 64-character cap on length. Strong candidates also flag that this is NOT a password-hashing scheme (no salt, no key stretching) and would propose bcrypt or argon2 if the use case were authentication.
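
For contrast with the fast hash in the solution, here is a minimal sketch of salted, deliberately slow key derivation using the standard library's `hashlib.scrypt`. The helper names `hash_password` and `verify_password` and the cost parameters are assumptions made for illustration, not tuned recommendations; a production system would more likely rely on a maintained bcrypt or argon2 library.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Hypothetical helper: derive a 32-byte key with scrypt and a random salt."""
    if salt is None:
        salt = os.urandom(16)  # fresh per-user salt
    # Illustrative cost parameters only.
    key = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Hypothetical helper: recompute the key and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)


salt, key = hash_password("correct horse")
print(verify_password("correct horse", salt, key))  # True
print(verify_password("wrong guess", salt, key))    # False
```

Unlike the seed stamper, two calls with the same password produce different stored values, because each call draws a new salt.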

> **Common Pitfall**
>
> Calling `hashlib.sha256(seed)` directly without encoding, which raises `TypeError` because the API rejects `str`. Another classic is using `.digest()` (raw bytes) instead of `.hexdigest()` (hex string), then slicing and getting bytes back when the spec asked for a string. Using MD5 'because it is shorter' is also a red flag in any security-adjacent context.
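
The `.digest()` versus `.hexdigest()` mix-up is easy to see side by side. A small sketch, again using "abc" as the example input:

```python
import hashlib

hasher = hashlib.sha256("abc".encode("utf-8"))

raw = hasher.digest()      # 32 raw bytes
text = hasher.hexdigest()  # 64-character str

print(type(raw[:8]).__name__, len(raw))    # bytes 32
print(type(text[:8]).__name__, len(text))  # str 64

# The two forms carry the same information: each raw byte is two hex characters.
print(raw.hex() == text)          # True
print(raw[:4].hex() == text[:8])  # True
```

Slicing the raw digest therefore returns the wrong type and also half as many hex characters' worth of data per index.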

---

## Common follow-up questions

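- What changes if length can exceed 64 characters? _(either chain hashes with a counter, HKDF-style, or concatenate the digests of seed||i for i = 0, 1, 2, ... until there are enough characters; mention that an ad hoc construction like this loses the pseudo-randomness guarantees a vetted KDF provides.)_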
- How would you make this safe for password storage? _(switch to bcrypt, scrypt, or argon2 with a per-user salt; explain key stretching and why a fast hash like SHA-256 is the wrong tool.)_
- How would you ensure the same seed produces the same token across machines and Python versions? _(explicit UTF-8 encoding pins the byte representation; SHA-256 is a standard, so the digest is portable across all hashlib backends.)_
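
The first follow-up can be sketched as below. The helper name `extended_token` and the 4-byte big-endian counter format are assumptions made for this example; this is counter-mode concatenation of SHA-256 digests, not HKDF, and it is not a vetted key-derivation function.

```python
import hashlib


def extended_token(seed: str, length: int) -> str:
    """Hypothetical helper: concatenate SHA-256 hex digests of seed || counter."""
    if length <= 0:
        raise ValueError("length must be positive")
    seed_bytes = seed.encode("utf-8")
    chunks = []
    counter = 0
    # Each block contributes 64 hex characters; keep going until there are enough.
    while 64 * len(chunks) < length:
        block = seed_bytes + counter.to_bytes(4, "big")  # assumed counter encoding
        chunks.append(hashlib.sha256(block).hexdigest())
        counter += 1
    return "".join(chunks)[:length]


token = extended_token("abc", 100)
print(len(token))                            # 100
print(token == extended_token("abc", 100))   # True (deterministic)
```

Note that block 0 hashes the seed plus a counter, so even short outputs differ from the original solution's; the two schemes are not interchangeable.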

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_hash_stamper)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io).