# The Payload Flattener

> Turn a deeply nested API response into a flat row.

Canonical URL: <https://datadriven.io/problems/the_payload_flattener>

Domain: Python · Difficulty: medium · Seniority: L3

## Problem

Given a nested dict (keys map to primitive values OR to inner dicts), return a flat dict whose keys are the joined key paths using the given separator. Non-dict values become leaves with the full joined key path.

## Worked solution and explanation

### Why this problem exists in real interviews

Flattening nested JSON is the canonical test for recursion plus key path management. Data engineers see this every time they unpack an API payload into a flat schema, and interviewers use it to check whether you handle the empty dict, the separator parameter, and key collisions cleanly.

---

### Break down the requirements

#### Step 1: Recurse with an accumulating prefix

Pass a prefix string into the recursive helper. When you descend into a sub dict, append the new key to the prefix using the separator. The recursion bottom is the moment a value is not itself a dict.

#### Step 2: Respect the separator parameter

Default to `'_'` per the spec but never hard code it inside the recursion. Threading the separator through the helper signature makes the function reusable for `'.'` notation or any custom delimiter.

#### Step 3: Handle the empty prefix correctly

On the first call there is no prefix yet, so the joined key is just the current key, not `'_foo'`. A simple `f'{prefix}{separator}{key}' if prefix else key` handles this without leaking a leading separator into every output key.

---

### The solution

**Recursive walker with key path accumulator**

```python
def flatten_json(nested: dict, separator: str = '_') -> dict:
    flat = {}

    def _walk(obj, prefix):
        for key, value in obj.items():
            new_key = f'{prefix}{separator}{key}' if prefix else key
            if isinstance(value, dict):
                _walk(value, new_key)
            else:
                flat[new_key] = value

    _walk(nested, '')
    return flat
```

> **Cost Analysis**
>
> Time is O(n) where n is the total number of leaf entries because each key is visited once. Space is O(n) for the output plus O(d) recursion frames where d is the maximum nesting depth. String concatenation on the prefix is amortized cheap since each path is built incrementally.

> **Interviewers Watch For**
>
> Whether you avoid leading separators on the root keys, whether the separator is parameterized rather than hard coded, and whether you handle the empty input dict by returning an empty dict instead of crashing.

> **Common Pitfall**
>
> Using string concatenation that always prepends the separator (`prefix + separator + key`) yields keys like `'_user_name'` for top level entries. Guard the empty prefix or the entire output ends up with stray underscores.

---

## Common follow-up questions

- How would you also flatten lists by index, like `items_0_name`? _(Extend the recursion to handle `isinstance(value, list)` and append `f'{i}'` to the prefix. Discuss whether the index should use the same separator or a bracket notation.)_
- How would you reverse the operation (unflatten)? _(Split each key on the separator, walk into a fresh dict creating sub dicts as needed, and assign the leaf. Talk through what happens when a key path collides with an existing leaf.)_
- What if two paths collide after flattening? _(For example `{'a_b': 1, 'a': {'b': 2}}` with `_` separator both yield `a_b`. Either raise on collision or pick a deterministic winner; surface the choice rather than silently overwrite.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_payload_flattener)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.