# The Column Shuffle

> Rows in, columns out. Number them.

Canonical URL: <https://datadriven.io/problems/the_column_shuffle>

Domain: Python · Difficulty: medium · Seniority: L6

## Problem

Given records (list of dicts with 'id' and 'amount'), group by id preserving input order. For each group, produce a dict with 'id' plus keys 'amount_1', 'amount_2', ..., 'amount_k' for that group's amounts in order. Return the list of per-id dicts in the order each id first appears.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **data reshaping from long to wide format**, a core ETL pattern. It probes whether a candidate can group records by key, then pivot variable-length lists into numbered column names, mirroring what pandas `pivot` or SQL `PIVOT` does manually.

---

### Break down the requirements

#### Step 1: Group records by ID

Collect all amounts for each unique ID into an ordered list. Use a dict mapping ID to a list of amounts.

#### Step 2: Pivot amounts into numbered columns

For each ID, create keys `amount_1`, `amount_2`, etc. from the collected list.

#### Step 3: Build the output dicts

Each output dict has the ID plus the numbered amount columns.

---

### The solution

**Group-then-pivot reshaping**

```python
def pivot_records(records: list) -> list:
    groups = {}
    for rec in records:
        name = rec['name']
        if name not in groups:
            groups[name] = []
        groups[name].append(rec['amount'])
    result = []
    for name, amounts in groups.items():
        row = {'name': name}
        for i, amt in enumerate(amounts):
            col_name = 'amount_' + str(i + 1)
            row[col_name] = amt
        result.append(row)
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n) where n is the number of records. Grouping and pivoting are both single-pass.
> 
> **Space:** O(n) for the grouped data and output.

> **Interviewers Watch For**
>
> Whether you preserve insertion order for the amounts. Using a regular dict (Python 3.7+) maintains insertion order, which is important for deterministic column numbering.

> **Common Pitfall**
>
> Assuming all groups have the same number of amounts. The output dicts may have different numbers of `amount_*` keys, which is expected.

---

## Common follow-up questions

- What if amounts should be summed when IDs repeat? _(Tests aggregation before pivoting, changing the grouping logic.)_
- How would you handle this with pandas? _(Tests knowledge of `groupby` + `pivot_table` or `unstack`.)_
- What if the number of amounts per group could be very large? _(Tests wide-table concerns and whether a long format might be preferable.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_column_shuffle)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.