# The Pandas Pivot

> Rows become columns. Columns become power.

Canonical URL: <https://datadriven.io/problems/the_pandas_pivot>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given a list of event dicts (each with 'user_id', 'event_type', 'amount'), pivot so each user becomes one dict with 'user_id' plus one column per distinct event_type summing the amounts. Return a list of per-user dicts. Missing event_types for a user are 0. Sort the list by user_id ascending.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **pivoting tabular data without a library**, a core data transformation skill. Interviewers check whether candidates can reshape row-based event data into a wide format using dictionary accumulation.

---

### Break down the requirements

#### Step 1: Group records by user ID

Use a nested dictionary where the outer key is the user ID and the inner dict holds event type sums.

#### Step 2: Accumulate amounts per user and event type

For each record, add the amount to the corresponding user/event type bucket.

#### Step 3: Build the output rows

Convert each user's accumulated data into a flat dict with user_id and event type columns, sorted by user ID.

---

### The solution

**Manual pivot with nested dict accumulation**

```python
def pivot_events(records):
    users = {}
    for record in records:
        uid = record['user_id']
        event = record['event_type']
        amount = record['amount']
        if uid not in users:
            users[uid] = {}
        if event in users[uid]:
            users[uid][event] += amount
        else:
            users[uid][event] = amount
    result = []
    for uid in sorted(users.keys()):
        row = {'user_id': uid}
        for event in users[uid]:
            row[event] = users[uid][event]
        result.append(row)
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n + u log u) where n is the number of records and u is the number of unique users (for sorting).
> 
> **Space:** O(n) for the nested accumulation structure.

> **Interviewers Watch For**
>
> Do you handle missing event types? Not every user has every event type. Your output rows may have different keys, which is correct for a sparse pivot.

> **Common Pitfall**
>
> Overwriting instead of summing when the same user has multiple records with the same event type. Using `+=` instead of `=` is critical.

---

## Common follow-up questions

- What if missing event types should default to 0? _(Tests collecting all event types first and initializing each user's row with zeros.)_
- How would you do this with pandas? _(Tests `pd.pivot_table(df, index='user_id', columns='event_type', values='amount', aggfunc='sum')`.)_
- What if the pivot needed to be memory-efficient for millions of users? _(Tests streaming or chunked processing strategies.)_
- How would you unpivot (melt) the result back to long format? _(Tests the reverse transformation.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_pandas_pivot)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.