# Everything Beneath

> Every account carries the weight of all it contains.

Canonical URL: <https://datadriven.io/problems/everything-beneath-account-rollup>

Domain: Python · Difficulty: medium · Seniority: mid

## Problem

A life-sciences CRM stores customer accounts in a hierarchy: each record names its account `id`, the `parent` account it sits under (`None` for a top-level account), and the `sales` booked directly to it. For every account, report the combined sales of that account together with all the accounts below it, anywhere down the chain. Accounts with no children contribute only their own sales, and the records may describe several independent top-level accounts.

## Worked solution and explanation

### Why this problem exists in real interviews

This is a subtree sum hiding inside a CRM account hierarchy. The accounts arrive as a flat list of parent pointers, not as a tree, so the real skill being probed is whether you can reconstruct the hierarchy from `parent` references and then total each account with everything beneath it in ONE pass. The trap is the obvious O(n^2) reading: for each account, walk all of its descendants again. On a deep org with thousands of accounts that re-walks the same branches over and over. The clean answer builds a children index once and lets each node's total reuse its children's totals.

---

### Break down the requirements

#### Step 1: Invert the parent pointers into a children index

The input gives you child to parent, but a rollup needs parent to children. Walk the list once, recording each account's own sales and appending each account id to its parent's child list. Seed an empty child list for EVERY id so leaf nodes have an entry too, which removes a KeyError special case later.

#### Step 2: Find the roots, not just the first record

A top-level account is any record whose `parent` is `None`. There can be more than one, because the data may describe several independent health systems. Start a traversal from each root; assuming a single root is the most common way this breaks on the hidden forest case.

#### Step 3: Total each subtree once and remember it

A node's total is its own sales plus the totals of its children. Compute it depth first and store it as you unwind, so each account is visited exactly once. The recursion naturally handles arbitrary depth and the leaf case (no children means the total is just the node's own sales).

---

### The solution

**Build a children index, then total each subtree depth first**

```python
def rollup_sales(accounts):
    children = {}
    own_sales = {}
    for acct in accounts:
        acct_id = acct["id"]
        own_sales[acct_id] = acct["sales"]
        children.setdefault(acct_id, [])
        parent = acct["parent"]
        if parent is not None:
            children.setdefault(parent, []).append(acct_id)

    totals = {}

    def subtree_total(node):
        total = own_sales[node]
        for child in children[node]:
            total += subtree_total(child)
        totals[node] = total
        return total

    for acct in accounts:
        if acct["parent"] is None:
            subtree_total(acct["id"])

    return totals
```

> **Complexity**
>
> Time is O(n): one pass to build the index, then each account is visited exactly once during the depth-first totals. Space is O(n) for the children index plus the recursion stack, which is as deep as the longest chain. A real CRM account graph runs to tens of thousands of nodes but only a handful of levels deep, so this stays comfortably linear; the recursion depth is the org's height, not its size.

**Naive per-account walk**

For each account, recursively descend and sum its descendants from scratch. Correct, but a node near the root is re-summed once for every ancestor, giving O(n * depth) and quadratic behavior on a tall hierarchy.

**Index once, total once**

Build the children index once, then each subtree total is computed a single time and reused by its parent. Every account is touched once: O(n).

> **Interviewers Watch For**
>
> Whether you invert the parent pointers into a children index instead of scanning the whole list to find children at each step, whether you start from ALL roots rather than assuming one, and whether you seed empty child lists so leaves need no special case. Strong candidates also note the output is order independent, so a dict keyed by id is the natural return.

> **Common Pitfall**
>
> Assuming a single top-level account and kicking off the traversal from accounts[0]. On a forest with several independent systems, every account under the other roots silently drops out of the result. The second visible case exists precisely to catch this: two roots, both must appear with their own rolled-up totals.

---

## Common follow-up questions

- What if a parent id can reference an account that is not present in the input? _(Tests defensive handling: treat dangling parents as roots, or validate and raise. Forces a decision about data quality rather than crashing on a missing key.)_
- How would you detect and reject a cycle, where an account is its own ancestor? _(Tests a visited or in-progress set during traversal, since the recursion would otherwise overflow the stack on a cyclic parent chain.)_
- The hierarchy no longer fits in memory on one machine. How would you roll up sales across a partitioned account graph? _(Tests level-by-level aggregation or a Pregel-style message passing toward roots, the distributed analog of the single-machine depth-first sum.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/everything-beneath-account-rollup)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.