# The Build Order

> Some tasks must wait for others to finish first.

Canonical URL: <https://datadriven.io/problems/the_build_order>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given n modules (0..n-1) and a list of [a, b] edges meaning a depends on b (b must be built before a), return a valid topological build order. Return [] if a cycle exists.

## Worked solution and explanation

### Why this problem exists in real interviews

Topological sort of a dependency graph tests **graph algorithms** and **cycle detection**. It is directly applicable to build systems, task schedulers, and migration runners.

> **Trick to Solving**
>
> Use Kahn's algorithm: compute in-degrees, enqueue all nodes with in-degree 0, process the queue by decrementing neighbors' in-degrees. If the result contains all nodes, the graph is acyclic.

---

### Break down the requirements

#### Step 1: Build an adjacency list and in-degree counts

Parse the dependency pairs into a directed graph structure.

#### Step 2: Seed the queue with all zero-in-degree nodes

These have no dependencies and can be built first.

#### Step 3: Process the queue, decrementing in-degrees

For each processed node, reduce the in-degree of its dependents. If any reach 0, add them to the queue.

#### Step 4: Detect cycles

If the result has fewer nodes than the total, a cycle exists. Return an empty list.

---

### The solution

**Kahn's algorithm for topological ordering**

```python
from collections import deque
def build_order(num_modules, dependencies):
    adj = {}
    in_degree = {}
    for i in range(num_modules):
        adj[i] = []
        in_degree[i] = 0
    for dep in dependencies:
        prereq = dep[0]
        target = dep[1]
        adj[prereq].append(target)
        in_degree[target] += 1
    queue = deque()
    for node in range(num_modules):
        if in_degree[node] == 0:
            queue.append(node)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in adj[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)
    if len(order) != num_modules:
        return []
    return order
```

> **Time and Space Complexity**
>
> **Time:** O(V + E) where V is the number of modules and E is the number of dependencies.
> 
> **Space:** O(V + E) for the adjacency list and in-degree map.

> **Interviewers Watch For**
>
> Whether you detect cycles and return an empty list rather than an incomplete ordering. Also, using a deque for BFS rather than a list (which has O(n) popleft).

> **Common Pitfall**
>
> Using DFS-based topological sort without proper cycle detection. DFS works but requires tracking visited vs. in-progress nodes to detect back edges.

---

## Common follow-up questions

- What if you need all valid topological orderings? _(Tests backtracking: at each step, any node with in-degree 0 can be chosen.)_
- How would you parallelize the build? _(Tests grouping nodes by topological level: all nodes at the same level can be built in parallel.)_
- What if dependencies can be added dynamically? _(Tests incremental topological sort algorithms.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_build_order)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.