# The Dependency Resolver

> Everything depends on everything.

Canonical URL: <https://datadriven.io/problems/the_dependency_resolver>

Domain: Python · Difficulty: medium · Seniority: L5

## Problem

Given a dict mapping task names to lists of task names that must run BEFORE them (prerequisites), return a valid topological execution order. If any cycle exists, raise an exception.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **topological sorting**, the backbone of every pipeline orchestrator. It probes graph traversal, cycle detection, and understanding of partial ordering, which is essential for data engineering roles.

> **Trick to Solving**
>
> Use Kahn's algorithm: compute in-degrees for all nodes, start with zero-in-degree nodes in a queue, and remove edges as you process. If the output has fewer nodes than the graph, a cycle exists.

---

### Break down the requirements

#### Step 1: Build an adjacency list and in-degree map

Parse the dependency graph into structures suitable for BFS processing.

#### Step 2: Initialize a queue with zero-dependency nodes

These tasks can run first since nothing blocks them.

#### Step 3: Process the queue, decrementing in-degrees

For each processed node, reduce the in-degree of its dependents. Add newly unblocked nodes to the queue.

#### Step 4: Detect cycles

If the result has fewer nodes than the input, some nodes are stuck in a cycle. Raise an error.

---

### The solution

**Kahn's algorithm with cycle detection**

```python
from collections import deque
def topological_sort(graph: dict) -> list:
    in_degree = {}
    for node in graph:
        if node not in in_degree:
            in_degree[node] = 0
        for dep in graph[node]:
            if dep not in in_degree:
                in_degree[dep] = 0
            in_degree[dep] += 1
    queue = deque()
    for node in in_degree:
        if in_degree[node] == 0:
            queue.append(node)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for dep in graph.get(node, []):
            in_degree[dep] -= 1
            if in_degree[dep] == 0:
                queue.append(dep)
    if len(order) != len(in_degree):
        raise ValueError("Cycle detected in dependency graph")
    return order
```

> **Time and Space Complexity**
>
> **Time:** O(V + E) where V is the number of nodes and E is the total dependency edges.
> 
> **Space:** O(V + E) for the adjacency list and in-degree map.

> **Interviewers Watch For**
>
> Whether you detect cycles. A topological sort that silently drops nodes in a cycle is a production bug waiting to happen.

> **Common Pitfall**
>
> Confusing dependency direction. If A depends on B, then B must come before A in the output. Make sure the adjacency list represents the correct direction.

---

## Common follow-up questions

- What is the DFS-based approach to topological sort? _(Tests reverse post-order DFS with visited/in-stack tracking for cycle detection.)_
- How would you return all valid orderings? _(Tests backtracking to enumerate all topological orders.)_
- How does Airflow schedule tasks in a DAG? _(Tests knowledge of scheduler polling, task instance states, and concurrency limits.)_
- What if you need to maximize parallelism? _(Tests grouping independent nodes into parallel execution levels.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_dependency_resolver)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.