# The Merge Champion

> Many sorted rivers flowing into one.

Canonical URL: <https://datadriven.io/problems/the_merge_champion>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given a list of sorted integer lists, merge them into a single sorted list and return it.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **k-way merge using a min-heap**, the standard approach for merging multiple sorted streams. It is the core algorithm behind external sort, log aggregation, and distributed query result merging.

> **Trick to Solving**
>
> Use a min-heap of size k to always know the smallest unprocessed element across all lists. Each extraction is O(log k), giving O(n log k) total where n is the total element count. This beats the naive approach of repeated two-way merges.

---

### Break down the requirements

#### Step 1: Seed the heap with the first element of each list

Push a tuple of `(value, list_index, element_index)` for the first element of each non-empty list.

#### Step 2: Extract the minimum and advance that list

Pop the smallest tuple, append the value to the result, and push the next element from the same list if one exists.

#### Step 3: Continue until the heap is empty

When all lists are exhausted, the result contains all elements in sorted order.

---

### The solution

**Min-heap k-way merge**

```python
import heapq
def merge_k_sorted(lists):
    heap = []
    for list_idx in range(len(lists)):
        if lists[list_idx]:
            heapq.heappush(heap, (lists[list_idx][0], list_idx, 0))
    result = []
    while heap:
        val, list_idx, elem_idx = heapq.heappop(heap)
        result.append(val)
        next_idx = elem_idx + 1
        if next_idx < len(lists[list_idx]):
            heapq.heappush(heap, (lists[list_idx][next_idx], list_idx, next_idx))
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n log k) where n is the total number of elements and k is the number of lists. Each of the n elements is pushed and popped once from a heap of size k.
> 
> **Space:** O(k) for the heap, plus O(n) for the result.

> **Interviewers Watch For**
>
> Do you include `list_idx` in the heap tuple to break ties deterministically? Without it, Python may try to compare list elements, which fails for non-comparable types.

> **Common Pitfall**
>
> Concatenating all lists and sorting. This is O(n log n) instead of O(n log k), which matters when k is much smaller than n.

---

## Common follow-up questions

- What if the lists were streaming from network sockets? _(Tests asynchronous I/O and backpressure handling with the heap pattern.)_
- How would you handle duplicate values across lists? _(Tests whether duplicates should be preserved, deduplicated, or counted.)_
- What if one list is much larger than the others? _(Tests awareness that the heap approach handles skewed sizes gracefully.)_
- Why is this better than repeated pairwise merges? _(Tests complexity analysis: pairwise is O(n * k) in the worst case.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_merge_champion)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.