# The Middle Ground

> The middle value keeps moving.

Canonical URL: <https://datadriven.io/problems/the_middle_ground>

Domain: Python · Difficulty: hard · Seniority: L5

## Problem

Implement RunningMedian that supports add(x) and get_median(). get_median returns the current median of all values added so far (average of the two middle values for even counts). Test harness passes ['add', x] and ['get_median'] ops and expects parallel results: None for add, the median for get_median.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests the **two-heap data structure design pattern** for maintaining a running median. It checks whether candidates can design a class with mutable state, choose the right data structures, and handle edge cases like querying before adding data.

> **Trick to Solving**
>
> Partition incoming values into a max-heap (lower half) and a min-heap (upper half). The median sits at the boundary between the two heaps. Python only has min-heaps, so negate values for the max-heap simulation.

---

### Break down the requirements

#### Step 1: Maintain two heaps as instance state

A max-heap `lo` for the smaller half and a min-heap `hi` for the larger half. The max-heap stores negated values.

#### Step 2: Add values with rebalancing

Insert into `lo` first, move the top to `hi`, then rebalance if `hi` grows larger. This maintains the invariant that `lo` always has equal or one more element.

#### Step 3: Return the median

If `lo` has more elements, the median is its top. If equal size, average the two tops.

#### Step 4: Raise ValueError if empty

Querying before any data is added should raise an explicit error.

---

### The solution

**Two-heap class with negation trick**

```python
import heapq
class RunningMedian:
    def __init__(self):
        self.lo = []
        self.hi = []
    def add(self, num):
        heapq.heappush(self.lo, -num)
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))
    def median(self):
        if not self.lo:
            raise ValueError("No data")
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        return (-self.lo[0] + self.hi[0]) / 2
```

> **Time and Space Complexity**
>
> **Time:** O(log n) per `add` call due to heap operations. O(1) per `median` call.
> 
> **Space:** O(n) for storing all values across the two heaps.

> **Interviewers Watch For**
>
> Can you explain the rebalancing invariant? After every `add`, `lo` has either the same number of elements as `hi` or exactly one more. This makes median retrieval trivial.

> **Common Pitfall**
>
> Forgetting to negate when popping from the max-heap. If you push `-num` but return `lo[0]` without negating, the sign is wrong.

---

## Common follow-up questions

- How would you support a delete operation? _(Tests lazy deletion with a discard set and delayed cleanup.)_
- What if the median needs to be computed over a sliding window of size k? _(Tests combining the heap approach with window expiration.)_
- How would you handle billions of values where memory is constrained? _(Tests approximate median algorithms like P2 or t-digest.)_
- Why not use a sorted list with bisect.insort? _(Tests trade-off analysis: insort is O(n) per insertion vs O(log n) for heaps.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_middle_ground)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.