# The Schedule Cleaner

> Overlapping sessions. One clean line.

Canonical URL: <https://datadriven.io/problems/the_schedule_cleaner>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given a list of [start, end] intervals, merge any overlapping or touching intervals and return the merged list sorted by start ascending. Intervals that share an endpoint count as overlapping (e.g., [1,3] and [3,5] merge to [1,5]).

## Worked solution and explanation

### Why this problem exists in real interviews

This is the classic **merge intervals** problem. It tests whether candidates can sort intervals and merge overlapping ones in a single pass, a fundamental pattern in time-series analysis and scheduling.

> **Trick to Solving**
>
> Sort intervals by start time. Then iterate through, extending the current interval whenever the next one overlaps. When there is no overlap, close the current interval and start a new one.

---

### Break down the requirements

#### Step 1: Sort intervals by start time

This ensures that any interval that could overlap with the current one is adjacent in the sorted order.

#### Step 2: Initialize with the first interval

Set the current interval to the first one in the sorted list.

#### Step 3: Extend or close the current interval

For each subsequent interval, if it overlaps (start is within the current interval's range), extend the current end. Otherwise, add the current interval to results and start fresh.

#### Step 4: Add the last interval

After the loop, the current interval has not been added yet.

---

### The solution

**Sort then linear merge pass**

```python
def merge_intervals(intervals):
    if not intervals:
        return []
    intervals.sort()
    merged = [intervals[0][:]]
    for i in range(1, len(intervals)):
        start, end = intervals[i]
        if start <= merged[-1][1]:
            if end > merged[-1][1]:
                merged[-1][1] = end
        else:
            merged.append([start, end])
    return merged
```

> **Time and Space Complexity**
>
> **Time:** O(n log n) for sorting, plus O(n) for the merge pass.
> 
> **Space:** O(n) for the merged output.

> **Interviewers Watch For**
>
> Do you sort by start time first? Without sorting, you would need to compare every pair of intervals, giving O(n^2). Sorting reduces merging to a single pass.

> **Common Pitfall**
>
> Using `start < merged[-1][1]` instead of `<=`. Two intervals `[1,3]` and `[3,5]` do overlap (they share point 3) and should merge to `[1,5]`.

---

## Common follow-up questions

- What if you needed to insert a new interval into an already-merged list? _(Tests the insert-interval variant with a single-pass merge.)_
- What if intervals had labels and you needed to know which merged intervals came from which originals? _(Tests maintaining provenance metadata alongside the merge.)_
- How does this pattern apply to database query optimization? _(Tests awareness that range predicates benefit from interval merging.)_
- What if intervals arrived in a stream? _(Tests maintaining a sorted structure with on-the-fly merging.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_schedule_cleaner)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.