# The Spaces Between

> Every voided trip leaves a hole in the sequence. Find the one that matters.

Canonical URL: <https://datadriven.io/problems/kth-missing-sequence-id>

Domain: Python · Difficulty: medium · Seniority: mid

## Problem

We assign every trip a sequential id, but voided and dropped records leave gaps, so the recorded ids arrive sorted and unique with holes between them. Given those recorded `ids` and a position `k`, find the k-th id missing from the sequence, counting the gaps starting after the first recorded id. When `k` runs past every gap inside the recorded range, keep counting beyond the last recorded id.

## Worked solution and explanation

### Why this problem exists in real interviews

This is a gap-counting problem wearing an auditing costume. Between any two consecutive recorded ids, prev and current, exactly current - prev - 1 integers went missing, and that is a number you read off with a single subtraction. The real trap is reaching for the integers themselves: stepping a cursor up from ids[0] and counting absent values one at a time until you have seen k. That version is correct, and on a sparse id space it is fatal. It is O(answer), so when k is near a billion it walks a billion steps and times out, and the set(range(ids[0], ids[-1])) variant dies even faster at O(max - min) memory. The fix is to never touch the integer range at all. Accumulate the gap counts across the n recorded ids by subtraction, and the moment the running total would pass k, the answer sits inside the current gap. That is O(n) time and O(1) space, and it is the whole skill. Binary search over the (monotonic) running gap count shaves it to O(log n), but it is a refinement on top of the same arithmetic, not what rescues you from the timeout.

---

### Break down the requirements

#### Step 1: Count each gap with one subtraction

Between consecutive recorded ids prev and current, the absent integers are prev+1 through current-1, so there are current - prev - 1 of them. No loop over those integers is needed to count them, which is the entire point: you learn how many ids vanished in a gap without ever materializing the gap, so a span billions wide costs the same as a span of two.

#### Step 2: Accumulate across the array until k lands inside a gap

Walk the recorded ids in order, carrying the still-needed count k. For each gap, if k is no larger than that gap's size, the k-th missing id is the k-th integer after prev, namely prev + k, and you are done. Otherwise subtract this gap's size from k and move on. This touches each id once, so it is O(n), and it never enumerates the range between them.

#### Step 3: Let k run past the last recorded id

If the running k survives every internal gap, the k-th missing id lies beyond ids[-1]. After the loop, prev is the last recorded id and k is whatever is left, so the answer is prev + k. This is also what keeps a giant k cheap: the contiguous case [1, 2, 3] has no internal gaps at all, every requested miss is past the end, and you return in O(n) with a single addition rather than walking a billion values.

---

### The solution

**Count gaps by subtraction in a single pass**

```python
def kth_missing(ids, k):
    prev = ids[0]
    for i in range(1, len(ids)):
        current = ids[i]
        gap = current - prev - 1   # integers missing between prev and current
        if k <= gap:
            return prev + k        # k-th miss sits inside this gap
        k -= gap                   # consumed this gap, keep looking
        prev = current
    return prev + k                 # k ran past the last id; count beyond it
```

> **Complexity**
>
> O(n) time, O(1) space: one pass over the recorded ids, constant work per id, no extra structures. The approaches that actually blow up are the ones that touch the integer range. Stepping value by value is O(answer) and times out when k is near a billion, and set(range(min, max)) is O(max - min) memory and runs out of memory on a space billions wide holding only a handful of ids. The hidden case with k near a billion exists precisely to separate the arithmetic solutions from the enumerators. Binary search over the monotonic running total trims the pass to O(log n) if you want it, but the O(n) version is already a clean, defensible answer.

> **Interviewers Watch For**
>
> The tell is whether you reach for the integers or the arithmetic. Strong candidates say out loud that each gap holds current - prev - 1 missing ids, so they can count gaps without enumerating anything, then accumulate until k lands inside one. State the complexity and handle the beyond-the-last-id case explicitly. Candidates who start stepping through values one at a time usually also botch the boundary. If you have momentum, note that the running gap count only ever climbs, so a binary search would get you to O(log n).

> **Common Pitfall**
>
> The off-by-one when you back the k-th value out of its gap. Inside the winning gap the answer is prev + k using the REMAINING k, after you have subtracted every earlier gap; anchoring on current, or forgetting to decrement k as you pass each gap, returns a neighbor of the true value. The other classic miss is the contiguous case with no internal gaps: there every requested miss is past the end, and only the final return after the loop is correct.

---

**Linear pass over the array (canonical, O(n))**

Walk the recorded ids once, subtract current - prev - 1 to size each gap, and stop when the remaining k falls inside one. O(n) time, O(1) space, easy to defend, and it never enumerates the integer range. This is the answer to ship.

**Binary search on the running gap count (O(log n))**

Because the cumulative gap count ids[i] - ids[0] - i only climbs, you can bisect for the first index whose count reaches k and back the answer out of the gap just before it. Touches log n elements instead of n. A genuine speedup, but it rests on exactly the same gap arithmetic, so reach for it only after the O(n) version is correct.

---

## Common follow-up questions

- What if the recorded ids can contain duplicates? _(Duplicates make current - prev - 1 go to zero or negative and corrupt the running count; tests whether the candidate sees that the gap arithmetic quietly assumes strictly increasing ids and guards or de-dups first.)_
- The list is too large to hold in memory and arrives as a stream of sorted chunks. How would you find the k-th missing id? _(Pushes toward per-chunk gap accounting that carries the running k across chunk boundaries; tests whether they preserve the single-pass arithmetic without random access.)_
- Now return every missing id up to the k-th, not just the k-th itself. _(Forces enumeration of the answer set and makes the linear pass strictly better than binary search; tests whether the candidate knows when bisection stops paying off.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/kth-missing-sequence-id)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.