# The Nearest Value Mapper

> Close enough counts. Ties go low.

Canonical URL: <https://datadriven.io/problems/the_nearest_value_mapper>

Domain: Python · Difficulty: medium · Seniority: L5

## Problem

Given keys (list of integers) and values (list of integers), return a dict mapping each key to the value whose numeric distance is smallest. Ties break by smaller value. Return string keys in the output (JSON compatibility in the test harness).

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **binary search for approximate matching**, a common operation in data engineering for joining datasets with non-exact keys (e.g., timestamp alignment). It checks whether candidates can adapt binary search beyond exact lookups.

> **Trick to Solving**
>
> Sort the reference list, then for each query element, use binary search (`bisect`) to find the insertion point. The nearest value is either the element just before or just after the insertion point. Compare both to find the closest.

---

### Break down the requirements

#### Step 1: Sort the reference list

Binary search requires sorted input. Sort a copy of the reference list.

#### Step 2: For each query element, find the insertion point

Use `bisect_left` to find where the element would be inserted in the sorted reference.

#### Step 3: Compare neighbors at the insertion point

The nearest value is either `ref[pos-1]` or `ref[pos]`. Check both and pick the one with the smaller absolute difference.

---

### The solution

**Sorted reference with bisect lookup**

```python
import bisect
def nearest_values(queries, reference):
    sorted_ref = sorted(reference)
    result = []
    for q in queries:
        pos = bisect.bisect_left(sorted_ref, q)
        best = None
        if pos < len(sorted_ref):
            best = sorted_ref[pos]
        if pos > 0:
            candidate = sorted_ref[pos - 1]
            if best is None or abs(q - candidate) <= abs(q - best):
                best = candidate
        result.append(best)
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(m log m + n log m) where m is the reference size (sorting) and n is the number of queries (each binary search is O(log m)).
> 
> **Space:** O(m) for the sorted copy of the reference list.

> **Interviewers Watch For**
>
> Do you check both neighbors at the insertion point? Checking only one side misses the nearest match when the query falls between two reference values.

> **Common Pitfall**
>
> Not handling boundary cases where `pos` is 0 (no left neighbor) or `pos == len(sorted_ref)` (no right neighbor). Both must be guarded.

---

## Common follow-up questions

- What if ties should be broken by preferring the smaller value? _(Tests adjusting the comparison to use `<=` vs `<` consistently.)_
- What if the reference list is updated dynamically? _(Tests using a balanced BST or sorted container for O(log n) insertion and lookup.)_
- How does this relate to an ASOF JOIN in databases? _(Tests knowledge of temporal joins in analytics databases like kdb+ or DuckDB.)_
- What if both lists are sorted and you need to match them pairwise? _(Tests the two-pointer approach as an O(n + m) alternative to binary search.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_nearest_value_mapper)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.