# Threshold Filter

> Above the line or below it.

Canonical URL: <https://datadriven.io/problems/threshold_filter>

Domain: Python · Difficulty: medium · Seniority: L3

## Problem

Given a dict `d` mapping keys to numeric values and a numeric `threshold`, return a new dict containing only the entries whose value is greater than or equal to the threshold. Keep the original keys unchanged.

## Worked solution and explanation

### Why this problem exists in real interviews

Dict comprehension fluency is table stakes for any data engineering role. Interviewers use this short problem as a warmup that quickly weeds out candidates who reach for explicit loops, `dict()` constructors, or `filter()` callbacks when a comprehension is the idiomatic answer.

---

### Break down the requirements

#### Step 1: Iterate `d.items()`, not `d.keys()`

You need both the key (to keep it in the output) and the value (to compare against the threshold). Iterating keys and re-indexing `d[k]` works but does an extra hash lookup per item.
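
The two iteration styles can be compared with a minimal sketch. The `scores` dict and the variable names are made up for illustration.

```python
scores = {"a": 3, "b": 7, "c": 5}  # hypothetical sample data
threshold = 5

# items() yields each key together with its value.
via_items = {k: v for k, v in scores.items() if v >= threshold}

# Iterating keys gives the same result, but every scores[k] is another lookup.
via_keys = {k: scores[k] for k in scores if scores[k] >= threshold}

print(via_items)              # {'b': 7, 'c': 5}
print(via_items == via_keys)  # True
```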

#### Step 2: Use `>=`, not `>`

The spec says 'greater than or equal to the threshold.' Reading the comparison wrong is the single most common bug on this problem.
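
A boundary value makes the difference visible. In this small sketch the sample dict is hypothetical and includes one entry exactly equal to the threshold.

```python
d = {"low": 4, "edge": 5, "high": 6}  # hypothetical sample data
threshold = 5

# Matches the spec: the entry equal to the threshold is kept.
inclusive = {k: v for k, v in d.items() if v >= threshold}

# The common bug: the boundary entry is dropped.
exclusive = {k: v for k, v in d.items() if v > threshold}

print(inclusive)  # {'edge': 5, 'high': 6}
print(exclusive)  # {'high': 6}
```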

#### Step 3: Return a fresh dict, do not mutate the input

A dict comprehension produces a new dict and leaves the input untouched. Deleting keys with `del` while iterating over the same dict raises `RuntimeError: dictionary changed size during iteration` as soon as the loop tries to advance.
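
The failure mode and the safe alternative can be sketched as follows. The sample data is hypothetical, and the `try`/`except` is there only to capture the error message for display.

```python
threshold = 5

# In-place deletion during iteration: the loop fails on its next step.
d = {"a": 1, "b": 9, "c": 2}  # hypothetical sample data
error_message = None
try:
    for k in d:
        if d[k] < threshold:
            del d[k]
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)  # dictionary changed size during iteration

# The comprehension builds a new dict and leaves the input alone.
original = {"a": 1, "b": 9, "c": 2}
filtered = {k: v for k, v in original.items() if v >= threshold}

print(filtered)       # {'b': 9}
print(len(original))  # 3
```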

---

### The solution

**Single-line dict comprehension**

```python
def dict_filter(d, threshold):
    return {k: v for k, v in d.items() if v >= threshold}
```
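
A quick usage check, assuming made-up sensor readings as input. The function is repeated so the snippet runs on its own.

```python
def dict_filter(d, threshold):
    return {k: v for k, v in d.items() if v >= threshold}

readings = {"sensor_a": 12.5, "sensor_b": 7, "sensor_c": 10}  # hypothetical data
kept = dict_filter(readings, 10)

print(kept)                 # {'sensor_a': 12.5, 'sensor_c': 10}
print(dict_filter({}, 10))  # {}
print(len(readings))        # 3, the input is not modified
```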

> **Cost Analysis**
>
> Time: O(N) for N entries in `d`, one comparison each. Space: O(K) where K is the number of entries that pass the threshold, plus the new dict's overhead.

> **Interviewers Watch For**
>
> Whether you write a comprehension instead of a `for` loop with `result[k] = v`, whether you use `>=` (matching the spec), and whether you avoid mutating the input. Strong candidates also confirm the threshold direction by reading the spec aloud before coding.

> **Common Pitfall**
>
> Using `>` instead of `>=` and silently dropping entries that exactly equal the threshold. The result looks correct on most inputs, and the missing entries often only show up on the boundary test case.

---

## Common follow-up questions

- What if the dict values can be `None`? How would you handle that without crashing on the comparison? _(Tests defensive thinking. `None >= 5` raises `TypeError` in Python 3. The fix is `v is not None and v >= threshold`.)_
- How would you change the function to also normalize keys to lowercase on output? _(Tests comprehension extension. `{k.lower(): v for k, v in d.items() if v >= threshold}` and a follow-up about collisions when two keys lowercase to the same value.)_
- If `d` has 100 million entries, would you stream the output instead of building a dict? _(Tests when to switch from dict comprehension to a generator. A generator of `(k, v)` pairs avoids materializing everything but loses dict semantics like membership tests.)_
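
The three follow-ups above could be answered along these lines. This is a sketch, not a reference solution: the function names `filter_skip_none`, `filter_lowercase_keys`, and `iter_filtered` are hypothetical, and the lowercase variant assumes string keys.

```python
def filter_skip_none(d, threshold):
    """Skip None values before comparing, so no TypeError is raised."""
    return {k: v for k, v in d.items() if v is not None and v >= threshold}


def filter_lowercase_keys(d, threshold):
    """Lowercase keys on output; on a collision the later entry wins."""
    return {k.lower(): v for k, v in d.items() if v >= threshold}


def iter_filtered(d, threshold):
    """Yield passing (key, value) pairs lazily instead of building a dict."""
    return ((k, v) for k, v in d.items() if v >= threshold)


print(filter_skip_none({"a": None, "b": 7, "c": 3}, 5))      # {'b': 7}
print(filter_lowercase_keys({"ID": 6, "id": 9, "x": 1}, 5))  # {'id': 9}
print(list(iter_filtered({"a": 5, "b": 2}, 5)))              # [('a', 5)]
```

The collision case shows why the lowercase variant deserves a follow-up question: `"ID"` and `"id"` both pass the threshold, yet only one value survives in the output.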

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/threshold_filter)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.