# The One-of-Each

> Strip the repeats, keep the originals.

Canonical URL: <https://datadriven.io/problems/the_one_of_each>

Domain: Python · Difficulty: easy · Seniority: L3

## Problem

Given a list of dicts (records) and a key name, return a new list containing only the first record for each distinct value at that key, in the original order. Records that do not have the key are all kept in their original positions (they are not deduplicated against each other).

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **key-based deduplication with order preservation**, a core operation in ETL pipelines. Interviewers check whether candidates can use a set for O(1) lookup while maintaining insertion order in the output.

---

### Break down the requirements

#### Step 1: Track seen keys using a set

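A set provides average-case O(1) membership checks, so deciding whether a record's key value has been encountered before does not depend on how many values are already stored. The key values must be hashable for this to work.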

#### Step 2: Keep the first occurrence of each key

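Iterate through the records in order. If a record's value at the key has not been seen, append the record to the result and add the value to the seen set. A record that lacks the key is appended unconditionally, as the problem statement requires.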

#### Step 3: Skip subsequent duplicates

If the key is already in the seen set, skip the record entirely.

---

### The solution

**Set-based dedup preserving first occurrence**

```python
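def deduplicate(records, key_field):
    seen = set()  # key values that have already been emitted
    result = []
    for record in records:
        # Records without the key are always kept and never deduplicated.
        if key_field not in record:
            result.append(record)
            continue
        key = record[key_field]
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result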
```

> **Time and Space Complexity**
>
> **Time:** O(n) where n is the number of records. Each record is visited once, and each set lookup or insert is O(1) on average.
>
> **Space:** O(k) for the seen set, where k is the number of unique key values, plus up to O(n) for the result list.

> **Interviewers Watch For**
>
> Do you use a set rather than scanning a list for duplicates? A check such as `key in seen_list` (or a scan of the result list itself) is O(n) per lookup, making the overall approach O(n^2).
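
To make the contrast concrete, here is a minimal sketch of both variants. The function names `dedupe_with_list` and `dedupe_with_set` are illustrative and not part of the problem. The two functions return the same result; only the container used for the seen values differs.

```python
def dedupe_with_list(records, key_field):
    """Anti-pattern: a membership test on a list is a linear scan."""
    seen = []  # lookup costs O(len(seen)), so the loop is O(n^2) overall
    result = []
    for record in records:
        if key_field not in record:
            result.append(record)  # keyless records are always kept
        elif record[key_field] not in seen:
            seen.append(record[key_field])
            result.append(record)
    return result


def dedupe_with_set(records, key_field):
    """Same logic with a set: average O(1) per lookup, O(n) overall."""
    seen = set()
    result = []
    for record in records:
        if key_field not in record:
            result.append(record)  # keyless records are always kept
        elif record[key_field] not in seen:
            seen.add(record[key_field])
            result.append(record)
    return result
```

On small inputs the difference is invisible, which is why the choice of container is easy to overlook until the input grows.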

> **Common Pitfall**
>
> Deduplicating by the entire record instead of the specified key field. Two records with the same key but different other fields should still be deduplicated based on the key.
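
The pitfall can be illustrated with a small, made-up dataset (the field names `id` and `name` are assumptions for the example). Comparing whole records treats the two `id == 1` rows as different because their `name` values differ, while keying on `id` keeps only the first of them.

```python
records = [
    {"id": 1, "name": "Ada"},
    {"id": 1, "name": "Ada L."},  # same key, different other field
    {"id": 2, "name": "Grace"},
]

# Pitfall: comparing entire dicts, so both id == 1 rows survive.
by_record = []
for record in records:
    if record not in by_record:
        by_record.append(record)

# Intended behaviour: compare only the value at the key field.
seen = set()
by_key = []
for record in records:
    if record["id"] not in seen:
        seen.add(record["id"])
        by_key.append(record)

print(len(by_record), len(by_key))
```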

---

## Common follow-up questions

- What if you wanted to keep the last occurrence instead of the first? _(Tests reversing the input, deduplicating, then reversing the result.)_
- What if records could have composite keys (multiple fields)? _(Tests using a tuple of field values as the dedup key.)_
- How would you handle this for a 100GB dataset? _(Tests external sorting with merge-based deduplication.)_
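
The first two follow-ups can be sketched as below. This is one possible approach, not a reference answer: `dedupe_by` and `dedupe_keep_last` are hypothetical names, and treating a record that lacks any of the key fields as "always kept" is an assumption that extends the single-key rule from the problem statement.

```python
def dedupe_by(records, key_fields):
    """Keep the first record for each combination of values in key_fields."""
    seen = set()
    result = []
    for record in records:
        # Assumption: a record missing any key field is kept as-is,
        # mirroring the single-key rule in the problem statement.
        if any(field not in record for field in key_fields):
            result.append(record)
            continue
        key = tuple(record[field] for field in key_fields)  # composite key
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


def dedupe_keep_last(records, key_fields):
    """Keep the last record per key: reverse, keep first, reverse back."""
    kept = dedupe_by(list(reversed(records)), key_fields)
    kept.reverse()
    return kept
```

Passing a one-element tuple such as `("id",)` reproduces the single-key behaviour, so the composite version is a generalisation rather than a separate code path.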

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_one_of_each)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.