# The Repeat Offenders

> Repetition is a clue.

Canonical URL: <https://datadriven.io/problems/the_repeat_offenders>

Domain: Python · Difficulty: easy · Seniority: L6

## Problem

Given a list, return the values that appear more than once, each listed only once, in the order of their first appearance in the input.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **duplicate detection with order preservation**. It checks whether candidates can use a dictionary to track counts while maintaining the order of first appearance for the output.

---

### Break down the requirements

#### Step 1: Count occurrences of each item

Build a frequency map using a dictionary.

#### Step 2: Collect items with count greater than 1

Iterate through the original list (not the dict) to preserve first-appearance order. Add each item to the result the first time its count exceeds 1.

---

### The solution

**Frequency count then order-preserving extraction**

```python
def find_duplicates(items):
    counts = {}
    for item in items:
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
    seen = set()
    result = []
    for item in items:
        if counts[item] > 1 and item not in seen:
            result.append(item)
            seen.add(item)
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n) with two passes through the list.
> 
> **Space:** O(k) where k is the number of distinct values.

> **Interviewers Watch For**
>
> Do you preserve first-appearance order? Iterating the original list (not the dict keys) and using a seen set ensures duplicates appear in the order they were first encountered.

> **Common Pitfall**
>
> Returning each duplicate every time it appears instead of once. The seen set prevents adding the same duplicate value multiple times to the result.

---

## Common follow-up questions

- What if you needed to return items appearing exactly k times? _(Tests parameterizing the frequency threshold.)_
- How would you do this in a single pass? _(Tests adding to the result on the second occurrence using a counter.)_
- What if the list contained unhashable items? _(Tests converting to tuples or using a different tracking strategy.)_
- How would you flag duplicates in a streaming data feed? _(Tests Bloom filters for approximate duplicate detection.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_repeat_offenders)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.