# The Shadow Cleaner

> Remove the repeats. No shortcuts.

Canonical URL: <https://datadriven.io/problems/the_shadow_cleaner>

Domain: Python · Difficulty: easy · Seniority: L3

## Problem

Given a list, return a new list with duplicates removed, preserving first-occurrence order. Do NOT use set(), dict.fromkeys(), or similar built-in dedup helpers.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **order-preserving deduplication** without built-in shortcuts. It probes whether a candidate understands how to track seen elements manually using a set for O(1) lookups while preserving insertion order with a list.

---

### Break down the requirements

#### Step 1: Track which elements have been seen

Use a set (built incrementally, not via `set()` constructor on the full list) to record elements as you encounter them.

#### Step 2: Iterate and collect first occurrences

For each element, if it has not been seen, add it to both the result list and the seen set.

---

### The solution

**Manual seen-set with order preservation**

```python
def dedup(items: list) -> list:
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n) where n is the length of the list. Each `in` check and `add` on a set is O(1) average.
> 
> **Space:** O(n) for the seen set and result list in the worst case where all elements are unique.

> **Interviewers Watch For**
>
> The constraint explicitly bans `set()` as a constructor shortcut and `dict.fromkeys()`. Building the set incrementally with `.add()` satisfies the spirit of the constraint while still achieving O(1) lookups.

> **Common Pitfall**
>
> Using a list instead of a set for `seen` makes lookups O(n), degrading overall performance to O(n^2). This matters at 100K elements.

---

## Common follow-up questions

- What if elements are unhashable, like dicts? _(Tests whether you would fall back to a list-based seen check or serialize elements.)_
- How would you dedup by a specific key in a list of dicts? _(Tests extracting a hashable key and tracking it separately.)_
- What if you needed to keep the last occurrence instead of the first? _(Tests reversing the input, deduping, then reversing back.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_shadow_cleaner)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.