# Who Came First

> The feed repeats itself. Keep each member only the first time it speaks.

Canonical URL: <https://datadriven.io/problems/first-seen-member-feed>

Domain: Python · Difficulty: easy · Seniority: mid

## Problem

An upstream feed hands you member IDs in the exact order they were scanned during a shift, and the same member often gets scanned several times. Collapse the feed so each member appears once, in the position it was first seen.

## Worked solution and explanation

### Why this problem exists in real interviews

This is order-preserving deduplication wearing a member-feed costume. Everyone knows set(items) strips duplicates; the trap is that a set also throws away order, so list(set(items)) hands back the IDs in an arbitrary hash ordering. The whole challenge is the tension between two requirements at once: keep one copy per member AND keep each at its first-seen position. Interviewers ask it because the obvious one-liner is subtly wrong, and the clean fix tells them whether you actually know your standard library.

---

### Break down the requirements

#### Step 1: Track what you have already emitted

You need O(1) membership tests as you walk the feed, so a set of seen IDs is the right companion structure. A list-based 'if x not in result' check is correct but turns the whole thing O(n^2), which an interviewer will notice on a million-row feed.

#### Step 2: Emit on first sight only

Iterate the feed in its given order. The first time an ID shows up, record it in the output and mark it seen; every later repeat is skipped. Because you append in iteration order, first-seen position falls out for free.

#### Step 3: Let an empty feed answer itself

No special case needed: an empty input produces an empty output through the same loop. Resist adding a guard clause; it is negative signal here.

---

### The solution

**dict.fromkeys preserves insertion order and dedupes in one pass**

```python
def unique_in_order(items):
    return list(dict.fromkeys(items))
```

Since Python 3.7 dict preserves insertion order, so dict.fromkeys(items) builds a key per distinct value in first-seen order, collapsing repeats automatically. Wrapping it in list() returns exactly the ordered, deduplicated sequence. It is the idiom a senior reaches for because it is one pass, O(n), and reads like the spec.

**Wrong: list(set(items))**

Dedupes correctly but a set has no order, so the returned IDs come back in hash order. On the member feed that scrambles the scan sequence the downstream join depends on.

**Right: list(dict.fromkeys(items))**

Dedupes AND keeps first-seen order, same O(n) cost, no extra structure to manage. The explicit seen-set loop is equivalent and just as acceptable if you prefer it spelled out.

> **Complexity**
>
> Time is O(n) for a single pass with O(1) average hashing per element; space is O(n) for the distinct keys. At a few million scans per shift this stays well within memory and runs in a single linear sweep.

> **Interviewers Watch For**
>
> Whether you instinctively avoid list(set(...)) the moment order matters, and whether you know dict.fromkeys exists. A candidate who writes the explicit seen-set loop and explains why the naive list-membership check is O(n^2) signals just as much seniority as the one-liner.

> **Common Pitfall**
>
> Reaching for list(set(items)) and assuming order is preserved. It often looks fine on tiny test inputs because small int sets happen to hash in ascending order, then silently reorders strings or larger inputs in production. The other miss is sorted(set(items)), which imposes an order the spec never asked for.

---

## Common follow-up questions

- What changes if you only want members scanned exactly once during the shift, dropping anyone who repeated? _(Tests counting (Counter) versus first-seen dedup; a different output set entirely.)_
- The feed no longer fits in memory and arrives as a stream. How do you dedupe it? _(Tests a growing seen-set memory bound, and when to reach for an approximate structure like a Bloom filter or external sort.)_
- Members are dicts, not hashable IDs. How would you dedupe on a chosen field? _(Tests keying the seen-set on a derived value (item['member_id']) since dicts are unhashable.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/first-seen-member-feed)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.