# The Field Counter

> Some fields speak louder than others.

Canonical URL: <https://datadriven.io/problems/the_field_counter>

Domain: Python · Difficulty: easy · Seniority: L4

## Problem

Given a list of dicts (records) and a key name, return a dict mapping each distinct value found at that key to the number of records containing that value. Records that do not have the key must be skipped.

## Worked solution and explanation

### Why this problem exists in real interviews

Counting occurrences in a stream of records is one of the most common shapes of data work. Interviewers use this prompt to check whether you skip records that lack the key (instead of letting a `KeyError` crash the function), whether you reach for `dict.get`, `defaultdict`, or `Counter`, and whether your output type matches what the harness expects.

---

### Break down the requirements

#### Step 1: Skip records that lack the key

The spec says "records that do not have the key must be skipped." Check `if key in r:` (or wrap the read in `try`/`except KeyError`) before reading the value. Indexing blindly with `r[key]` raises `KeyError` on the first record that lacks the key.
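
As a minimal sketch of the two guard styles, assuming a pair of made-up records (the `records` list and the `"city"` key are illustrative, not part of the problem's test data):

```python
# Hypothetical sample data: the second record has no "city" key.
records = [{"city": "Oslo"}, {"name": "Ada"}]

# Membership guard: only read the value when the key is present.
seen = []
for r in records:
    if "city" in r:
        seen.append(r["city"])

# Equivalent try/except form: attempt the read and skip on KeyError.
seen_via_except = []
for r in records:
    try:
        seen_via_except.append(r["city"])
    except KeyError:
        continue

# Unguarded indexing fails on the record that lacks the key.
try:
    records[1]["city"]
    unguarded_failed = False
except KeyError:
    unguarded_failed = True
```

Note that `r.get("city")` on its own is not a substitute for the guard: it returns `None` both for a missing key and for a key whose stored value is `None`, so the two cases become indistinguishable.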

#### Step 2: Tally with `dict.get` or `defaultdict(int)`

`counts[v] = counts.get(v, 0) + 1` works without an import. `collections.defaultdict(int)` and `collections.Counter` are both fine; just remember the harness expects a plain `dict` back, so convert with `dict(...)` before returning if needed.
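
The three tallying options can be compared side by side. This is a sketch over a hypothetical list of already-extracted values (`values` is an assumption for illustration), showing that each option yields the same counts once converted to a plain `dict`:

```python
from collections import Counter, defaultdict

# Hypothetical values already read from records that have the key.
values = ["a", "b", "a"]

# Option 1: plain dict with dict.get, no import needed.
plain = {}
for v in values:
    plain[v] = plain.get(v, 0) + 1

# Option 2: defaultdict(int) creates a 0 entry on first access.
dd = defaultdict(int)
for v in values:
    dd[v] += 1

# Option 3: Counter tallies an iterable directly.
ctr = Counter(values)

# Convert the dict subclasses to a plain dict before returning.
as_dicts = [plain, dict(dd), dict(ctr)]
```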

#### Step 3: Return the counts dict directly

Keys are the distinct values found at `key`; values are the integer counts. No sorting is required. Do not return a list of tuples or a `Counter` object if the harness compares against `dict`.
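
How the harness compares results is not specified here, so the following is only a sketch of the shape and type distinctions at play, using a hypothetical `expected` value:

```python
from collections import Counter

expected = {"x": 2, "y": 1}  # hypothetical expected output

# Dict equality ignores insertion order, so no sorting is needed.
same_despite_order = {"y": 1, "x": 2} == expected

# A list of (value, count) tuples is a different shape and never equals a dict.
pairs_match = [("x", 2), ("y", 1)] == expected

# A Counter is a dict subclass, so an exact type check tells them apart.
tally = Counter(["x", "y", "x"])
exact_type_before = type(tally) is dict
exact_type_after = type(dict(tally)) is dict
```

Here `same_despite_order` and `exact_type_after` are `True`, while `pairs_match` and `exact_type_before` are `False`.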

---

### The solution

**Linear scan with key-presence guard**

```python
def count_key_occurrences(records, key):
    counts = {}
    for r in records:
        if key in r:
            v = r[key]
            counts[v] = counts.get(v, 0) + 1
    return counts
```
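
To see the function on concrete input, here is a usage sketch with made-up records (the sample data is illustrative, not taken from the problem's test cases). The definition is repeated so the snippet runs on its own.

```python
def count_key_occurrences(records, key):
    counts = {}
    for r in records:
        if key in r:
            v = r[key]
            counts[v] = counts.get(v, 0) + 1
    return counts

# Hypothetical sample records.
records = [
    {"status": "ok", "id": 1},
    {"status": "error", "id": 2},
    {"id": 3},                   # no "status" key: skipped
    {"status": "ok", "id": 4},
    {"status": None, "id": 5},   # key present with value None: counted
]

result = count_key_occurrences(records, "status")
print(result)  # {'ok': 2, 'error': 1, None: 1}
```

With this implementation, a record whose key is present but holds `None` is tallied under `None`. Values also need to be hashable, since they become dict keys; a list stored at `key` would raise `TypeError`.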

> **Cost Analysis**
>
> Time: O(N) for N records; each record costs a constant number of dict operations (a membership test, a read, and an update), each O(1) on average. Space: O(D) where D is the number of distinct values found at `key`.

> **Interviewers Watch For**
>
> Whether you handle missing keys gracefully (the spec calls it out explicitly), whether you use `dict.get`/`defaultdict`/`Counter` instead of an `if v in counts: ... else: ...` block, and whether you return the right shape. Strong candidates ask whether `None` values count as a distinct value.

> **Common Pitfall**
>
> Writing `counts[r[key]] += 1` on a plain `dict` with no guard. The first record that lacks `key` raises `KeyError`, and even when the key is present, the increment raises `KeyError` on the first sighting of a new value, because `+=` reads the existing entry before writing and that entry does not exist yet.
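
The second failure mode is the less obvious one, so here is a small sketch that reproduces it, assuming a hypothetical single record in which the key is present:

```python
# Hypothetical one-record input where the key IS present.
records = [{"color": "red"}]

counts = {}
try:
    for r in records:
        counts[r["color"]] += 1  # reads counts["red"] before it exists
    error = None
except KeyError as exc:
    error = exc

# The failing key is the new value "red", not the field name "color".
failed_on = error.args[0]
```

The exception names `"red"`, which shows the lookup that failed was on `counts`, not on the record.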

---

## Common follow-up questions

- How would you change the function to count by multiple keys at once and return a nested dict? _(Tests structure design. The candidate should propose `{key_name: {value: count}}` and a single pass that updates all keys per record.)_
- What if records are coming from a generator that may yield millions of items? Would you change anything? _(Tests memory awareness. The function already streams since it iterates `records` once. The main risk is the output dict growing unbounded if cardinality is high.)_
- How would you also return the most common value and its count? _(Tests post-processing. `max(counts.items(), key=lambda kv: kv[1])` is the natural answer, with a guard for empty `counts`, where `max` raises `ValueError`; `Counter.most_common(1)` is the idiomatic one if they used `Counter`.)_
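
The follow-ups above could be approached along these lines. This is one possible sketch, not a reference answer: `count_many_keys` and `most_common` are hypothetical helper names, and the sample stream is made up.

```python
def count_many_keys(records, keys):
    """Hypothetical helper: returns {key_name: {value: count}} in one pass."""
    nested = {k: {} for k in keys}
    for r in records:
        for k, bucket in nested.items():
            if k in r:
                bucket[r[k]] = bucket.get(r[k], 0) + 1
    return nested

def most_common(counts):
    """Hypothetical helper: returns (value, count), or None for empty counts."""
    if not counts:
        return None  # max() would raise ValueError on an empty dict
    return max(counts.items(), key=lambda kv: kv[1])

# A generator works too, since the records are iterated exactly once.
stream = (rec for rec in [
    {"lang": "py", "level": 1},
    {"lang": "sql"},
    {"lang": "py", "level": 1},
])
nested = count_many_keys(stream, ["lang", "level"])
top = most_common(nested["lang"])
```

Only the per-key count dicts are held in memory, so the footprint depends on the number of distinct values rather than on the length of the stream. On ties, `max` returns the first maximal item it encounters.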

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.