# The Stream Averager

> The answer moves with the data.

Canonical URL: <https://datadriven.io/problems/the_stream_averager>

Domain: Python · Difficulty: easy · Seniority: L5

## Problem

Implement StreamAverager with add(key, value) and get_averages(). add records a reading; get_averages returns a dict mapping each key to the average of its values. Test harness: ['add', key, value] and ['get_averages']; returns parallel list (None for add, dict for get_averages).

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **class design** with **incremental aggregation**. Maintaining running averages per key without recomputing from scratch is a core streaming data pattern that probes stateful computation.

---

### Break down the requirements

#### Step 1: Store sum and count per key

Each key needs two accumulators: the running sum and the count of values added. The average is derived as `sum / count`.

#### Step 2: Implement add(key, value)

Update the sum and count for the given key. Initialize if the key has not been seen before.

#### Step 3: Implement get_averages()

Compute `sum / count` for each key and return the results as a dict.

---

### The solution

**Dual-dict accumulator with lazy average computation**

```python
class StreamAverager:
    def __init__(self):
        self.sums = {}
        self.counts = {}
    def add(self, key: str, value: float):
        self.sums[key] = self.sums.get(key, 0) + value
        self.counts[key] = self.counts.get(key, 0) + 1
    def get_averages(self) -> dict:
        averages = {}
        for key in self.sums:
            averages[key] = self.sums[key] / self.counts[key]
        return averages
```

> **Time and Space Complexity**
>
> **Time:** `add` is O(1). `get_averages` is O(k) where k is the number of distinct keys.
> 
> **Space:** O(k) for storing sums and counts per key.

> **Interviewers Watch For**
>
> Storing sum and count separately instead of a list of all values. Keeping all values would be O(n) space and O(n) per average computation.

> **Common Pitfall**
>
> Storing all values in a list per key and recomputing the mean each time. This is O(n) per `get_averages` call where n is total values, not O(k).

---

## Common follow-up questions

- How would you add a remove(key, value) method? _(Tests subtracting from sum and decrementing count, handling the zero-count edge case.)_
- What if you needed a windowed average (last N values per key)? _(Tests using a deque per key with a fixed max length.)_
- How would you make this thread-safe? _(Tests knowledge of locks or concurrent data structures for multi-threaded access.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_stream_averager)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.