# The Shifting Standard

> A benchmark in motion.

Canonical URL: <https://datadriven.io/problems/the_shifting_standard>

Domain: Python · Difficulty: medium · Seniority: L5

## Problem

Given a list of batches (each a list of numbers), after processing each batch compute the running average of ALL numbers across ALL batches seen so far. Return the list of running averages (one per batch).

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **running aggregation** across multiple batches, a pattern that appears constantly in streaming data pipelines. It probes whether a candidate maintains cumulative state correctly without recomputing from scratch each time.

---

### Break down the requirements

#### Step 1: Maintain running sum and count across batches

Instead of flattening all data and recomputing, keep a cumulative sum and count that update as each batch arrives.

#### Step 2: After each batch, compute and store the running average

Divide cumulative sum by cumulative count. Append each average to the result list.
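
To make the two steps concrete, here is a small trace on a made-up input (the sample `batches` and the `trace` list are illustrative assumptions, not part of the problem statement). It records the cumulative sum, cumulative count, and resulting average after each batch.

```python
# Hypothetical sample input: three batches of integers.
batches = [[1, 2, 3], [4, 5], [6]]

total_sum = 0
total_count = 0
trace = []  # (cumulative sum, cumulative count, running average) per batch

for batch in batches:
    total_sum += sum(batch)      # Step 1: update cumulative state
    total_count += len(batch)
    trace.append((total_sum, total_count, total_sum / total_count))  # Step 2

for row in trace:
    print(row)
# (6, 3, 2.0)
# (15, 5, 3.0)
# (21, 6, 3.5)
```

The second average (3.0) covers all five values seen so far, not just the two values in the second batch.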

---

### The solution

**Cumulative sum and count across batches**

```python
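def running_averages(batches: list[list[float]]) -> list[float]:
    """Return the average of all values seen so far, one entry per batch."""
    total_sum = 0    # cumulative sum across every batch seen so far
    total_count = 0  # cumulative number of values seen so far
    averages = []
    for batch in batches:
        for value in batch:
            total_sum += value
            total_count += 1
        # Raises ZeroDivisionError if no values have been seen yet
        # (for example, when the first batch is empty).
        avg = total_sum / total_count
        averages.append(avg)
    return averages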
```

> **Time and Space Complexity**
>
> **Time:** O(n) where n is the total number of elements across all batches. Each element is visited exactly once.
> 
> **Space:** O(b) where b is the number of batches, for the result list. The running state uses O(1) additional space.

> **Interviewers Watch For**
>
> Candidates who recompute the average from scratch each time (flattening all previous batches) show they are not thinking about streaming patterns. The O(1) state update is the key signal.
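
The contrast can be sketched as follows. Both function names are hypothetical and exist only for this comparison: the first re-flattens every earlier batch on each step, the second carries only a sum and a count forward.

```python
def running_averages_naive(batches):
    """Recompute from scratch after each batch (re-reads all earlier data)."""
    averages = []
    for i in range(len(batches)):
        # Flatten every batch seen so far, including ones already processed.
        seen = [value for batch in batches[: i + 1] for value in batch]
        averages.append(sum(seen) / len(seen))
    return averages


def running_averages_streaming(batches):
    """Carry O(1) state (sum and count) from one batch to the next."""
    total_sum = 0
    total_count = 0
    averages = []
    for batch in batches:
        total_sum += sum(batch)
        total_count += len(batch)
        averages.append(total_sum / total_count)
    return averages


sample = [[1, 2, 3], [4, 5], [6]]  # illustrative input
print(running_averages_naive(sample) == running_averages_streaming(sample))  # True
```

Both return the same averages on this input, but the naive version touches the first batch once per later batch, so its total work grows with the number of batches times the number of elements. It also needs every earlier batch to still be in memory, which a stream does not guarantee.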

> **Common Pitfall**
>
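> Floating-point rounding error can accumulate in the running sum over many batches. For production systems, you might accumulate with `Decimal`, and keep tracking the sum and count separately (as the solution above does) rather than updating the average in place, computing the ratio only when needed.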
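
A minimal sketch of the `Decimal` option, under stated assumptions: `running_averages_decimal` is a hypothetical name, and converting each value with `Decimal(str(value))` is one possible choice that suits inputs written as short decimal literals. In a real pipeline the values would ideally arrive as strings or `Decimal` objects already.

```python
from decimal import Decimal


def running_averages_decimal(batches):
    """Same algorithm as the solution, but the sum is kept as a Decimal."""
    total_sum = Decimal(0)
    total_count = 0
    averages = []
    for batch in batches:
        for value in batch:
            # Assumption: str(value) is the intended decimal representation.
            total_sum += Decimal(str(value))
            total_count += 1
        averages.append(total_sum / total_count)
    return averages


batches = [[0.1] * 5, [0.1] * 5]  # illustrative input: ten copies of 0.1

# Plain float accumulation, one addition at a time.
float_total = 0.0
for batch in batches:
    for value in batch:
        float_total += value

print(float_total == 1.0)  # False: rounding error has accumulated
print(all(avg == Decimal("0.1") for avg in running_averages_decimal(batches)))  # True
```

`Decimal` arithmetic is slower than float arithmetic, so whether the trade is worth it depends on how much precision the downstream consumer needs.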

---

## Common follow-up questions

- What if batches could be empty? _(Tests guarding against division by zero when `total_count` is still 0.)_
- How would you extend this to compute a running median? _(Tests knowledge of two heaps or a sorted container, which keep the median available with O(log n) work per inserted value.)_
- What if you needed to support removing a batch from the running average? _(Tests tracking per-batch contributions for reversible aggregation.)_
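
As a rough sketch of the first and third follow-ups, the class below keeps each batch's contribution so it can be subtracted later, and reports `None` instead of dividing by zero. The class name, the `batch_id` keys, and the choice of `None` for "no data yet" are all assumptions made for illustration. The problem statement does not specify any of them.

```python
from typing import Hashable, Optional


class ReversibleRunningAverage:
    """Hypothetical helper: running average with removable batches."""

    def __init__(self) -> None:
        self._total_sum = 0
        self._total_count = 0
        # batch_id -> (sum of that batch, number of values in that batch)
        self._contributions: dict = {}

    def add_batch(self, batch_id: Hashable, batch: list) -> None:
        if batch_id in self._contributions:
            raise ValueError(f"batch {batch_id!r} was already added")
        batch_sum, batch_count = sum(batch), len(batch)
        self._contributions[batch_id] = (batch_sum, batch_count)
        self._total_sum += batch_sum
        self._total_count += batch_count

    def remove_batch(self, batch_id: Hashable) -> None:
        # Raises KeyError if the batch was never added.
        batch_sum, batch_count = self._contributions.pop(batch_id)
        self._total_sum -= batch_sum
        self._total_count -= batch_count

    def average(self) -> Optional[float]:
        # Guard: no values seen (or all removed) means no average to report.
        if self._total_count == 0:
            return None
        return self._total_sum / self._total_count


stats = ReversibleRunningAverage()
stats.add_batch("a", [])          # empty batch: average stays undefined
print(stats.average())            # None
stats.add_batch("b", [1, 2, 3])
stats.add_batch("c", [4, 5])
print(stats.average())            # 3.0
stats.remove_batch("b")
print(stats.average())            # 4.5
```

Storing one `(sum, count)` pair per batch costs O(b) extra space. With float inputs, subtracting a contribution may not undo the earlier addition exactly, so the precision caveat from the pitfall above applies here too.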

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.