# Group Average

> Same group, different values. What is typical?

Canonical URL: <https://datadriven.io/problems/group_average>

Domain: Python · Difficulty: hard · Seniority: L4

## Problem

Given a list of [key, value] pairs, return a dict mapping each key to the average of its values.

## Worked solution and explanation

### Why this problem exists in real interviews

Grouped averaging is a **GROUP BY with AVG** in Python form. It tests whether you can accumulate both sums and counts per key and then compute the final averages in a second pass.

---

### Break down the requirements

#### Step 1: Accumulate sum and count per key

For each [key, value] pair, add the value to the running sum for that key and increment the count.

#### Step 2: Compute averages

Divide each key's total by its count to get the mean.

---

### The solution

**Two-pass grouped aggregation: accumulate then divide**

```python
def group_averages(data):
    sums = {}
    counts = {}
    for pair in data:
        key = pair[0]
        value = pair[1]
        if key not in sums:
            sums[key] = 0
            counts[key] = 0
        sums[key] += value
        counts[key] += 1
    result = {}
    for key in sums:
        result[key] = sums[key] / counts[key]
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n) where n is the number of pairs. One pass to accumulate, one pass over unique keys to divide.
> 
> **Space:** O(k) where k is the number of unique keys.

> **Interviewers Watch For**
>
> Tracking both sums and counts separately rather than storing all values in a list per key. Storing lists then averaging works but uses O(n) space instead of O(k).

> **Common Pitfall**
>
> Integer division. Using `//` instead of `/` would truncate the result. Python 3's `/` returns a float, which is the correct behavior here.

---

## Common follow-up questions

- What if you also needed the min and max per group? _(Tests extending the accumulator to track four values per key.)_
- How would you handle groups with zero values? _(Tests that division by zero cannot happen since each group has at least one entry.)_
- What if the data arrives as a stream and groups are not contiguous? _(Tests online averaging: maintain running sum and count, emitting results when the stream ends.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/group_average)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.