# The Event Aggregator

> Bucket a firehose of events into tidy time windows.

Canonical URL: <https://datadriven.io/problems/the_event_aggregator>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given a list of event dicts (each with 'timestamp' and 'value') and an integer bucket_width, group events into buckets where each bucket covers timestamps in [bucket_start, bucket_start + bucket_width). For each bucket with at least one event, return a dict with 'bucket_start', 'count', 'total' (sum of values). Sort the output by bucket_start ascending.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **time-window bucketing and aggregation**, a core pattern in streaming analytics and dashboard pipelines. It probes whether a candidate can map timestamps to fixed-width windows and aggregate values within each bucket.

---

### Break down the requirements

#### Step 1: Compute the window key for each event

Integer-divide the timestamp by the bucket width, then multiply the result back by the width. This snaps each timestamp down to the start of the bucket that contains it.
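
As a quick illustration of the key computation, the sketch below snaps a few timestamps to their bucket starts. The helper name `bucket_start_for` is hypothetical, and the sketch assumes integer timestamps and a positive width.

```python
def bucket_start_for(timestamp: int, bucket_width: int) -> int:
    """Return the start of the fixed-width bucket containing `timestamp`."""
    if bucket_width <= 0:
        raise ValueError("bucket_width must be positive")
    # Floor division yields the bucket index; multiplying back gives its start.
    return (timestamp // bucket_width) * bucket_width


for ts in (0, 9, 10, 25, -1):
    print(ts, "->", bucket_start_for(ts, 10))
# 0 -> 0, 9 -> 0, 10 -> 10, 25 -> 20, -1 -> -10
```

Python's `//` rounds toward negative infinity, so a timestamp of -1 lands in the bucket [-10, 0). The half-open interval contract therefore also holds for negative timestamps.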

#### Step 2: Aggregate count and total value per window

For each bucket, track both the number of events and the sum of values.
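
One way to keep both aggregates together is sketched here with `collections.defaultdict`. The helper name `accumulate` and the `[count, total]` pair layout are illustrative choices, not something the problem prescribes.

```python
from collections import defaultdict


def accumulate(keyed_values):
    """Fold (bucket_start, value) pairs into {bucket_start: [count, total]}."""
    stats = defaultdict(lambda: [0, 0])
    for bucket_start, value in keyed_values:
        entry = stats[bucket_start]
        entry[0] += 1      # number of events seen in this bucket
        entry[1] += value  # running sum of their values
    return dict(stats)


print(accumulate([(0, 5), (0, 7), (10, 1)]))
# {0: [2, 12], 10: [1, 1]}
```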

#### Step 3: Return the aggregated windows

Emit one dict per non-empty bucket with 'bucket_start', 'count', and 'total', sorted by bucket_start ascending.
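
The problem statement asks for a list of dicts ordered by `bucket_start`. A minimal sketch of that final shaping step follows, assuming an intermediate mapping of bucket start to a `(count, total)` pair. The name `to_sorted_rows` is hypothetical.

```python
def to_sorted_rows(stats: dict) -> list:
    """Turn {bucket_start: (count, total)} into rows sorted by bucket_start."""
    return [
        {"bucket_start": start, "count": count, "total": total}
        for start, (count, total) in sorted(stats.items())
    ]


print(to_sorted_rows({20: (1, 4), 0: (2, 12)}))
# [{'bucket_start': 0, 'count': 2, 'total': 12}, {'bucket_start': 20, 'count': 1, 'total': 4}]
```

Because only buckets that received an event ever become keys, empty buckets are excluded without any extra filtering.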

---

### The solution

**Floor-division bucketing with dual aggregation**

```python
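def aggregate_events(events: list, bucket_width: int) -> list:
    # Accumulate per-bucket stats keyed by bucket start.
    buckets = {}
    for event in events:
        ts = event['timestamp']
        # Floor division snaps the timestamp down to its bucket start.
        bucket_start = (ts // bucket_width) * bucket_width
        if bucket_start not in buckets:
            buckets[bucket_start] = {'bucket_start': bucket_start, 'count': 0, 'total': 0}
        buckets[bucket_start]['count'] += 1
        buckets[bucket_start]['total'] += event['value']
    # Only non-empty buckets exist as keys; emit them in ascending order.
    return [buckets[start] for start in sorted(buckets)]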
```

> **Time and Space Complexity**
>
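> **Time:** O(n + w log w), where n is the number of events and w is the number of distinct non-empty buckets. Each event is bucketed in O(1); sorting the bucket starts for the ordered output costs O(w log w).
>
> **Space:** O(w) for the per-bucket accumulators.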

> **Interviewers Watch For**
>
> Whether you use floor division (`//`) correctly to compute window boundaries. The same alignment idea underlies tumbling (fixed-width, non-overlapping) windows in stream processors such as Kafka Streams and Flink.

> **Common Pitfall**
>
> Using `ts % width` as the bucket key instead of floor division. Modulo gives the offset within the window, not the window start, so events from different windows collide on the same key.
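
A small check makes the difference concrete. With a width of 10, the timestamps 3, 13, and 23 share the same modulo result even though they fall in three different windows. The variable names here are illustrative.

```python
width = 10
timestamps = [3, 13, 23]

# Modulo: position inside the window, identical for all three events.
offsets = [ts % width for ts in timestamps]

# Floor division then multiply: the actual window start for each event.
starts = [(ts // width) * width for ts in timestamps]

print(offsets)  # [3, 3, 3]
print(starts)   # [0, 10, 20]

# The two are related: subtracting the offset also yields the start.
assert all(ts - off == start for ts, off, start in zip(timestamps, offsets, starts))
```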

---

## Common follow-up questions

- What if events arrive out of order? _(Tests late-arriving event handling and watermark concepts from streaming systems.)_
- How would you implement sliding windows instead of tumbling? _(Tests overlapping window assignment where each event belongs to multiple windows.)_
- What if you need to emit results as events arrive? _(Tests incremental aggregation vs batch computation.)_
- How does this pattern map to SQL? _(Tests `FLOOR(timestamp / width) * width` as the `GROUP BY` key.)_
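
For the sliding-window follow-up, one possible approach is to compute every window start that covers a given timestamp and then aggregate into each of those windows. This is a sketch rather than the only design: the function name `sliding_window_starts` and the parameters `size` and `slide` are hypothetical, and it assumes windows begin at multiples of `slide` with no clamping at zero.

```python
def sliding_window_starts(timestamp: int, size: int, slide: int) -> list:
    """Return starts of all windows [start, start + size) containing timestamp.

    Windows begin at multiples of `slide`. With slide == size this reduces to
    the tumbling case and exactly one start is returned.
    """
    if size <= 0 or slide <= 0:
        raise ValueError("size and slide must be positive")
    # Latest window that has already opened at or before the timestamp.
    start = (timestamp // slide) * slide
    starts = []
    # Walk backwards while the window still covers the timestamp.
    while start > timestamp - size:
        starts.append(start)
        start -= slide
    return sorted(starts)


print(sliding_window_starts(12, size=10, slide=5))   # [5, 10]
print(sliding_window_starts(12, size=10, slide=10))  # [10]
```

Each event now contributes to roughly `size / slide` windows, so aggregation cost and memory grow by that factor compared with tumbling windows.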

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.