# The Event Bucketer

> Logs slotted into buckets.

Canonical URL: <https://datadriven.io/problems/the_event_bucketer>

Domain: Python · Difficulty: easy · Seniority: L5

## Problem

Given log tuples [timestamp_string, event_label] where timestamp is 'YYYY-MM-DD HH:MM:SS', bucket by hour using the first 13 characters ('YYYY-MM-DD HH'). Return a dict mapping each hour bucket to a dict mapping each event label to its count within that hour.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **two-level grouping**: bucketing events by hour, then counting by event type within each bucket. It probes nested dict accumulation and timestamp parsing skills.

---

### Break down the requirements

#### Step 1: Extract the hour prefix from each timestamp

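Truncate the timestamp to its first 13 characters, `YYYY-MM-DD HH`, to create the bucket key. The separator between date and hour is a space, matching the input format.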
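
A minimal sketch of the truncation, assuming every timestamp follows the fixed-width `'YYYY-MM-DD HH:MM:SS'` format from the problem statement. The helper name `hour_key` is hypothetical and not part of the original solution.

```python
def hour_key(timestamp: str) -> str:
    """Return the 'YYYY-MM-DD HH' prefix of a 'YYYY-MM-DD HH:MM:SS' timestamp."""
    # 13 characters = date (10) + space (1) + hour (2).
    return timestamp[:13]


print(hour_key("2024-03-01 09:15:42"))  # 2024-03-01 09
```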

#### Step 2: Group events by hour bucket

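Build a dict keyed by hour bucket, creating an entry the first time each hour appears. Because only counts are needed, each value can be a dict of counts rather than a list of events.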

#### Step 3: Count event types within each bucket

For each hour, count occurrences of each event label. This can happen in the same pass as the grouping.
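
The grouping and counting steps can also be sketched with the standard library's `collections.defaultdict` and `collections.Counter`. This is an alternative illustration, not the original solution; the function name `bucket_events_with_counter` and the sample data are assumptions made for the example.

```python
from collections import Counter, defaultdict


def bucket_events_with_counter(logs):
    """Group (timestamp, event_label) pairs by hour, then count labels per hour."""
    # A missing hour key starts out as an empty Counter.
    buckets = defaultdict(Counter)
    for timestamp, event_label in logs:
        buckets[timestamp[:13]][event_label] += 1
    # Convert to plain dicts so the result matches the requested return type.
    return {hour: dict(counts) for hour, counts in buckets.items()}


# Hypothetical sample input for illustration.
sample_logs = [
    ("2024-03-01 09:15:42", "login"),
    ("2024-03-01 09:47:03", "click"),
    ("2024-03-01 09:59:59", "login"),
    ("2024-03-01 10:00:00", "login"),
]
result = bucket_events_with_counter(sample_logs)
print(result)
# {'2024-03-01 09': {'login': 2, 'click': 1}, '2024-03-01 10': {'login': 1}}
```

The final dict comprehension is a design choice: it keeps callers from accidentally creating empty buckets by reading a missing key from the `defaultdict`.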

---

### The solution

**Two-level grouping with hour truncation**

```python
def bucket_events(logs: list) -> dict:
    """Count event labels per hour bucket."""
    buckets = {}
    # Each log is a [timestamp_string, event_label] pair.
    for timestamp, event_label in logs:
        # First 13 characters: 'YYYY-MM-DD HH'.
        hour_key = timestamp[:13]
        if hour_key not in buckets:
            buckets[hour_key] = {}
        if event_label not in buckets[hour_key]:
            buckets[hour_key][event_label] = 0
        buckets[hour_key][event_label] += 1
    return buckets
```

> **Time and Space Complexity**
>
> **Time:** O(n) where n is the number of log entries.
> 
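> **Space:** O(h * t) in the worst case, where h is the number of distinct hours and t is the number of distinct event labels. The output never holds more than n (hour, label) pairs, so it is also bounded by O(n).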

> **Interviewers Watch For**
>
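> Whether you use string slicing (`[:13]`) vs datetime parsing for hour extraction. Slicing is simpler and faster when every timestamp follows the same fixed-width format, but it does no validation, so a malformed timestamp is bucketed silently.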
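
To make the trade-off concrete, here is a hedged sketch of both approaches side by side. The helper names and `TIMESTAMP_FORMAT` constant are hypothetical; the parsing variant uses the standard library's `datetime.strptime`, which raises `ValueError` on input that does not match the format.

```python
from datetime import datetime

# Assumed input format, taken from the problem statement.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def hour_key_by_slicing(timestamp: str) -> str:
    # No validation: whatever the first 13 characters are becomes the key.
    return timestamp[:13]


def hour_key_by_parsing(timestamp: str) -> str:
    # strptime validates the timestamp and raises ValueError if it is malformed.
    parsed = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
    return parsed.strftime("%Y-%m-%d %H")


valid = "2024-03-01 09:15:42"
print(hour_key_by_slicing(valid) == hour_key_by_parsing(valid))  # True

invalid = "2024-13-01 09:15:42"  # month 13 does not exist
print(hour_key_by_slicing(invalid))  # 2024-13-01 09
try:
    hour_key_by_parsing(invalid)
except ValueError:
    print("parsing rejected the malformed timestamp")
```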

> **Common Pitfall**
>
> Assuming timestamps are sorted. The grouping must work regardless of input order.
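
A quick way to check order independence is to run the same logs through the function in two different orders. The sketch below uses a hypothetical `count_by_hour` helper and made-up sample data.

```python
import random


def count_by_hour(logs):
    """Single-pass nested count that makes no assumption about input order."""
    buckets = {}
    for timestamp, event_label in logs:
        hour_counts = buckets.setdefault(timestamp[:13], {})
        hour_counts[event_label] = hour_counts.get(event_label, 0) + 1
    return buckets


# Hypothetical sample input for illustration.
ordered_logs = [
    ("2024-03-01 09:15:42", "login"),
    ("2024-03-01 09:47:03", "click"),
    ("2024-03-01 10:00:00", "login"),
    ("2024-03-01 10:30:00", "login"),
]

shuffled_logs = ordered_logs[:]
random.Random(7).shuffle(shuffled_logs)  # fixed seed for a repeatable shuffle

# Dict equality ignores key order, so both inputs give equal results.
print(count_by_hour(shuffled_logs) == count_by_hour(ordered_logs))  # True
```

By contrast, `itertools.groupby` only merges consecutive items with equal keys, so an approach built on it would need the input sorted first.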

---

## Common follow-up questions

- How would you handle time zones? _(Tests converting to UTC before bucketing.)_
- What if you needed 15-minute buckets instead of hourly? _(Tests floor-division on the minute component.)_
- How would you stream this for real-time dashboards? _(Tests incremental updates vs batch recomputation.)_
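
For the 15-minute follow-up, one possible sketch floors the minute component to the start of its quarter hour. It assumes the same fixed-width `'YYYY-MM-DD HH:MM:SS'` format; `quarter_hour_key` is a hypothetical name, and the `'YYYY-MM-DD HH:MM'` key shape is an assumption rather than something the problem specifies.

```python
def quarter_hour_key(timestamp: str) -> str:
    """Return a 'YYYY-MM-DD HH:MM' key floored to the nearest 15 minutes."""
    # Characters 14-15 hold the two-digit minute in 'YYYY-MM-DD HH:MM:SS'.
    minute = int(timestamp[14:16])
    # Floor division maps 0-14 -> 0, 15-29 -> 15, 30-44 -> 30, 45-59 -> 45.
    floored = (minute // 15) * 15
    return f"{timestamp[:13]}:{floored:02d}"


print(quarter_hour_key("2024-03-01 09:47:03"))  # 2024-03-01 09:45
print(quarter_hour_key("2024-03-01 09:14:59"))  # 2024-03-01 09:00
```

Swapping this key function in for the `[:13]` slice leaves the rest of the nested counting unchanged.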

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_event_bucketer)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io).