# The Hourly Bucket

> Timestamps belong somewhere.

Canonical URL: <https://datadriven.io/problems/the_hourly_bucket>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given a list of event dicts (each with 'ts' in ISO format 'YYYY-MM-DDTHH:MM:SS' and 'type'), group by the hour prefix 'YYYY-MM-DDTHH' (first 13 characters of ts). Return a dict mapping each hour prefix to a list of 'type' values in original order.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **timestamp truncation and grouping**, a fundamental operation in log analysis and time-series dashboards. It probes two skills: slicing a fixed-width ISO timestamp, and accumulating values into a dict of lists.

---

### Break down the requirements

#### Step 1: Extract the hour prefix from each timestamp

Slice the ISO timestamp `'YYYY-MM-DDTHH:MM:SS'` to `'YYYY-MM-DDTHH'` (first 13 characters).
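As a quick illustration of the slice, here is a minimal sketch on a single made-up timestamp (the value of `ts` is a hypothetical sample, not part of the problem's test data):

```python
ts = "2024-03-15T09:42:07"  # hypothetical sample timestamp

# Indices 0-9 hold the date, index 10 is the 'T' separator,
# and indices 11-12 hold the two-digit hour.
assert ts[10] == "T"

hour_key = ts[:13]
print(hour_key)  # 2024-03-15T09
```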

#### Step 2: Group event types by hour

Build a dict mapping each hour prefix to a list of event type strings in original order.
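One way to sketch the accumulation step is with `collections.defaultdict`, which creates the empty list on first access. The function name `group_types_by_hour` and the sample events are assumptions made up for this illustration:

```python
from collections import defaultdict


def group_types_by_hour(events):
    """Sketch: hour prefix -> list of event types, in input order."""
    buckets = defaultdict(list)
    for event in events:
        # A missing key is initialised to [] automatically.
        buckets[event["ts"][:13]].append(event["type"])
    # Convert back to a plain dict so later lookups of absent keys raise KeyError.
    return dict(buckets)


# Hypothetical sample input; note the last event arrives out of order.
sample_events = [
    {"ts": "2024-03-15T09:05:00", "type": "click"},
    {"ts": "2024-03-15T09:42:07", "type": "view"},
    {"ts": "2024-03-15T10:01:30", "type": "click"},
    {"ts": "2024-03-15T09:59:59", "type": "purchase"},
]

grouped = group_types_by_hour(sample_events)
print(grouped)
# {'2024-03-15T09': ['click', 'view', 'purchase'], '2024-03-15T10': ['click']}
```

Because Python dicts preserve insertion order, the hour keys appear in the order they were first seen, and each list keeps the order of the input events.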

---

### The solution

**Hour-prefix grouping with string slicing**

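```python
def group_by_hour(events: list[dict]) -> dict[str, list[str]]:
    """Map each 'YYYY-MM-DDTHH' hour prefix to its event types, in input order."""
    buckets: dict[str, list[str]] = {}
    for event in events:
        # The first 13 characters of the ISO timestamp are 'YYYY-MM-DDTHH'.
        hour_key = event['ts'][:13]
        event_type = event['type']
        # Create the bucket the first time this hour is seen.
        if hour_key not in buckets:
            buckets[hour_key] = []
        buckets[hour_key].append(event_type)
    return buckets
```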

> **Time and Space Complexity**
>
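> **Time:** O(n), where n is the number of events. Each event costs one fixed-length slice and one amortized O(1) dict update.
>
> **Space:** O(n) for the grouped output, which stores one type value per event.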

> **Interviewers Watch For**
>
> Whether you reach for string slicing or datetime parsing. For fixed-width ISO timestamps such as `'YYYY-MM-DDTHH:MM:SS'`, slicing to `[:13]` is simpler and faster, at the cost of not validating the input.
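To make the slicing-versus-parsing trade-off concrete, here is a hedged comparison. The helper names `hour_key_by_slicing` and `hour_key_by_parsing` and the sample strings are hypothetical; the parsing version assumes the exact format given in the problem statement:

```python
from datetime import datetime


def hour_key_by_slicing(ts: str) -> str:
    # Works only because the format is fixed-width and zero-padded.
    return ts[:13]


def hour_key_by_parsing(ts: str) -> str:
    # Validates the input, at the cost of a parse and a re-format.
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S")
    return parsed.strftime("%Y-%m-%dT%H")


ts = "2024-03-15T09:42:07"
assert hour_key_by_slicing(ts) == hour_key_by_parsing(ts) == "2024-03-15T09"

# Malformed input: slicing silently returns a meaningless key,
# while parsing raises ValueError.
bad = "15/03/2024 09:42"
print(hour_key_by_slicing(bad))  # 15/03/2024 09
try:
    hour_key_by_parsing(bad)
except ValueError:
    print("parsing rejected the malformed timestamp")
```

Slicing fits when the input format is guaranteed; parsing is the safer choice when timestamps may be malformed or come in mixed formats.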

> **Common Pitfall**
>
> Slicing to the wrong position. The date occupies indices 0-9, the `T` separator sits at index 10, and the hour occupies indices 11-12, so `[:13]` captures `YYYY-MM-DDTHH` exactly.
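The effect of an off-by-one slice is easy to miss because the code still runs; it just merges buckets. A small sketch with two hypothetical timestamps from the same day:

```python
ts_a = "2024-03-15T03:10:00"  # hypothetical: 03:xx
ts_b = "2024-03-15T09:42:07"  # hypothetical: 09:xx

# [:12] keeps only the first hour digit, so hours 00-09 collapse into one key.
assert ts_a[:12] == ts_b[:12] == "2024-03-15T0"

# [:10] drops the time entirely, producing day-level buckets.
assert ts_a[:10] == ts_b[:10] == "2024-03-15"

# [:13] keeps both hour digits, so the two events land in separate buckets.
assert ts_a[:13] == "2024-03-15T03"
assert ts_b[:13] == "2024-03-15T09"
```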

---

## Common follow-up questions

- What if you needed minute-level buckets? _(Tests slicing to `[:16]` for `YYYY-MM-DDTHH:MM`.)_
- What if events should be deduplicated within each bucket? _(Tests using a set instead of a list for each bucket, and noticing that a plain set discards the original order.)_
- How would you stream this for a real-time dashboard? _(Tests maintaining rolling buckets and evicting old ones.)_
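The three follow-ups above can be sketched together. Everything here is an illustrative assumption rather than a reference answer: the names `group_by_prefix` and `RollingHourBuckets`, the `width`, `dedupe`, and `max_hours` parameters, and the eviction policy (drop the oldest hour key once the limit is exceeded).

```python
def group_by_prefix(events, width=13, dedupe=False):
    """width=13 gives hour buckets; width=16 gives minute buckets."""
    buckets = {}
    for event in events:
        buckets.setdefault(event["ts"][:width], []).append(event["type"])
    if dedupe:
        # dict.fromkeys keeps first-seen order, which a plain set would lose.
        return {key: list(dict.fromkeys(types)) for key, types in buckets.items()}
    return buckets


class RollingHourBuckets:
    """Keeps at most `max_hours` distinct hour keys, evicting the oldest."""

    def __init__(self, max_hours=24):
        self.max_hours = max_hours
        self.buckets = {}

    def add(self, event):
        self.buckets.setdefault(event["ts"][:13], []).append(event["type"])
        while len(self.buckets) > self.max_hours:
            # Fixed-width ISO prefixes sort chronologically as plain strings.
            del self.buckets[min(self.buckets)]


# Hypothetical sample events.
sample_events = [
    {"ts": "2024-03-15T09:05:00", "type": "click"},
    {"ts": "2024-03-15T09:05:30", "type": "click"},
    {"ts": "2024-03-15T09:42:07", "type": "view"},
    {"ts": "2024-03-15T10:01:30", "type": "click"},
]

by_minute = group_by_prefix(sample_events, width=16)
deduped = group_by_prefix(sample_events, dedupe=True)

rolling = RollingHourBuckets(max_hours=1)
for event in sample_events:
    rolling.add(event)

print(by_minute)
print(deduped)
print(rolling.buckets)
```

In a real streaming system the eviction rule would more likely depend on wall-clock time or watermarks than on a bucket count; the count-based limit is used here only to keep the sketch short.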

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_hourly_bucket)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.