# The Intervals

> Timestamps in buckets.

Canonical URL: <https://datadriven.io/problems/the_intervals>

Domain: Python · Difficulty: medium · Seniority: L3

## Problem

Given a list of date strings in 'YYYY-MM-DD' format, bucket each date into a 7-day week starting from the earliest date. Week 1 covers day offsets 0-6 from the earliest date (the earliest date itself is offset 0); week 2 covers offsets 7-13; and so on. Return a dict mapping the week number (as a string key) to the list of dates that fell in that week, preserving input order within each bucket.

## Worked solution and explanation

### Why this problem exists in real interviews

Time-bucketing is everywhere in analytics platforms: cohort retention, weekly active users, billing periods. Interviewers use this prompt to see whether you can pin down a window definition (anchor date, fixed length, label format) and apply it consistently. A common failure is inventing a Monday-anchored ISO week instead of reading the spec, then fighting off-by-one errors at the boundaries.

---

### Break down the requirements

#### Step 1: Establish the anchor and sort once

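Week 1 starts at the earliest date in the input, not at an ISO week boundary. Sort the dates first so that bucket order is deterministic and the anchor is simply the first element. Because the strings are zero-padded 'YYYY-MM-DD', plain string sorting is already chronological. Note that sorting makes each bucket chronological, which matches the spec's "input order" only when the input arrives sorted; if unsorted input must keep its original order, take the anchor with min() and iterate the original list instead. Empty input returns an empty dict, not an error.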
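
A minimal sketch of the anchor step, assuming well-formed, zero-padded 'YYYY-MM-DD' strings. Because every field is fixed-width and ordered from most to least significant, string order matches calendar order, so sorted() and min() can work on the raw strings. `find_anchor` is a hypothetical helper name, not part of the original solution.

```python
from datetime import datetime


def find_anchor(dates):
    """Return the earliest date string, or None for empty input.

    Hypothetical helper: assumes zero-padded 'YYYY-MM-DD' strings, whose
    lexicographic order matches chronological order.
    """
    if not dates:
        return None
    return min(dates)  # same result as sorted(dates)[0], without the sort


# String order agrees with the order of the parsed dates.
sample = ['2024-03-05', '2023-12-31', '2024-01-15']
by_string = sorted(sample)
by_date = sorted(sample, key=lambda s: datetime.strptime(s, '%Y-%m-%d'))
print(by_string == by_date)  # True
print(find_anchor(sample))   # 2023-12-31
print(find_anchor([]))       # None
```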

#### Step 2: Convert each date to a stable day index

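Parse with datetime.strptime under the exact 'YYYY-MM-DD' contract (it raises ValueError on malformed input), then use toordinal() to get an integer day count. Subtracting two ordinals yields whole days directly, so there is no timezone, DST, or time-of-day component to go wrong, as there can be when bucketing via epoch timestamps or timezone-aware datetimes.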
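
A short sketch of the parsing step; `day_index` is a hypothetical helper name. It shows that differences between ordinals are plain day counts across year ends and leap days alike, which is what makes the later `// 7` calendar-agnostic.

```python
from datetime import datetime


def day_index(s):
    """Parse a 'YYYY-MM-DD' string and return its proleptic Gregorian ordinal."""
    # strptime raises ValueError on malformed input such as '2024-13-01'.
    return datetime.strptime(s, '%Y-%m-%d').toordinal()


# Ordinal differences are whole days, whatever boundary they cross.
print(day_index('2024-01-01') - day_index('2023-12-31'))  # 1 (year boundary)
print(day_index('2024-03-01') - day_index('2024-02-28'))  # 2 (2024 has Feb 29)
print(day_index('2023-03-01') - day_index('2023-02-28'))  # 1 (2023 does not)
```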

#### Step 3: Map the day offset to a 1-indexed week label

Compute (ordinal - first_ordinal) // 7 + 1 and stringify the week number. Use setdefault to append into the bucket list so the first occurrence creates the key and the rest extend it. Since you sorted up front, each bucket stays in chronological order for free.
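
A small sketch that exercises the label formula at the week boundaries and shows how setdefault grows each bucket. The helper names `week_label` and `to_ordinal` are hypothetical; the arithmetic is the same `(day - anchor) // 7 + 1` used in the solution.

```python
from datetime import datetime


def to_ordinal(s):
    """Parse a 'YYYY-MM-DD' string into a day ordinal."""
    return datetime.strptime(s, '%Y-%m-%d').toordinal()


def week_label(day, anchor):
    """Map a day ordinal to a 1-indexed week label string."""
    return str((day - anchor) // 7 + 1)


anchor = to_ordinal('2024-01-01')

# Offsets 6 and 7 straddle the week 1 / week 2 boundary.
for offset in (0, 6, 7, 13, 14):
    print(offset, week_label(anchor + offset, anchor))
# 0 1
# 6 1
# 7 2
# 13 2
# 14 3

# setdefault creates the list the first time a label appears and
# returns the existing list on every later hit.
buckets = {}
for d in ['2024-01-01', '2024-01-09', '2024-01-03']:
    buckets.setdefault(week_label(to_ordinal(d), anchor), []).append(d)
print(buckets)  # {'1': ['2024-01-01', '2024-01-03'], '2': ['2024-01-09']}
```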

---

### The solution

**Sort once, anchor on the min, integer-divide by 7**

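```python
from datetime import datetime

def weekly_buckets(dates):
    # Empty input has no anchor, so return an empty dict up front.
    if not dates:
        return {}
    # Zero-padded 'YYYY-MM-DD' strings sort chronologically as plain strings.
    parsed = sorted(dates)
    # Week 1 is anchored on the earliest date, expressed as a day ordinal.
    anchor = datetime.strptime(parsed[0], '%Y-%m-%d').toordinal()
    buckets = {}
    for d in parsed:
        day = datetime.strptime(d, '%Y-%m-%d').toordinal()
        # Offsets 0-6 map to week 1, 7-13 to week 2, and so on.
        week_num = (day - anchor) // 7 + 1
        # String keys per the spec; setdefault creates each list on first use.
        buckets.setdefault(str(week_num), []).append(d)
    return buckets
```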
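
The problem statement asks for input order within each bucket, while the solution above emits chronological order; the two agree only when the input is already sorted. If unsorted input must keep its original order, one option, sketched here under that reading of the spec, is to take the anchor with min() and bucket the original list directly. `weekly_buckets_stable` is a hypothetical name. Dropping the sort also brings time down to O(n).

```python
from datetime import datetime


def weekly_buckets_stable(dates):
    """Bucket dates into 7-day weeks anchored on the earliest date,
    keeping each bucket in input order (hypothetical variant)."""
    if not dates:
        return {}
    # Zero-padded 'YYYY-MM-DD' strings compare chronologically, so min()
    # on the raw strings finds the earliest date without sorting.
    anchor = datetime.strptime(min(dates), '%Y-%m-%d').toordinal()
    buckets = {}
    for d in dates:  # original order, not sorted order
        day = datetime.strptime(d, '%Y-%m-%d').toordinal()
        buckets.setdefault(str((day - anchor) // 7 + 1), []).append(d)
    return buckets


print(weekly_buckets_stable(['2024-01-10', '2024-01-01', '2024-01-08', '2024-01-03']))
# {'2': ['2024-01-10', '2024-01-08'], '1': ['2024-01-01', '2024-01-03']}
```

One side effect to mention in an interview: dict keys now follow first appearance in the input, so '2' can precede '1' when iterating the result.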

> **Cost Analysis**
>
> Time is O(n log n) dominated by the initial sort; the bucketing pass is O(n). Space is O(n) for the output dict plus O(n) transient for the sorted list. Parsing each date once costs a constant per row, well within interview budgets even for tens of thousands of dates.

> **Interviewers Watch For**
>
> Whether you ask 'what defines week 1' before coding, whether you handle empty input with an early return instead of a special case inside the loop, and whether your week labels are strings (the spec said so). Strong candidates also note that sorted input means each bucket is naturally chronological, so no second sort is needed.

> **Common Pitfall**
>
> Reaching for ISO calendar weeks (datetime.isocalendar) and getting Monday-anchored buckets that disagree with the spec whenever the earliest date is not a Monday. The anchor is the earliest input date, not Sunday, not Monday, not January 1. The other classic miss is integer-keying the bucket dict; the spec wants string keys like '1', '2'.
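
A minimal illustration of both pitfalls, using two sample dates chosen for this sketch: 2024-01-03 is a Wednesday and 2024-01-08 is the following Monday. Anchored on the earliest date they share week '1'; ISO numbering splits them. The helper name `parse` is hypothetical.

```python
from datetime import datetime


def parse(s):
    return datetime.strptime(s, '%Y-%m-%d')


dates = ['2024-01-03', '2024-01-08']  # a Wednesday and the following Monday
anchor = parse(dates[0]).toordinal()

# Spec: week 1 starts on the earliest date, so both dates land in week '1'.
spec_weeks = [str((parse(d).toordinal() - anchor) // 7 + 1) for d in dates]

# Pitfall: ISO weeks restart every Monday, so 2024-01-08 opens ISO week 2.
iso_weeks = [parse(d).isocalendar()[1] for d in dates]

print(spec_weeks)  # ['1', '1']
print(iso_weeks)   # [1, 2]

# Second pitfall: integer keys do not match the string keys the spec asks for.
int_keyed = {1: dates}
print('1' in int_keyed)  # False
```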

---

## Common follow-up questions

- How would you handle dates that span multiple years or include leap days? _(ordinal arithmetic already handles this; explain why the // 7 calculation is calendar-agnostic.)_
- What changes if buckets must be calendar weeks (Sunday to Saturday) instead of anchored on the min date? _(you would shift the anchor to the most recent Sunday on or before parsed[0]; show the (weekday() + 1) % 7 trick.)_
- How would you stream this over a billion rows where you cannot sort upfront? _(discuss a fixed anchor passed in as a parameter, single-pass bucketing, and external sort or hash partitioning by week.)_
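
A sketch touching the last two follow-ups, under stated assumptions: `sunday_on_or_before` and `bucket_stream` are hypothetical names, and the streaming function still keeps buckets in memory, so at billion-row scale you would write each record to a per-week partition instead. Python's weekday() numbers Monday=0 through Sunday=6, which is why `(weekday() + 1) % 7` gives days since the most recent Sunday.

```python
from datetime import datetime


def to_ordinal(s):
    """Parse a 'YYYY-MM-DD' string into a day ordinal."""
    return datetime.strptime(s, '%Y-%m-%d').toordinal()


def sunday_on_or_before(ordinal):
    """Return the ordinal of the closest Sunday on or before `ordinal`.

    weekday() numbers Monday=0 ... Sunday=6, so (weekday() + 1) % 7
    renumbers Sunday=0 ... Saturday=6, i.e. days since the last Sunday.
    """
    days_since_sunday = (datetime.fromordinal(ordinal).weekday() + 1) % 7
    return ordinal - days_since_sunday


def bucket_stream(date_iter, anchor_ordinal):
    """Bucket dates in one pass against a caller-supplied anchor.

    No sort and no min() pass, so it works on any iterable. Dates before
    the anchor get a label of '0' or lower; the caller decides how to treat
    them. Buckets are still held in memory here.
    """
    buckets = {}
    for d in date_iter:
        week = (to_ordinal(d) - anchor_ordinal) // 7 + 1
        buckets.setdefault(str(week), []).append(d)
    return buckets


# Calendar weeks (Sunday to Saturday) anchored on the Sunday before the data.
anchor = sunday_on_or_before(to_ordinal('2024-01-03'))   # a Wednesday
print(datetime.fromordinal(anchor).strftime('%Y-%m-%d'))  # 2023-12-31
print(bucket_stream(iter(['2024-01-03', '2024-01-06', '2024-01-07']), anchor))
# {'1': ['2024-01-03', '2024-01-06'], '2': ['2024-01-07']}
```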

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_intervals)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.