# The Consecutive Streak

> Login streaks. No gaps allowed.

Canonical URL: <https://datadriven.io/problems/the_consecutive_streak>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given a list of activity records (dicts with `'user_id'` and `'date'` in `'YYYY-MM-DD'` format) and an integer `min_streak` (default 3), find the users who have at least `min_streak` consecutive calendar days of activity. Count repeated dates for the same user only once. Return the qualifying user_ids as a sorted list.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **date-based grouping with deduplication and streak detection**, a common pattern in retention analytics. It probes whether a candidate can clean data, sort it per user, and detect consecutive day sequences.

---

### Break down the requirements

#### Step 1: Group activity dates by user

Build a dict mapping each user to their set of unique active dates. The set handles deduplication.
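As a minimal sketch of this grouping step, the snippet below uses `collections.defaultdict` so the per-user set is created on first access. The helper name `group_dates_by_user` and the sample records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import date


def group_dates_by_user(logs):
    """Map each user_id to the set of distinct dates on which they were active."""
    user_dates = defaultdict(set)
    for log in logs:
        # date.fromisoformat parses 'YYYY-MM-DD' strings (Python 3.7+).
        user_dates[log['user_id']].add(date.fromisoformat(log['date']))
    return dict(user_dates)


# Hypothetical sample data: user 'a' logs in twice on the same day.
logs = [
    {'user_id': 'a', 'date': '2024-01-01'},
    {'user_id': 'a', 'date': '2024-01-01'},  # duplicate, collapsed by the set
    {'user_id': 'b', 'date': '2024-01-02'},
]
grouped = group_dates_by_user(logs)
```

Because each value is a set, the duplicate record for user `'a'` leaves only one date behind.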

#### Step 2: Sort each user's dates

Consecutive detection requires chronological order.
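A set has no defined order, so the dates need an explicit sort before any neighbor comparison. A small illustration with made-up dates:

```python
from datetime import date

# Sets are unordered; sorted() returns a new chronologically ordered list.
dates = {date(2024, 1, 3), date(2024, 1, 1), date(2024, 1, 2)}
sorted_dates = sorted(dates)
```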

#### Step 3: Find the longest streak per user

Walk the sorted dates and count consecutive days. Reset the counter when a gap appears.
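The walk described above can be sketched as a standalone helper. The name `longest_streak` is an assumption for illustration, and the input is assumed to be already sorted and free of duplicates.

```python
from datetime import date, timedelta


def longest_streak(sorted_dates):
    """Length of the longest run of consecutive days in sorted, unique dates."""
    if not sorted_dates:
        return 0
    best = current = 1
    # Compare each date with its predecessor.
    for prev, curr in zip(sorted_dates, sorted_dates[1:]):
        if curr - prev == timedelta(days=1):
            current += 1  # one-day step: the streak continues
        else:
            current = 1   # gap: start counting again from this date
        best = max(best, current)
    return best


# Made-up dates: a 2-day run, a gap, then a 3-day run.
days = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4),
        date(2024, 1, 5), date(2024, 1, 6)]
print(longest_streak(days))  # 3
```

Resetting to 1 rather than 0 matters: the date after a gap is itself the first day of a new streak.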

#### Step 4: Filter users whose longest streak reaches `min_streak`

Return only users whose longest streak meets the threshold (3 by default), sorted by user_id.

---

### The solution

**Group, sort, and streak detection**

```python
from datetime import datetime, timedelta


def find_power_users(logs: list, min_streak: int = 3) -> list:
    # Group each user's activity into a set of dates; the set drops duplicates.
    user_dates = {}
    for log in logs:
        user = log['user_id']
        date = datetime.strptime(log['date'], '%Y-%m-%d').date()
        if user not in user_dates:
            user_dates[user] = set()
        user_dates[user].add(date)

    power_users = []
    for user, dates in user_dates.items():
        sorted_dates = sorted(dates)
        max_streak = 1
        current_streak = 1
        for i in range(1, len(sorted_dates)):
            # Extend the streak on a one-day step; otherwise start over.
            if sorted_dates[i] - sorted_dates[i - 1] == timedelta(days=1):
                current_streak += 1
            else:
                current_streak = 1
            if current_streak > max_streak:
                max_streak = current_streak
        if max_streak >= min_streak:
            power_users.append(user)
    # The problem asks for the user_ids in sorted order.
    return sorted(power_users)
```

> **Time and Space Complexity**
>
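> **Time:** O(n log n) in the worst case, where n is the number of records. The grouping pass is O(n); sorting each user's dates costs at most O(n log n) in total, and sorting the qualifying user_ids, as the problem requires, stays within the same bound.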
> 
> **Space:** O(n) for the user-to-dates mapping.

> **Interviewers Watch For**
>
> Whether you deduplicate dates before streak detection. Duplicate entries on the same day should not inflate streaks.

> **Common Pitfall**
>
> Using string comparison for dates instead of proper date objects. String comparison of `'YYYY-MM-DD'` happens to work for ordering, but computing day differences requires actual date arithmetic.
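To make the pitfall concrete, here is a small sketch with made-up dates. It contrasts a naive day-of-month subtraction on the strings (a hypothetical shortcut, not something from the solution above) with real date arithmetic across a month boundary.

```python
from datetime import date

prev, curr = '2024-01-31', '2024-02-01'

# Ordering works on the raw strings because the format is zero-padded
# and runs from most to least significant field.
assert prev < curr

# A naive difference of the day-of-month fields breaks at month boundaries.
naive_gap = int(curr[-2:]) - int(prev[-2:])  # 1 - 31 = -30

# Subtracting date objects yields a timedelta with the true gap.
real_gap = (date.fromisoformat(curr) - date.fromisoformat(prev)).days  # 1
```

The two dates are consecutive calendar days, which only the date-object subtraction reports correctly.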

---

## Common follow-up questions

- What if the threshold were configurable? _(Tests parameterization: pass the streak length as an argument.)_
- How would you also return the streak dates? _(Tests tracking the start index of the current streak.)_
- What if activity timestamps include hours and you need to bucket by date? _(Tests date truncation from datetime objects.)_
- How would this work on a billion-row event log? _(Tests partitioned processing or database-level window functions.)_
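For the streak-dates follow-up, one possible approach is to remember the index where the current run began. The sketch below is an illustration under assumptions: the helper name `longest_streak_span` is hypothetical, the input is assumed sorted and deduplicated, and ties are resolved in favor of the earliest run.

```python
from datetime import date, timedelta


def longest_streak_span(sorted_dates):
    """Return (start_date, end_date, length) of the longest run, or None if empty."""
    if not sorted_dates:
        return None
    best_start = best_end = 0
    start = 0  # index where the current run began
    for i in range(1, len(sorted_dates)):
        if sorted_dates[i] - sorted_dates[i - 1] != timedelta(days=1):
            start = i  # gap: a new run starts at this date
        # Strict comparison keeps the earliest run when lengths tie.
        if i - start > best_end - best_start:
            best_start, best_end = start, i
    length = best_end - best_start + 1
    return sorted_dates[best_start], sorted_dates[best_end], length


# Made-up dates: a 2-day run, a gap, then a 3-day run.
days = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4),
        date(2024, 1, 5), date(2024, 1, 6)]
span = longest_streak_span(days)
```

Here `span` holds January 4 through January 6 with a length of 3; slicing `sorted_dates[best_start:best_end + 1]` would give every date in the run instead of just the endpoints.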

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.