# The Repeat Visitor

> Loyal customers come back sooner than expected.

Canonical URL: <https://datadriven.io/problems/the_repeat_visitor>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given a list of purchase records (dicts with 'user' and 'date' in 'YYYY-MM-DD' format) and integer days, return users who made at least two purchases within a rolling window: there exist two of their purchase dates d1 <= d2 with (d2 - d1) days <= days. Return the list of user IDs sorted alphabetically.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **grouping, sorting within groups, and sliding window logic on dates**. It combines multiple skills: group-by user, sort by date, and check pairwise proximity within the sorted group.

---

### Break down the requirements

#### Step 1: Group purchases by user ID

Build a dictionary mapping each user to their list of purchase dates.

#### Step 2: Sort each user's dates

Sorted dates allow checking consecutive pairs for proximity.

#### Step 3: Check consecutive dates within the window

For each pair of consecutive dates, check if the difference is within N days. If any pair qualifies, the user is a repeat visitor.

---

### The solution

**Group by user, sort dates, check consecutive pairs**

```python
from datetime import datetime
def repeat_visitors(purchases, n_days):
    user_dates = {}
    for p in purchases:
        uid = p['user_id']
        dt = datetime.strptime(p['date'], '%Y-%m-%d')
        if uid not in user_dates:
            user_dates[uid] = []
        user_dates[uid].append(dt)
    result = []
    for uid in user_dates:
        dates = user_dates[uid]
        dates.sort()
        for i in range(1, len(dates)):
            diff = (dates[i] - dates[i - 1]).days
            if diff <= n_days:
                result.append(uid)
                break
    result.sort()
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n log n) overall, dominated by sorting each user's date list.
> 
> **Space:** O(n) for the grouped data.

> **Interviewers Watch For**
>
> Do you check consecutive sorted dates rather than all pairs? Sorting first means consecutive dates have the smallest gaps, so if any pair within the window exists, consecutive pairs will find it.

> **Common Pitfall**
>
> Comparing every pair of dates for a user, giving O(k^2) per user. Sorting and checking consecutive pairs is O(k log k) per user.

---

## Common follow-up questions

- What if the rolling window were in hours instead of days? _(Tests switching to datetime with time components and timedelta hours.)_
- What if you needed to count how many repeat visits each user made? _(Tests counting all qualifying pairs instead of just detecting one.)_
- How would you do this in SQL? _(Tests window functions with LAG or self-join with date range condition.)_
- What if purchase records were streaming in real time? _(Tests maintaining per-user date buffers with expiration.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_repeat_visitor)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.