# Repeat Purchases Within a Week

> They bought again within seven days.

Canonical URL: <https://datadriven.io/problems/repeat_purchases_within_a_week>

Domain: SQL · Difficulty: medium · Seniority: L5

## Problem

Given a table of customer transactions, identify users who made at least two purchases within a 7-day window. Return each qualifying user ID once.

## Worked solution and explanation

### Why this problem exists in real interviews

Drawn from a purchase behavior domain, this question centers on self-join over the `transactions` table. The tricky part is handling columns like `transaction_id`, `user_id`, and `transaction_date` correctly under the given constraints.

> **Trick to Solving**
>
> When the prompt asks for pairs or combinations within the same table, a self-join is the pattern.
> 
> 1. Alias the table twice (e.g., `a` and `b`)
> 2. Join on the shared attribute (region, user, etc.)
> 3. Add `a.id < b.id` to avoid duplicate and self-pairs

---

### Break down the requirements

#### Step 1: Self-join the table to pair rows

Join `transactions` to itself to compare rows within the same table. The join condition controls which pairs are valid and prevents duplicate mirrors.

#### Step 2: Apply the WHERE filter

Filter rows before any aggregation. This ensures only qualifying data enters the computation, keeping the result correct and the scan minimal.

#### Step 3: Deduplicate the result with DISTINCT

`SELECT DISTINCT` removes duplicate rows from the output. This is necessary when joins or subqueries can produce repeated combinations.

---

### The solution

**Self-join the table to pair rows to find repeat purchases within a ...**

```sql
SELECT DISTINCT t1.user_id
FROM transactions t1
JOIN transactions t2 ON t1.user_id = t2.user_id AND t1.transaction_id < t2.transaction_id
WHERE ABS(julianday(t2.transaction_date) - julianday(t1.transaction_date)) <= 7
```

> **Cost Analysis**
>
> With ~100M rows, the join cost depends on the smaller table's cardinality. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for how you prevent duplicate pairs and ensure the join condition is correct; how you handle date arithmetic and whether you account for edge cases like month boundaries.

> **Common Pitfall**
>
> Using string comparison instead of proper date arithmetic for date ranges can miss edge cases at midnight boundaries.

---

## Common follow-up questions

- What would happen to your result if `transactions.user_id` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `user_id` and uses DISTINCT or deduplication where needed.)_
- With 5,000,000 distinct values in `transactions.user_id`, how would a composite index on the GROUP BY columns change the execution plan? _(Probes understanding of how cardinality in `user_id` affects grouping and sort operations.)_
- If this query ran as a scheduled job, how would you add monitoring to detect when the result set is suspiciously empty? _(Tests operational awareness around scheduled query jobs.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/repeat_purchases_within_a_week)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.