# Repeated Transactions

> Detect same amount transactions within 10 minutes.

Canonical URL: <https://datadriven.io/problems/repeated_transactions>

Domain: SQL · Difficulty: medium · Seniority: L5

## Problem

A fraud model flags duplicate charges. In the transactions table, for each user and product combination ordered by time, compare each transaction to the one immediately before it. If the amounts match and the time gap is 10 minutes or less, flag it as a repeated transaction. Return the count of such repeated transactions.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem targets LAG for prior-row comparison across the `transactions` table. You need to work with columns like `user_id`, `product_id`, and `total_amount` to satisfy the requirements.

> **Trick to Solving**
>
> Whenever the prompt asks you to compare a row to its predecessor or successor, that is a `LAG`/`LEAD` signal.
> 
> 1. Identify the comparison direction (previous vs. next)
> 2. Partition by the grouping key (e.g., user, service)
> 3. Order by the time or sequence column
> 4. Compute the difference in the outer query

---

### Break down the requirements

#### Step 1: Isolate the intermediate result in a CTE

The `lagged` CTE computes the intermediate aggregation that the outer query builds on. This separation keeps each layer focused on a single task.

#### Step 2: Select the target columns

The SELECT clause picks exactly the columns the prompt asks for. Returning extra columns or missing a required alias would fail the grading check.

---

### The solution

**Lag-compare for repeated transactions**

```sql
WITH lagged AS (
    SELECT *, LAG(total_amount) OVER (PARTITION BY user_id, product_id ORDER BY transaction_date) AS prev_amount, LAG(transaction_date) OVER (PARTITION BY user_id, product_id ORDER BY transaction_date) AS prev_date
    FROM transactions
)
SELECT COUNT(*) AS repeated_count
FROM lagged
WHERE total_amount = prev_amount AND (julianday(transaction_date) - julianday(prev_date)) * 24 * 60 <= 10
```

> **Cost Analysis**
>
> With ~80M rows, the window function runs on the reduced set after filtering and grouping; CTEs materialize intermediate results, which can be beneficial or costly depending on the engine. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you decompose the problem into named, testable stages rather than nesting everything; whether you reach for window functions or attempt a self-join for row-to-row comparison; how you handle date arithmetic and whether you account for edge cases like month boundaries.

> **Common Pitfall**
>
> Using string comparison instead of proper date arithmetic for date ranges can miss edge cases at midnight boundaries.

---

## Common follow-up questions

- What would happen to your result if `transactions.transaction_id` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `transaction_id` and uses DISTINCT or deduplication where needed.)_
- `transactions.transaction_id` has roughly 80,000,000 distinct values. What index strategy would you use to avoid a full scan on `transactions`? _(Tests indexing knowledge specific to the high-cardinality `transaction_id` column in `transactions`.)_
- Your query uses LAG to compare adjacent rows. What happens if the partition has only one row, and how would you handle that? _(Tests edge-case handling when LAG returns NULL for partitions with insufficient rows.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/repeated_transactions)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.