# Trend Spotter

> What did they spend last time? Context changes everything.

Canonical URL: <https://datadriven.io/problems/trend_spotter>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Data science wants to model spending trends per user over time. For every row in transactions, pull that user's previous transaction amount (by transaction_date) onto the same row, leaving the value empty on a user's first transaction. Return the user_id, total_amount, transaction_date, and the previous amount.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests the `LAG` window function, which is core to time-series analysis. Interviewers check whether you can access a previous row's value partitioned by a grouping key and ordered by time.

> **Trick to Solving**
>
> "Previous transaction amount" is the direct signal for `LAG`. The partition is by user, and the ordering is by date.
> 
> 1. Identify the partition key (user_id)
> 2. Identify the ordering column (transaction_date)
> 3. Apply `LAG(total_amount) OVER (PARTITION BY user_id ORDER BY transaction_date)`

---

### Break down the requirements

#### Step 1: Apply LAG for the previous amount

`LAG(total_amount) OVER (PARTITION BY user_id ORDER BY transaction_date)` retrieves the prior transaction amount for each user.

#### Step 2: Handle first transactions

For each user's earliest transaction, LAG returns NULL, which satisfies the "leaving the value empty" requirement.

---

### The solution

**LAG to access previous row per user**

```sql
SELECT
    user_id,
    total_amount,
    transaction_date,
    LAG(total_amount) OVER (
        PARTITION BY user_id
        ORDER BY transaction_date
    ) AS prev_amount
FROM transactions
```

> **Cost Analysis**
>
> The window function sorts 80M rows within 4M partitions (~20 rows per user on average). The sort per partition is small. The dominant cost is the full table scan plus the overall partitioning overhead.

> **Interviewers Watch For**
>
> Candidates who try to self-join transactions to get the previous row. While correct, it is significantly less efficient and harder to read than `LAG`.

> **Common Pitfall**
>
> Using `LEAD` instead of `LAG`. `LEAD` looks forward (next row), `LAG` looks backward (previous row). The prompt asks for the previous transaction.

---

## Common follow-up questions

- How would you compute the change in amount from one transaction to the next? _(Subtract the LAG value: `total_amount - LAG(total_amount) OVER (...)`.)_
- What if two transactions have the same date for a user? _(The ordering is non-deterministic; add transaction_id as a tiebreaker.)_
- How would you look back 2 transactions instead of 1? _(Use `LAG(total_amount, 2)` for a 2-row offset.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/trend_spotter)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.