# The Weight of Every Purchase

> Each order carries everything that came before it. Show the load building, customer by customer.

Canonical URL: <https://datadriven.io/problems/the-weight-of-every-purchase>

Domain: SQL · Difficulty: medium · Seniority: mid

## Problem

The finance analytics group at our marketplace wants to understand how each shopper's lifetime spend accumulates over time, so they can spot the exact purchase where a customer crosses a high-value threshold and flag them for loyalty outreach. For every purchase a customer has made, they want to see the customer, the date that purchase happened, and how much that customer had spent in total up to and including that purchase, with the timeline read in the order the purchases actually occurred. Walk through one customer's history and the picture is a steadily climbing balance; do this for everyone at once.

## Worked solution and explanation

### Why this problem exists in real interviews

This is the canonical test of whether a candidate reaches for a window function instead of a self-join or a correlated subquery. The interviewer wants to see that you understand a running total is a per-partition accumulation along an ordering: here, the partition is user_id, the ordering is transaction_date, and the measure is total_amount. The give-away in the prose is 'up to and including that purchase' over each customer's own timeline, which is exactly the semantics of SUM() with a frame that runs from the start of the partition to the current row. Getting this right with one pass over transactions, rather than an O(n^2) join, is the signal they care about.

---

### Break down the requirements

#### Step 1: Accumulate within each customer

The total resets per customer, so the calculation is scoped to one shopper at a time. That maps to PARTITION BY user_id inside the window: each user's purchases are summed independently of everyone else's.

#### Step 2: Walk the timeline in order

'Up to and including that purchase' means the cumulative sum must follow chronological order, so ORDER BY transaction_date inside the window. Add transaction_id as a tiebreaker so two purchases on the same date produce a stable, deterministic running total.

#### Step 3: Frame from the beginning to here

SUM(total_amount) with the frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW gives the balance through the current purchase. Name the output running_total, and order the final result so each customer's climbing balance reads top to bottom.

---

### The solution

**Per-customer running total**

```sql
SELECT user_id, transaction_date, SUM(total_amount) OVER (PARTITION BY user_id ORDER BY transaction_date, transaction_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total FROM transactions ORDER BY user_id, transaction_date, transaction_id;
```

> **Cost Analysis**
>
> On a transactions table of ~80M rows / 22 GB, the window function needs a single sort by (user_id, transaction_date, transaction_id) followed by a streaming cumulative pass, so it is O(n log n) dominated by the sort. If the table is partitioned or clustered on user_id, the engine can compute each partition's running sum without a global sort, keeping memory bounded per partition. Compare this to the naive correlated-subquery approach, which is O(n^2) and melts at this scale.

> **Interviewers Watch For**
>
> A strong candidate names the explicit frame instead of relying on the default. The default frame with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which lumps together all rows sharing the same transaction_date into one identical total. Specifying ROWS and a unique tiebreaker shows you understand the RANGE-versus-ROWS distinction and care about determinism.

> **Common Pitfall**
>
> Forgetting the tiebreaker. With ORDER BY transaction_date alone and a RANGE frame, every purchase a customer made on the same day reports the same running_total (the end-of-day balance), not the incremental one. Adding transaction_id to the window ORDER BY and using a ROWS frame fixes the ambiguity.

---

## Common follow-up questions

- How would you change the query to show, for each customer, only the single purchase at which their cumulative spend first crossed 1000? _(Tests wrapping the window result in a CTE and filtering on the running total plus its prior value, a common follow-on to running-total questions.)_
- Suppose you only wanted a 30-day trailing total instead of an all-time running total. What changes? _(Tests understanding of RANGE frames with INTERVAL bounds versus the unbounded-preceding ROWS frame used here.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the-weight-of-every-purchase)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.