# Cumulative Sales Per Customer

> Each purchase adds to the running total. Watch it climb.

Canonical URL: <https://datadriven.io/problems/cumulative_sales_per_customer>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The finance team wants to track each customer's spending trajectory over time. Show every transaction alongside the customer's cumulative total spend up to and including that row, ordered by transaction date.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests running totals with `SUM() OVER (PARTITION BY ... ORDER BY ...)`. It probes whether you can compute a cumulative aggregate alongside row-level detail.

---

### Break down the requirements

#### Step 1: Select transaction details

Return `transaction_id`, `user_id`, `total_amount`, and `transaction_date`.

#### Step 2: Add cumulative total

`SUM(total_amount) OVER (PARTITION BY user_id ORDER BY transaction_date)` computes the running spend per customer.

#### Step 3: Sort by transaction date

`ORDER BY transaction_date` as specified.

---

### The solution

**Partitioned running total**

```sql
SELECT
    transaction_id,
    user_id,
    total_amount,
    transaction_date,
    SUM(total_amount) OVER (PARTITION BY user_id ORDER BY transaction_date) AS cumulative_spend
FROM transactions
ORDER BY transaction_date
```

> **Cost Analysis**
>
> Window function over 250M rows partitioned by user_id. Each partition is sorted by transaction_date. With millions of users, the sort is distributed across partitions. The final ORDER BY transaction_date requires a global sort on the full output.

> **Interviewers Watch For**
>
> Whether the candidate recognizes that the default window frame with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which groups ties. For transaction data with unique dates per user, this is typically fine.

> **Common Pitfall**
>
> If a customer has multiple transactions on the same date, the RANGE frame groups them together, making the cumulative total jump. Use ROWS instead if you want transaction-by-transaction progression.

---

## Common follow-up questions

- How would you handle multiple transactions on the same date for one user? _(Tests ROWS vs RANGE frame semantics.)_
- What if you needed the cumulative total as a percentage of their lifetime total? _(Divide by SUM(total_amount) OVER (PARTITION BY user_id) to get percentage.)_
- How would this perform on 250M rows? _(The partitioned sort is the bottleneck. Discusses work_mem tuning and parallel query execution.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/cumulative_sales_per_customer)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.