# Monthly Running Total

> Cumulative sales per product across months.

Canonical URL: <https://datadriven.io/problems/monthly_running_total>

Domain: SQL · Difficulty: medium · Seniority: L5

## Problem

For each product, show the monthly sales total alongside a running cumulative total that accumulates across months, by product then month.

## Worked solution and explanation

### Why this problem exists in real interviews

Querying transactions for total_amount data using grouping and running total tests whether you can translate a business requirement into the right column references and filter sequence. It shows up in mid-level screens to verify practical fluency.

> **Trick to Solving**
>
> Running totals use a window function with an unbounded preceding frame.
> 
> 1. Use `SUM(...) OVER (ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`
> 2. Choose `ROWS` over `RANGE` to avoid grouping ties
> 3. Ensure the ORDER BY produces a deterministic sequence

---

### Break down the requirements

#### Step 1: Apply the window function

Use the window function with the correct `PARTITION BY` and `ORDER BY` to compute the required metric per group.

#### Step 2: Aggregate with SUM

Group by the output grain and apply `SUM()` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.

#### Step 3: Order the final output

Apply `ORDER BY` as specified to produce the expected row sequence. When tied values exist, add a secondary sort column for determinism.

---

### The solution

**Nested SUM with cumulative window frame**

```sql
SELECT product_id,
    STRFTIME('%Y-%m', transaction_date) AS month,
    SUM(total_amount) AS monthly_total,
    SUM(SUM(total_amount)) OVER (PARTITION BY product_id ORDER BY STRFTIME('%Y-%m', transaction_date)
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_total
FROM transactions
GROUP BY product_id, STRFTIME('%Y-%m', transaction_date)
ORDER BY product_id, month
```

> **Cost Analysis**
>
> The query scans 100M rows from `transactions`. The window function requires a sort, which is O(n log n). Pre-aggregating reduces the sort input. The aggregation reduces the row count before any downstream processing, which is the key performance lever. CTEs in most engines are optimization fences. For production workloads, consider inlining or materializing the intermediate results.

> **Interviewers Watch For**
>
> Strong candidates explain their choice of window function (`ROW_NUMBER` vs `RANK` vs `DENSE_RANK`) and why it matches the tie semantics. Interviewers expect you to articulate why you chose a specific join type and what happens to unmatched rows.

> **Common Pitfall**
>
> Forgetting that a JOIN can multiply rows when the relationship is one-to-many. Always check whether the join key is unique on at least one side.

---

## Common follow-up questions

- If transactions.transaction_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in transaction_id.)_
- How would you verify that your aggregation on transactions.transaction_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in transactions.transaction_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like transaction_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/monthly_running_total)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.