# Running Tab

> Every purchase adds to the total. Watch the tab grow.

Canonical URL: <https://datadriven.io/problems/running_tab>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Finance wants to see how each customer's spending accumulates over time. For every row in transactions, compute a per-user running sum of total_amount ordered by transaction_date. Return the user_id, transaction_date, total_amount, and the cumulative spending through that row.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem targets filtering and projection across the `transactions` table. You need to work with columns like `user_id`, `total_amount`, and `transaction_date` to satisfy the requirements.

---

### Break down the requirements

#### Step 1: Compute the running aggregate

The window function computes an aggregate across an ordered set of rows without collapsing them. Each row retains its detail while gaining the cumulative metric.

#### Step 2: Sort the final output

The `ORDER BY` clause ensures the result appears in the expected sequence. Interviewers check that the sort direction matches the prompt.

---

### The solution

**Compute the running aggregate to find running tab**

```sql
SELECT user_id, transaction_date, total_amount, SUM(total_amount) OVER (PARTITION BY user_id ORDER BY transaction_date)
FROM transactions
```

> **Cost Analysis**
>
> With ~80M rows, the window function runs on the reduced set after filtering and grouping. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether the query returns exactly the columns and ordering the prompt specifies; how quickly you identify the core operation and write clean, minimal code.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- What would happen to your result if `transactions.user_id` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `user_id` and uses DISTINCT or deduplication where needed.)_
- If `transactions` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `user_id`? _(Tests ability to identify performance hotspots related to `transactions.user_id` at scale.)_
- What is the default window frame for your window function, and would explicitly setting ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW change anything? _(Tests knowledge of implicit vs explicit window frame specifications.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/running_tab)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.