# Revenue for Specific Users

> Alice and bob. Total spend.

Canonical URL: <https://datadriven.io/problems/revenue_for_specific_users>

Domain: SQL · Difficulty: easy · Seniority: L4

## Problem

We're reconciling revenue for alice and aaron42. Show each transaction row alongside a running total of revenue across both users. Return the transaction ID, user, transaction amount, and running total.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use this revenue analysis scenario to test self-join against the `transactions` and `users` tables. The focus is on how you handle columns like `transaction_id`, `user_id`, and `total_amount` when building the result.

---

### Break down the requirements

#### Step 1: Join `transactions` to `users`

The join connects the two tables on their shared key. This brings the columns needed for filtering and aggregation into a single row set.

#### Step 2: Compute the running aggregate

The window function computes an aggregate across an ordered set of rows without collapsing them. Each row retains its detail while gaining the cumulative metric.

---

### The solution

**Join `transactions` to `users` to find revenue for specific users**

```sql
SELECT t.transaction_id, t.user_id, t.total_amount,
       SUM(t.total_amount) OVER (ORDER BY t.transaction_id) AS running_total
FROM transactions t
INNER JOIN users u ON t.user_id = u.user_id
WHERE u.username IN ('user002', 'user003')
```

> **Cost Analysis**
>
> With ~72M rows, the window function runs on the reduced set after filtering and grouping; the join cost depends on the smaller table's cardinality. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether the query returns exactly the columns and ordering the prompt specifies; how quickly you identify the core operation and write clean, minimal code.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- What would happen to your result if `users.user_id` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `user_id` and uses DISTINCT or deduplication where needed.)_
- If `transactions` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `user_id`? _(Tests ability to identify performance hotspots related to `transactions.user_id` at scale.)_
- What is the default window frame for your window function, and would explicitly setting ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW change anything? _(Tests knowledge of implicit vs explicit window frame specifications.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/revenue_for_specific_users)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.