# Top Spender

> When your spending exceeds the priciest item on the shelf.

Canonical URL: <https://datadriven.io/problems/top_spender>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The VIP team wants customers whose lifetime spending outstrips the price of the single most expensive product. Join users to transactions, sum spending per user, and keep only those whose total exceeds the maximum product price. Return the username and their total spend.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes multi-table joins, aggregation with `HAVING`, and scalar subqueries. Interviewers want to see that you can combine a join, a group aggregate, and a comparison to a derived value cleanly.

---

### Break down the requirements

#### Step 1: Join users to transactions

`JOIN transactions ON users.user_id = transactions.user_id` connects each user to their purchase history.

#### Step 2: Aggregate total spend per user

`GROUP BY users.user_id, username` with `SUM(CAST(total_amount AS DOUBLE))` computes lifetime spending. Note that `total_amount` is TEXT type, so casting is required.

#### Step 3: Compare against max product price

`HAVING SUM(...) > (SELECT MAX(CAST(price AS DOUBLE)) FROM products)` uses a scalar subquery to find the ceiling price, then filters users whose total exceeds it.

---

### The solution

**Join, aggregate, and filter against scalar subquery**

```sql
SELECT u.username, SUM(CAST(t.total_amount AS DOUBLE)) AS total_spend
FROM users u
JOIN transactions t ON u.user_id = t.user_id
GROUP BY u.user_id, u.username
HAVING SUM(CAST(t.total_amount AS DOUBLE)) > (
    SELECT MAX(CAST(price AS DOUBLE)) FROM products
)
```

> **Cost Analysis**
>
> The scalar subquery `MAX(price)` executes once and returns a single value. The main query scans 100M transactions joined to 12M users. An index on `transactions(user_id)` is critical for the join performance.

> **Interviewers Watch For**
>
> Candidates who place the max-price check in a `WHERE` clause instead of `HAVING`. Since the comparison is against an aggregate (`SUM`), it must be in `HAVING`.

> **Common Pitfall**
>
> Forgetting to cast TEXT columns to numeric types. The `total_amount` and `price` columns are TEXT, so string comparison instead of numeric comparison would produce incorrect results.

---

## Common follow-up questions

- What if you needed to show which product's price they exceeded? _(Tests whether to join or use a scalar subquery to surface the max-price product.)_
- How would this query change if a user could have refunds (negative amounts)? _(Tests awareness that SUM handles negatives, and whether net spend is the correct metric.)_
- What if the products table had millions of rows? _(The scalar subquery is still O(n) but executes once; discuss indexing on price.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_spender)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.