# Spending Range

> Between the smallest purchase and the biggest lies the story.

Canonical URL: <https://datadriven.io/problems/spending_range>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

Data science is modeling per-user spending variability and needs the minimum and maximum transaction amount for each user who has more than one transaction. Show the username, min amount, max amount, and the range, sorted from widest range to narrowest.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge asks you to apply grouped COUNT aggregation to the `users` and `transactions` tables, simulating a real user behavior workflow. Pay attention to columns like `user_id`, `username`, and `total_amount` as they drive the aggregation and output.

> **Trick to Solving**
>
> Complex queries with multiple transformation stages are best solved by writing one CTE per stage.
> 
> 1. Identify the intermediate results needed
> 2. Write each as a named CTE
> 3. Chain them: each CTE reads from the previous one
> 4. The final SELECT assembles the output

---

### Break down the requirements

#### Step 1: Isolate the intermediate result in a CTE

The `user_range` CTE computes the intermediate aggregation that the outer query builds on. This separation keeps each layer focused on a single task.

#### Step 2: Join `users` to `transactions`

The join connects the two tables on their shared key. This brings the columns needed for filtering and aggregation into a single row set.

#### Step 3: Apply the range filter

The WHERE clause restricts rows to the target range. Applying this filter early reduces the volume flowing into downstream operations.

---

### The solution

**Isolate the intermediate result in a cte to find spending range**

```sql
WITH user_range AS (
    SELECT u.username, MIN(t.total_amount) AS low, MAX(t.total_amount) AS high, COUNT(*) AS cnt
    FROM users u
    JOIN transactions t ON u.user_id = t.user_id
    GROUP BY u.username
)
SELECT username, low, high, high - low
FROM user_range
WHERE cnt > 1
ORDER BY high - low DESC
```

> **Cost Analysis**
>
> With ~132M rows, the GROUP BY reduces the working set before any downstream operations; the join cost depends on the smaller table's cardinality; CTEs materialize intermediate results, which can be beneficial or costly depending on the engine. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you decompose the problem into named, testable stages rather than nesting everything; whether you use a subquery or self-join, and can explain the tradeoffs.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- What would happen to your result if `transactions.user_id` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `user_id` and uses DISTINCT or deduplication where needed.)_
- If `users` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `username`? _(Tests ability to identify performance hotspots related to `users.username` at scale.)_
- Would materializing the CTE as a temp table improve performance when joining `users` and `transactions`? _(Tests understanding of CTE materialization behavior across different database engines.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/spending_range)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.