# Spend and Rank

> Five thrones at the top of the spending leaderboard.

Canonical URL: <https://datadriven.io/problems/spend_and_rank>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

The executive team wants a leaderboard of the 5 biggest spenders for the quarterly review. Sum spending per user, assign positions starting at 1 for the biggest spender with ties sharing a position, and keep only users in the top 5 positions. Return the username, total spend, and position.

## Worked solution and explanation

### Why this problem exists in real interviews

This user behavior problem uses the `users` and `transactions` tables to evaluate grouped SUM aggregation. Watch how columns like `user_id`, `username`, and `total_amount` interact in the grouping and filtering logic.

> **Trick to Solving**
>
> When the prompt mentions 'top N' or 'rank', check whether ties should be included. If yes, use `DENSE_RANK()` instead of `ROW_NUMBER()` or `LIMIT`.
> 
> 1. Spot tie-inclusion language in the prompt
> 2. Aggregate to the correct grain first
> 3. Apply `DENSE_RANK()` over the aggregated result

---

### Break down the requirements

#### Step 1: Isolate the intermediate result in a CTE

The `ranked` CTE computes the intermediate aggregation that the outer query builds on. This separation keeps each layer focused on a single task.

#### Step 2: Join `users` to `transactions`

The join connects the two tables on their shared key. This brings the columns needed for filtering and aggregation into a single row set.

#### Step 3: Apply the range filter

The WHERE clause restricts rows to the target range. Applying this filter early reduces the volume flowing into downstream operations.

---

### The solution

**Isolate the intermediate result in a cte to find spend and rank**

```sql
WITH ranked AS (
    SELECT u.username, SUM(t.total_amount) AS total, RANK() OVER (ORDER BY SUM(t.total_amount) DESC) AS rnk
    FROM users u
    JOIN transactions t ON u.user_id = t.user_id
    GROUP BY u.username
)
SELECT username, total, rnk
FROM ranked
WHERE rnk <= 5
```

> **Cost Analysis**
>
> With ~165M rows, the GROUP BY reduces the working set before any downstream operations; the window function runs on the reduced set after filtering and grouping; the join cost depends on the smaller table's cardinality; CTEs materialize intermediate results, which can be beneficial or costly depending on the engine. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for which ranking function you choose and whether you can explain the difference between `ROW_NUMBER`, `RANK`, and `DENSE_RANK`; whether you decompose the problem into named, testable stages rather than nesting everything; whether you use a subquery or self-join, and can explain the tradeoffs.

> **Common Pitfall**
>
> Using `ROW_NUMBER()` when the prompt requires tie inclusion silently drops tied rows. Read the prompt for language about ties or 'include all'.

---

## Common follow-up questions

- What would happen to your result if `transactions.total_amount` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `total_amount` and uses DISTINCT or deduplication where needed.)_
- If `transactions` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `transaction_id`? _(Tests ability to identify performance hotspots related to `transactions.transaction_id` at scale.)_
- What is the default window frame for your window function, and would explicitly setting ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW change anything? _(Tests knowledge of implicit vs explicit window frame specifications.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/spend_and_rank)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.