# Where the Money Pools

> Every region has one line item that dwarfs the rest. Find it.

Canonical URL: <https://datadriven.io/problems/where-the-money-pools>

Domain: SQL · Difficulty: medium · Seniority: mid

## Problem

Finance is reviewing cloud spend region by region and wants to know where the money concentrates. For each region, find the single service responsible for the most spend along with that total, biggest spender first.

## Worked solution and explanation

### Why this problem exists in real interviews

This is a top-1-per-group problem wearing a cost-governance costume. The skill being probed: can you collapse cloud_costs to one row per (region, svc_name) and then pick the single heaviest service WITHIN each region, without bleeding rows across regions? Anyone can write the SUM. The separator is recognizing you need a per-region rank, not a global one. The trap is doing the aggregation and the picking in one shot with a HAVING or a global ORDER BY LIMIT, which silently returns one row total instead of one row per region.

---

### Break down the requirements

#### Step 1: Roll spend up to service grain

GROUP BY region, svc_name with SUM(amount). Raw cloud_costs has many rows per service (one per bill line), so you must collapse to one total per service before you can compare them. Do this in a CTE so the ranking step has a clean grain to sit on.

#### Step 2: Rank within each region, not globally

ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_spend DESC). The PARTITION BY restarts the numbering for every region, so rn = 1 marks each region's top service. Drop the partition and you get one global winner, which answers a different question.

#### Step 3: Keep the winners and sort

Filter WHERE rn = 1 to keep one row per region, then ORDER BY total_spend DESC so the biggest-spending region surfaces first, as the prompt asks.

---

### The solution

**Top service per region by spend**

```sql
WITH svc_totals AS (
  SELECT region, svc_name, SUM(amount) AS total_spend
  FROM cloud_costs
  GROUP BY region, svc_name
),
ranked AS (
  SELECT region, svc_name, total_spend,
         ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_spend DESC) AS rn
  FROM svc_totals
)
SELECT region, svc_name, total_spend
FROM ranked
WHERE rn = 1
ORDER BY total_spend DESC;
```

> **Cost Analysis**
>
> On a 200M-row billing table (~20 GB) partitioned by bill_date, the GROUP BY is a single hash-aggregate that collapses to a few thousand (region, service) buckets. The window pass then operates on that tiny intermediate, not the base table, so the sort inside ROW_NUMBER is cheap. No self-join, no second scan of cloud_costs. Add a bill_date filter and partition pruning shrinks the only expensive step further.

> **Interviewers Watch For**
>
> Whether you reach for a window function partitioned by region instead of a correlated subquery or a self-join against per-region maxes. Stating out loud how you would handle a tie (two services with identical totals) signals seniority: ROW_NUMBER picks one arbitrarily, RANK keeps both.

> **Common Pitfall**
>
> Writing SELECT region, svc_name, SUM(amount) ... GROUP BY region, svc_name ORDER BY total_spend DESC LIMIT 1. That returns the single global top service, not one per region. The per-region requirement demands a partitioned rank; a global LIMIT cannot express it.

**ROW_NUMBER**

Returns exactly one service per region even when two tie on total_spend. Use when the consumer wants a clean one-row-per-region table.

**RANK**

Returns both tied services (two rows with rn = 1) for that region. Use when a genuine tie should surface both winners rather than hide one.

---

## Common follow-up questions

- How would you return the top three services per region instead of just the top one? _(Tests whether they generalize the window filter to rn <= 3 rather than rewriting the query.)_
- If two services tie for the highest spend in a region, which one does your query return and how would you change that? _(Probes understanding of ROW_NUMBER vs RANK vs DENSE_RANK tie semantics.)_
- How would you restrict this to the current fiscal quarter without scanning the whole table? _(Pushes toward a bill_date predicate on the base scan and partition pruning before the aggregation.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/where-the-money-pools)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.