# Allocations in Top Spending Region

> The biggest spenders live in one region.

Canonical URL: <https://datadriven.io/problems/allocations_in_top_spending_region>

Domain: SQL · Difficulty: hard · Seniority: L3

## Problem

The FinOps team is investigating cost allocation for the region with the highest total cloud spend. Identify which region tops the cloud cost table by total amount, then pull all cost allocation records for teams operating in that region. Return all available fields.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use the `cloud_costs`, `cost_allocs` tables here to probe grouped aggregation combined with left join to retain unmatched rows. The columns `provider`, `svc_name`, `region` force candidates to reason about the correct grain before writing any aggregation.

---

### Break down the requirements

#### Step 1: Join `cloud_costs` with `cost_allocs`

Use `LEFT JOIN` on `acct_id` to link fact rows to the dimension table. LEFT JOIN preserves rows with no matches.

#### Step 2: Group by `team_name`

`GROUP BY b.team_name` produces one row per distinct value of the grouping column.

#### Step 3: Aggregate with SUM

`SUM(amount)` computes the metric at the grouped grain.

#### Step 4: Order by the metric

Sort descending to surface the top values first.

---

### The solution

**Left-join for allocations top spending region**

```sql
SELECT
    b.team_name,
    SUM(amount) AS sum_amount,
    MAX(a.amount) AS max_val
FROM cloud_costs a
LEFT JOIN cost_allocs b ON a.acct_id = b.acct_id
GROUP BY b.team_name
ORDER BY sum_amount DESC
```

> **Cost Analysis**
>
> The main table has 15M rows (4 GB). Partitioned on `bill_date`, so queries filtering on that column skip most partitions. The GROUP BY reduces the row count early, keeping downstream operations cheap.

> **Interviewers Watch For**
>
> Strong candidates state the correct `GROUP BY` grain before writing any SQL, showing they think about the output shape first. Join type selection (INNER vs LEFT) reveals whether the candidate read the requirements carefully.

> **Common Pitfall**
>
> Selecting a non-aggregated column without including it in `GROUP BY` is the most common error. Some engines reject it; others silently return arbitrary values.

---

## Common follow-up questions

- What happens to your results if `provider` in `cloud_costs` contains trailing whitespace or mixed casing? _(Tests awareness of text normalization issues that silently fragment GROUP BY results.)_
- Your LEFT JOIN keeps all rows from `cloud_costs` even when `cost_allocs` has no match. How do NULLs from the right side affect your aggregation? _(Tests understanding of how outer join NULLs propagate into SUM, COUNT, and AVG.)_
- `cost_id` in `cloud_costs` has ~15M distinct values. What index strategy keeps your query from doing a full table scan? _(Tests whether the candidate can design indexes for high-cardinality columns and understands selectivity.)_
- If the business definition of `svc_name` changed mid-quarter (e.g., a status value was renamed), how would you handle historical consistency? _(Tests awareness of slowly changing dimensions and backward-compatible query design.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/allocations_in_top_spending_region)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.