# Combined Cloud Spend by Region and Service

> Region by region. Service by service. Where does the money go?

Canonical URL: <https://datadriven.io/problems/combined_cloud_spend_by_region_and_service>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The FinOps team wants a unified view of cloud spending across both the actual cost records and the internal allocation records. Combine both sources, remove duplicate line items, exclude rows missing an amount or region, and show the total spend for each region-service combination, sorted from highest to lowest.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests UNION ALL with deduplication, NULL filtering, and GROUP BY. The combination of merging heterogeneous tables, cleaning data, and aggregating is a realistic data engineering workflow.

---

### Break down the requirements

#### Step 1: Union the two tables

Select `region`, `svc_name`, `amount` from both `cloud_costs` and `cost_allocs`, using UNION to remove duplicates.

#### Step 2: Filter out NULLs

Exclude rows where `amount IS NULL OR region IS NULL`.

#### Step 3: Aggregate and sort

`GROUP BY region, svc_name` with `SUM(amount)`, ordered by total descending.

---

### The solution

**Deduplicated union with null filtering**

```sql
WITH combined AS (
    SELECT region, svc_name, amount FROM cloud_costs
    WHERE amount IS NOT NULL AND region IS NOT NULL
    UNION
    SELECT region, svc_name, amount FROM cost_allocs
    WHERE amount IS NOT NULL AND region IS NOT NULL
)
SELECT region, svc_name, SUM(amount) AS total_spend
FROM combined
GROUP BY region, svc_name
ORDER BY total_spend DESC
```

> **Cost Analysis**
>
> UNION (not UNION ALL) deduplicates 12M + 18M = 30M rows, which requires sorting or hashing. The GROUP BY then reduces to (regions x services). The deduplication step is the most expensive operation.

> **Interviewers Watch For**
>
> Whether you use UNION vs UNION ALL. The prompt says "remove duplicate line items," so UNION is correct here. This is the opposite of the typical advice to always use UNION ALL.

> **Common Pitfall**
>
> Filtering NULLs after the UNION instead of before means duplicate rows with NULLs might survive deduplication. Filter in each branch of the UNION for cleaner results.

---

## Common follow-up questions

- What defines a 'duplicate line item' across the two tables? _(Tests whether deduplication should be on all columns or a subset. UNION deduplicates on all selected columns.)_
- How would performance change with UNION ALL and a separate DISTINCT step? _(Tests understanding of query plan differences: UNION sorts during merge; separate DISTINCT sorts after.)_
- What if the two tables had different amount scales (e.g., dollars vs cents)? _(Tests data normalization awareness before combining sources.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/combined_cloud_spend_by_region_and_service)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.