# Cloud Cost Stats by Provider

> Three providers. Three very different bills.

Canonical URL: <https://datadriven.io/problems/cloud_cost_stats_by_provider>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The FinOps team needs a unified view of costs across both actual cloud spending and internal allocations. Treating both sources as a single dataset, show each provider with its minimum, maximum, and average cost amount.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests whether you can use `UNION ALL` to combine two tables with different schemas into one dataset and then apply basic aggregation. The challenge is recognizing which columns to align and which to ignore.

---

### Break down the requirements

#### Step 1: Combine the two tables

Apply `UNION ALL` to `cloud_costs` and `cost_allocs`, selecting `provider` (or its equivalent) and `amount` from each. The `cost_allocs` table lacks a `provider` column, so you need to choose a column to map onto it and state that choice as an assumption.

#### Step 2: Aggregate per provider

`GROUP BY provider` with `MIN(amount)`, `MAX(amount)`, and `AVG(amount)`.
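
To try the two steps on a small dataset, a fixture along these lines can help. It is only a sketch: the problem names `provider`, `category`, and `amount`, but the column types, the temporary tables, and every sample row here are assumptions made for illustration.

```sql
-- Hypothetical minimal schema: only provider, category, and amount are named
-- in the problem. Types and sample rows are assumptions.
CREATE TEMP TABLE IF NOT EXISTS cloud_costs (
    provider TEXT NOT NULL,
    amount   NUMERIC(12, 2) NOT NULL
);
CREATE TEMP TABLE IF NOT EXISTS cost_allocs (
    category TEXT NOT NULL,
    amount   NUMERIC(12, 2) NOT NULL
);

INSERT INTO cloud_costs (provider, amount) VALUES
    ('aws', 100.00), ('aws', 300.00), ('gcp', 50.00), ('azure', 200.00);
INSERT INTO cost_allocs (category, amount) VALUES
    ('aws', 200.00), ('gcp', 150.00), ('shared', 75.00);

-- Step 1 on its own: align the columns and stack the rows.
SELECT provider, amount FROM cloud_costs
UNION ALL
SELECT category AS provider, amount FROM cost_allocs;
-- 7 rows: 4 from cloud_costs plus 3 from cost_allocs
```

With these rows, grouping by provider should give aws 100 / 300 / 200, azure 200 / 200 / 200, gcp 50 / 150 / 100, and shared 75 / 75 / 75 for min / max / avg. The `shared` row is there to show that a `category` value need not be a provider name, which is why the mapping has to be stated as an assumption.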

---

### The solution

**Union with provider-level stats**

```sql
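-- cost_allocs has no provider column; category is used as its stand-in (an assumption).
WITH combined AS (
    SELECT provider, amount FROM cloud_costs
    UNION ALL  -- keep duplicates so AVG sees every row
    SELECT category AS provider, amount FROM cost_allocs
)
SELECT
    provider,
    MIN(amount) AS min_amount,
    MAX(amount) AS max_amount,
    AVG(amount) AS avg_amount
FROM combined
GROUP BY provider;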
```
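
One way to check how the planner treats the `combined` CTE is to compare `EXPLAIN` output with and without forced materialization. The sketch below uses empty stand-in temp tables with assumed column types, so the cost numbers are meaningless; only the plan shape is of interest. A `CTE Scan` node indicates the union was materialized, while an `Append` node feeding the aggregate directly indicates it was inlined.

```sql
-- Hypothetical stand-in tables; only provider, category, and amount come from the problem.
CREATE TEMP TABLE IF NOT EXISTS cloud_costs (provider TEXT, amount NUMERIC(12, 2));
CREATE TEMP TABLE IF NOT EXISTS cost_allocs (category TEXT, amount NUMERIC(12, 2));

-- Plan for the solution as written: the planner decides how to treat the CTE.
EXPLAIN
WITH combined AS (
    SELECT provider, amount FROM cloud_costs
    UNION ALL
    SELECT category AS provider, amount FROM cost_allocs
)
SELECT provider, MIN(amount), MAX(amount), AVG(amount)
FROM combined
GROUP BY provider;

-- Same query with materialization forced (keyword available from Postgres 12).
EXPLAIN
WITH combined AS MATERIALIZED (
    SELECT provider, amount FROM cloud_costs
    UNION ALL
    SELECT category AS provider, amount FROM cost_allocs
)
SELECT provider, MIN(amount), MAX(amount), AVG(amount)
FROM combined
GROUP BY provider;
```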

> **Cost Analysis**
>
> Combined scan of 10M + 15M = 25M rows. On Postgres 12 and later, a CTE that is referenced once is inlined rather than materialized, so the union feeds straight into the aggregate. GROUP BY then reduces the rows to one per distinct provider in a single pass.

> **Interviewers Watch For**
>
> How you map `cost_allocs` to a "provider" dimension. The table has a `category` column, which serves as the closest analog. Strong candidates explicitly note the mapping assumption.

> **Common Pitfall**
>
> Using UNION instead of UNION ALL deduplicates rows with identical (provider, amount) values. MIN and MAX are unaffected by the lost duplicates, but AVG is computed over fewer rows and comes out wrong whenever an amount repeats for a provider.
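
The effect of deduplication can be seen on a few inline rows. This is a sketch with made-up values: the CTEs `a` and `b` are hypothetical stand-ins for the two tables, and the amount 10.0 appears once on each side.

```sql
WITH a(provider, amount) AS (VALUES ('aws', 10.0), ('aws', 40.0)),
     b(provider, amount) AS (VALUES ('aws', 10.0)),
keep_dupes AS (
    SELECT provider, amount FROM a
    UNION ALL
    SELECT provider, amount FROM b
),
dedup AS (
    SELECT provider, amount FROM a
    UNION
    SELECT provider, amount FROM b
)
-- UNION ALL keeps 3 rows (10, 40, 10): min 10, max 40, avg 20
SELECT 'UNION ALL' AS variant, COUNT(*) AS n,
       MIN(amount) AS min_amount, MAX(amount) AS max_amount, AVG(amount) AS avg_amount
FROM keep_dupes
UNION ALL
-- UNION keeps 2 rows (10, 40): min 10, max 40, avg 25
SELECT 'UNION', COUNT(*), MIN(amount), MAX(amount), AVG(amount)
FROM dedup;
```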

---

## Common follow-up questions

- What if the two tables used different currencies? _(Tests whether you would normalize amounts before aggregating.)_
- How would you also track which source each row came from? _(Add a literal column: `SELECT 'cloud_costs' AS source, ... UNION ALL SELECT 'cost_allocs', ...`)_
- What if `cost_allocs` had no meaningful provider equivalent? _(Tests whether you would use a placeholder value or exclude the table entirely.)_
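
The source-tracking follow-up could be sketched as below. To keep the example runnable anywhere, the two table names are shadowed by inline CTEs holding made-up rows; the output columns `cloud_rows` and `alloc_rows` are hypothetical names added for illustration.

```sql
WITH cloud_costs(provider, amount) AS (
    VALUES ('aws', 120.00), ('gcp', 80.00)       -- sample rows, not real data
),
cost_allocs(category, amount) AS (
    VALUES ('aws', 30.00), ('shared', 50.00)     -- sample rows, not real data
),
combined AS (
    -- Tag each row with the table it came from before stacking.
    SELECT 'cloud_costs' AS source, provider, amount FROM cloud_costs
    UNION ALL
    SELECT 'cost_allocs' AS source, category AS provider, amount FROM cost_allocs
)
SELECT
    provider,
    COUNT(*) FILTER (WHERE source = 'cloud_costs') AS cloud_rows,
    COUNT(*) FILTER (WHERE source = 'cost_allocs') AS alloc_rows,
    MIN(amount) AS min_amount,
    MAX(amount) AS max_amount,
    AVG(amount) AS avg_amount
FROM combined
GROUP BY provider
ORDER BY provider;
```

Carrying the `source` column through the union also makes it easy to spot providers that exist in only one table, such as the `shared` row here, which bears on the placeholder-versus-exclude question.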

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/cloud_cost_stats_by_provider)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.