# The Cloud Bill

> Every provider sent an invoice. Every month tells a different story.

Canonical URL: <https://datadriven.io/problems/monthly_spend_pivot_by_provider>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Show monthly cloud spend pivoted by provider. Each month should be a row, with separate columns showing total amount for AWS, GCP, and Azure. Only include completed billing entries. Return the month and each provider's total.

## Worked solution and explanation

### Why this problem exists in real interviews

This focuses on pivot and conditional aggregation within cloud_costs, specifically around the provider column. Interviewers present it in mid-level screens because the edge cases around NULL values and boundary conditions reveal depth of understanding.

> **Trick to Solving**
>
> SQL lacks a native PIVOT operator in most dialects. The trick is conditional aggregation with `CASE WHEN` inside aggregate functions.
> 
> 1. Identify the column whose values become output columns
> 2. Write `SUM(CASE WHEN col = 'val' THEN metric END)` for each pivot value
> 3. Group by the row identifier

---

### Break down the requirements

#### Step 1: Aggregate with SUM

Group by the output grain and apply `SUM()` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.

#### Step 2: Pivot with CASE WHEN

Each `CASE WHEN` expression inside an aggregate function transforms a row value into a column. This is the standard cross-tabulation pattern in SQL.

#### Step 3: Order the final output

Apply `ORDER BY` as specified to produce the expected row sequence. When tied values exist, add a secondary sort column for determinism.

---

### The solution

**Provider-level CASE WHEN pivot**

```sql
SELECT STRFTIME('%Y-%m', bill_date) AS month,
    SUM(CASE WHEN provider = 'AWS' THEN amount ELSE 0 END) AS aws_total,
    SUM(CASE WHEN provider = 'GCP' THEN amount ELSE 0 END) AS gcp_total,
    SUM(CASE WHEN provider = 'Azure' THEN amount ELSE 0 END) AS azure_total
FROM cloud_costs
GROUP BY STRFTIME('%Y-%m', bill_date)
ORDER BY month
```

> **Cost Analysis**
>
> The query scans 15M rows from `cloud_costs`. The aggregation reduces the row count before any downstream processing, which is the key performance lever. CTEs in most engines are optimization fences. For production workloads, consider inlining or materializing the intermediate results.

> **Interviewers Watch For**
>
> Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax. Breaking complex logic into named CTEs shows the interviewer you prioritize readability and debuggability.

> **Common Pitfall**
>
> Comparing dates stored as TEXT without casting can produce lexicographic instead of chronological ordering. Always confirm the column type.

---

## Common follow-up questions

- If cloud_costs.cost_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in cost_id.)_
- What happens to your CASE expressions if a new category value appears in provider, svc_name? _(Tests whether the candidate recognizes that hard-coded CASE values miss future categories.)_
- With millions of distinct values in cloud_costs.cost_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like cost_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/monthly_spend_pivot_by_provider)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.