# Cloud Cost Breakdown by Provider

> Cloud costs, provider by provider.

Canonical URL: <https://datadriven.io/problems/cloud_cost_breakdown_by_provider>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

The FinOps team wants a multi-year cost breakdown from 2022 through 2025. For each cost category and year in the allocation records, show the total spend, the spend specifically in the us-east-1 region, and the number of line items, from the highest total spend down.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests date extraction, conditional aggregation, and multi-column GROUP BY. The region-specific spend column probes whether you can embed a CASE WHEN inside an aggregate to produce a pivot-like output.

---

### Break down the requirements

#### Step 1: Extract year from period

Use `EXTRACT(YEAR FROM period)` to get the year component. Filter to the 4-year range.

#### Step 2: Aggregate per category and year

`GROUP BY category, year` with `SUM(amount)` for total spend, `COUNT(*)` for line items.

#### Step 3: Compute region-specific spend

`SUM(CASE WHEN region = 'us-east-1' THEN amount ELSE 0 END)` extracts the us-east-1 slice.

#### Step 4: Sort by total spend descending

`ORDER BY total_spend DESC` as specified.

---

### The solution

**Year extraction with conditional regional aggregation**

```sql
SELECT
    category,
    EXTRACT(YEAR FROM period::DATE)::INT AS yr,
    SUM(amount) AS total_spend,
    SUM(CASE WHEN region = 'us-east-1' THEN amount ELSE 0 END) AS us_east_1_spend,
    COUNT(*) AS line_items
FROM cost_allocs
WHERE EXTRACT(YEAR FROM period::DATE) BETWEEN 2021 AND 2024
GROUP BY category, yr
ORDER BY total_spend DESC
```

> **Cost Analysis**
>
> Scan of 40M rows with year extraction and conditional sum. The GROUP BY reduces to (categories x 4 years). Expression-based filtering prevents index usage on `period`; a range predicate on the raw column would be more index-friendly.

> **Interviewers Watch For**
>
> Whether the candidate uses conditional aggregation for the region-specific column vs a self-join or subquery. The single-pass approach is more efficient and shows fluency with CASE WHEN inside aggregates.

> **Common Pitfall**
>
> Using `YEAR(period)` is MySQL syntax. In PostgreSQL, use `EXTRACT(YEAR FROM period)`. Database dialect awareness matters in interviews.

---

## Common follow-up questions

- How would you add year-over-year growth as a column? _(Tests LAG window function partitioned by category ordered by year.)_
- What if you needed all regions as separate columns? _(Tests CROSSTAB or extensive conditional aggregation for dynamic pivoting.)_
- How would you handle the period column if it stores month-end dates? _(Tests date truncation and whether year extraction still works correctly.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/cloud_cost_breakdown_by_provider)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.