# Provider Cost Change H1

> Cost swings in the first half of the year.

Canonical URL: <https://datadriven.io/problems/provider_cost_change_h1>

Domain: SQL · Difficulty: easy · Seniority: L4

## Problem

FinOps wants to know how cloud spend drifted from the start of the year to mid-year. For each provider, compare the typical January bill against the typical July bill and report how much the average moved. List the providers in order.

## Worked solution and explanation

### Why this problem exists in real interviews

Two time windows, two averages, one delta per provider. The interviewer is checking that you can isolate rows by month using a portable date function, compute each window's average, and join them on `provider` to subtract. Two CTEs read more cleanly than one big conditional aggregate, and the explicit join makes the symmetry obvious.

> **Trick to Solving**
>
> Two CTEs that average per provider, joined on `provider`.
> 
> 1. CTE `jan`: rows where the bill_date falls in January, averaged per provider
> 2. CTE `jul`: rows where the bill_date falls in July, averaged per provider
> 3. Outer: join on `provider`, subtract jan from jul

---

### Break down the requirements

#### Step 1: January average per provider

Build the `jan` CTE: filter to rows where the month component of `bill_date` equals 01 and group by `provider` taking `AVG(amount)`. Use a portable extractor like `strftime('%m', bill_date) = '01'` so the filter works the same in any month, not just specific dates.

#### Step 2: July average per provider

Build the `jul` CTE the same way, switching the month filter to `'07'`. Same shape: one row per provider with the July average.

#### Step 3: Join and subtract

Inner-join the two CTEs on `provider` and project `jul.avg_amount - jan.avg_amount AS amount_change`. An inner join naturally drops providers that appear in only one window. Order by `provider` for a stable result.

---

### The solution

**Two month-filtered CTEs joined per provider**

```sql
WITH jan AS (
    SELECT provider, AVG(amount) AS avg_amount
    FROM cloud_costs
    WHERE strftime('%m', bill_date) = '01'
    GROUP BY provider
),
jul AS (
    SELECT provider, AVG(amount) AS avg_amount
    FROM cloud_costs
    WHERE strftime('%m', bill_date) = '07'
    GROUP BY provider
)
SELECT
    jan.provider,
    jul.avg_amount - jan.avg_amount AS amount_change
FROM jan
JOIN jul ON jan.provider = jul.provider
ORDER BY jan.provider
```

> **Time and Space Complexity**
>
> **Time:** Two scans of `cloud_costs` (8M rows each). The month filter is selective (about 1/12 of the data), and the GROUP BY collapses each side to 3 rows (one per provider). The final join is a 3-row hash join.
> 
> **Space:** O(p) where p is the number of providers (3 here).

> **Interviewers Watch For**
>
> Strong candidates filter on the **month** component, not on a specific calendar day. They also use the engine's date-extract function (`strftime`, `EXTRACT`, `DATE_PART`) instead of LIKE patterns that depend on string formatting.

> **Common Pitfall**
>
> Filtering only on bills with bill_date exactly equal to January 1 or July 1. With monthly bills, that picks up at most one row per provider per year and the average is unstable; daily bills miss the rest of the month entirely. The prompt asks for the whole-month average.

---

## Common follow-up questions

- How would you express this as a single SELECT using `AVG(CASE WHEN ... END)` and no CTEs? What readability tradeoff appears? _(Tests reframing the same shape as a single GROUP BY with conditional aggregation.)_
- If a provider only billed in January but not July, the inner join drops them. How would you keep them with a NULL change? _(Tests handling missing windows with LEFT JOIN + COALESCE.)_
- How would you generalize the query to accept any two month numbers (say, Q1 vs Q3) without duplicating the CTE? _(Tests parameterizing the comparison across arbitrary months without rewriting the query.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/provider_cost_change_h1)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.