# Cloud Cost Trend Analysis

> Cost trends across billing periods.

Canonical URL: <https://datadriven.io/problems/cloud_cost_trend_analysis>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

FinOps wants to spot bumpy cloud bills service by service. For every individual bill, list the service, the billing date, the amount, and how much that bill moved versus the service's previous bill. The first bill on record for a service has nothing to compare against. Walk the bills service by service, oldest to newest.

## Worked solution and explanation

### Why this problem exists in real interviews

This is the LAG window function in its purest form: same row count in, one extra column out, the comparison anchored per partition. The interviewer is checking that you partition by `svc_name`, order by `bill_date`, and remember that the first row per partition naturally returns NULL when there's no prior bill to subtract.

> **Trick to Solving**
>
> No GROUP BY anywhere. The output has the same row count as the input (one row per cloud_costs row); LAG just adds the bill-over-bill delta as a fourth column.

---

### Break down the requirements

#### Step 1: Project the row-level columns

Project the four output columns directly from `cloud_costs`: `svc_name`, `bill_date`, `amount`, and the LAG expression as `price_change`.

#### Step 2: Compute the bill-over-bill delta

`amount - LAG(amount) OVER (PARTITION BY svc_name ORDER BY bill_date)` gives the delta versus the previous bill of the same service. Partitioning by `svc_name` ensures the LAG never crosses service boundaries.

#### Step 3: Sort the output

`ORDER BY svc_name, bill_date` puts each service's bills together in date order. The first row per service has no prior bill so `price_change` is NULL.

---

### The solution

**LAG over (PARTITION BY svc_name ORDER BY bill_date)**

```sql
SELECT
    svc_name,
    bill_date,
    amount,
    amount - LAG(amount) OVER (PARTITION BY svc_name ORDER BY bill_date) AS price_change
FROM cloud_costs
ORDER BY svc_name, bill_date
```

> **Time and Space Complexity**
>
> **Time:** O(n log n) dominated by the partition + sort across 8M rows. The LAG itself is a single ordered scan within each partition.
> 
> **Space:** O(p) for the per-partition cursor state where p is the partition size; the planner streams the result.

> **Interviewers Watch For**
>
> Strong candidates leave the row count alone (no GROUP BY) when the prompt asks for "every billing date with the change from the prior bill". They also resist adding `WHERE price_change IS NOT NULL` because the prompt explicitly says the first bill should still appear with a NULL change.

> **Common Pitfall**
>
> Forgetting `PARTITION BY svc_name`. Without it, LAG treats the whole table as one partition and compares unrelated services to each other. Always anchor the LAG to the entity dimension.

---

## Common follow-up questions

- How would you also show the percentage change? _(Divide the delta by LAG(amount) value, guarding against division by zero with NULLIF.)_
- What if billing dates have gaps (missing months)? Should the bill after a gap show the change versus the last seen bill or NULL? _(LAG compares adjacent rows regardless of calendar gaps; a generated date series joined back may be needed.)_
- How would you flag services with 3+ consecutive bills where price_change > 0? _(Tests streak detection with CASE + SUM window patterns.)_
- If finance wanted only services where the latest bill jumped more than 50% from the prior one, where would you add that filter? _(Tests aggregation on top of the row-level result.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/cloud_cost_trend_analysis)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.