# Monthly Cloud Cost Forecast Error

> The forecast was off. By how much?

Canonical URL: <https://datadriven.io/problems/monthly_cloud_cost_forecast_error>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

We want a naive forecast model for monthly cloud costs. Sum cloud cost amounts at a monthly level. Use the previous month's total as the forecast for the current month, then calculate the absolute percentage error versus actual. Show year-month, actual cost, forecasted cost, and percentage error.

## Worked solution and explanation

### Why this problem exists in real interviews

Finance and FinOps teams use a naive forecast (last month equals next month) as the baseline against which more sophisticated models are scored. Interviewers use this prompt to see whether you can roll up monthly costs, use LAG as the forecast, and compute a guarded percent error.

---

### Break down the requirements

#### Step 1: Bucket bill_date by year-month and SUM positive amounts

strftime('%Y-%m', bill_date) gives the YYYY-MM key. SUM(amount) WHERE amount IS NOT NULL AND amount > 0 keeps refunds and zero rows out of the actual cost. Filter in WHERE because amount is a row-level column, not an aggregate.

#### Step 2: LAG to get the prior month as the naive forecast

LAG(total_amount) OVER (ORDER BY ym) is prev_amount, the forecast. Then WHERE prev_amount IS NOT NULL drops the very first month, which has no prior to compare against.

#### Step 3: Guard pct_error against divide-by-zero

Wrap the percent in CASE WHEN total_amount > 0 THEN ABS((prev_amount - total_amount) / total_amount) * 100 END. If actual cost is zero, pct_error is NULL rather than a SQL error or infinity. ABS makes the error symmetric for over- and under-forecast.

---

### The solution

**Monthly aggregate, LAG forecast, guarded percent error**

```sql
WITH monthly AS (
  SELECT strftime('%Y-%m', bill_date) AS ym, SUM(amount) AS total_amount
  FROM cloud_costs WHERE amount IS NOT NULL AND amount > 0
  GROUP BY strftime('%Y-%m', bill_date)
),
ratios AS (
  SELECT ym, total_amount, LAG(total_amount) OVER (ORDER BY ym) AS prev_amount FROM monthly
)
SELECT ym, total_amount AS actual_cost, prev_amount AS forecasted_cost,
  CASE WHEN total_amount > 0 THEN ABS((prev_amount - total_amount) / total_amount) * 100 END AS pct_error
FROM ratios
WHERE prev_amount IS NOT NULL
ORDER BY ym
```

> **Cost Analysis**
>
> cloud_costs has 20M rows; the aggregate collapses to roughly 24 to 36 monthly buckets so LAG and the final SELECT are trivial. The full scan with the WHERE filter is the dominant cost; if bill_date is indexed and amount is denormalized cheaply, the planner can skip the scan.

> **Interviewers Watch For**
>
> Did you SUM only positive amounts (refunds would distort the actual), use LAG as a naive forecast (not a window AVG), and CASE-guard the divide-by-zero? Candidates often divide by prev_amount instead of total_amount; the prompt explicitly says total_amount is the denominator.

> **Common Pitfall**
>
> Dividing by prev_amount looks symmetric but reverses the meaning of the metric: prompt says ABS((prev - actual) / actual), so the denominator is actual cost, not the forecast. Read the formula carefully and follow it literally.

---

## Common follow-up questions

- How would you switch the forecast from naive (last month) to a 3-month moving average? _(Replace LAG with AVG(total_amount) OVER (ORDER BY ym ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING). The percent error formula stays the same; only the forecast definition changes.)_
- How do you handle a missing month (no bills at all)? _(It silently disappears from the aggregate, so LAG compares non-adjacent months. Generate a calendar CTE with recursive WITH and LEFT JOIN to fill zero or NULL gaps before LAG.)_
- Should you exclude the current (incomplete) month? _(Yes: an incomplete month under-reports actual and inflates pct_error. Add WHERE bill_date < strftime('%Y-%m-01', 'now') to scope to fully closed months.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/monthly_cloud_cost_forecast_error)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.