# The Budget Line

> Some rows are over. Some are under. Label every one.

Canonical URL: <https://datadriven.io/problems/budget_flag>

Domain: SQL · Difficulty: easy · Seniority: L5

## Problem

The FinOps team needs to flag every service-region combination as 'over', 'under', or 'on_target' by comparing the actual cloud cost against the budgeted allocation for that same service and region. Show the service, region, actual cost, budget, and the flag.

## Worked solution and explanation

### Why this problem exists in real interviews

FinOps dashboards live on actual-versus-budget joins. The interviewer is checking whether you join `cloud_costs` to `cost_allocs` on the right composite key (`svc_name`, `region`) and classify with a clean three-branch `CASE`. The risk is silently dropping services that have no budget row, or mislabeling them as `'under'` when the budget is NULL.

---

### Break down the requirements

#### Step 1: Confirm the join grain

Both tables are keyed by (`svc_name`, `region`). Say this aloud: one actual row per service-region, one budget row per service-region. If either side is at a different grain (monthly versus annual), the comparison is wrong.

#### Step 2: Pick the join type

`INNER JOIN` drops actuals that have no budget. `LEFT JOIN` keeps them and lets you flag missing-budget explicitly. Ask the interviewer which behavior FinOps wants.

#### Step 3: Three-branch classifier

Use `CASE WHEN cc.amount > ca.amount THEN 'over' WHEN cc.amount < ca.amount THEN 'under' ELSE 'on_target' END`. Order matters only for the final ELSE.

#### Step 4: Project the audit columns

Return `svc_name`, `region`, `actual_cost`, `budget`, and the flag. FinOps needs the raw numbers next to the label to investigate.

---

### The solution

**BUDGET FLAG CLASSIFIER**

```sql
SELECT
  cc.svc_name,
  cc.region,
  cc.amount AS actual_cost,
  ca.amount AS budget,
  CASE
    WHEN cc.amount > ca.amount THEN 'over'
    WHEN cc.amount < ca.amount THEN 'under'
    ELSE 'on_target'
  END AS budget_flag
FROM cloud_costs cc
JOIN cost_allocs ca
  ON cc.svc_name = ca.svc_name
 AND cc.region = ca.region
```

> **Cost Analysis**
>
> 10M actuals joining 8M budgets on a two-column key wants a hash join, with the smaller `cost_allocs` as the build side. A composite index on (`svc_name`, `region`) on both sides lets the planner switch to merge-join if the data is presorted. The `CASE` is free; cost lives in the join.

> **Interviewers Watch For**
>
> Before writing, ask: are budgets and actuals at the same time grain? Monthly budget versus quarterly actual makes the raw amount comparison meaningless. Also ask whether an actual without a budget should be dropped or flagged. That decision drives `INNER` versus `LEFT JOIN`.

> **Common Pitfall**
>
> If you `LEFT JOIN` and the budget is NULL, `cc.amount > ca.amount` evaluates to NULL, and the `CASE` falls through to `'on_target'`. Mislabels uncovered services as compliant. Guard with `WHEN ca.amount IS NULL THEN 'no_budget'` or `COALESCE(ca.amount, 0)`.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- What if `cost_allocs` has multiple rows per (svc_name, region) for different time periods? _(Tests whether you spot the join fan-out and add a date filter or pick the latest allocation.)_
- How would you express percent over budget instead of a flag? _(Pushes division with `NULLIF(ca.amount, 0)` to avoid divide-by-zero on free-tier services.)_
- Switch to `LEFT JOIN` and surface services with no budget row. How does the `CASE` change? _(Probes NULL-aware classification and three-valued logic.)_
- Aggregate to one row per service across regions, with a single rolled-up flag. How? _(Forces a `SUM` rollup before the comparison and a discussion of grain choice.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/budget_flag)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.