# The Regional Cost Reconciliation

> Two cost tables, one region. Reconcile the running balance.

Canonical URL: <https://datadriven.io/problems/the_regional_cost_reconciliation>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

The FinOps team wants to reconcile actual cloud charges against budget allocations on a per-region timeline. Stack both sources together, treating charges as negative and allocations as positive, and compute a running balance per region. Return region, date, amount, and running balance, sequenced by region and date.

## Worked solution and explanation

### Why this problem exists in real interviews

FinOps teams reconcile cloud bills against budget allocations to know where each region actually stands. Interviewers use this event-sourced ledger pattern because it forces you to UNION two heterogeneous tables into one normalized event stream, apply a sign convention so outflows reduce the balance, and then compute a running SUM partitioned per region. Each piece is small but the order matters: if you reverse the sign or forget the partition, the answer is silently wrong.

---

### Break down the requirements

#### Step 1: UNION ALL the two sources into one event stream

Pull `region`, a signed amount, and an event date from each table. From `cloud_costs` select `-CAST(amount AS DOUBLE)` (amount is stored as TEXT, and these are outflows so the sign flips) with `bill_date` as the event date. From `cost_allocs` select `CAST(amount AS DOUBLE)` (already positive budget inflows) with `period` as the event date. `UNION ALL` is correct here because we never want to deduplicate identical rows from different sources.

#### Step 2: Run a per-region cumulative SUM

`SUM(amt) OVER (PARTITION BY region ORDER BY event_date ROWS UNBOUNDED PRECEDING)` carries the running balance forward inside each region. The `PARTITION BY region` resets the balance per region, so a deficit in `us-east-1` does not leak into `eu-west-2`. The explicit frame avoids engine-specific defaults that can include peers on the same date.

#### Step 3: Project the four required columns and sort

Final `SELECT region, event_date, amt, running_balance`, ordered by `region` then `event_date` so each region's timeline reads top to bottom. Same-date events are non-deterministic without an explicit tiebreak, but the prompt accepts any order within a date.

---

### The solution

**UNION ALL with sign convention, then per-region running SUM**

```sql
WITH all_events AS (SELECT region, -CAST(amount AS DOUBLE) AS amt, bill_date AS event_date FROM cloud_costs UNION ALL SELECT region, CAST(amount AS DOUBLE) AS amt, period AS event_date FROM cost_allocs) SELECT region, event_date, amt, SUM(amt) OVER (PARTITION BY region ORDER BY event_date ROWS UNBOUNDED PRECEDING) AS running_balance FROM all_events ORDER BY region, event_date
```

> **Cost Analysis**
>
> `cloud_costs` and `cost_allocs` are each 15M rows, partitioned by `bill_date` and `period` respectively across 36 monthly partitions. The UNION builds a 30M-row intermediate. With only 12 distinct `region` values, the window partitions are large (about 2.5M rows each on average) so the sort step inside each partition dominates cost. There is no aggregation collapse, the output row count equals the input row count.

> **Interviewers Watch For**
>
> Interviewers watch whether you negate `cloud_costs.amount` (it is a TEXT column storing a positive number that semantically represents an outflow) and whether you `CAST(amount AS DOUBLE)` since arithmetic on TEXT silently coerces in SQLite but not in Postgres. They also note whether you partition the running SUM by `region`, the explicit `ROWS UNBOUNDED PRECEDING` frame, and whether you use `UNION ALL` (correct) versus `UNION` (which would deduplicate a real same-day allocation that happens to match a cost row).

> **Common Pitfall**
>
> Forgetting to negate `cloud_costs.amount` makes the running balance grow monotonically forever, hiding any deficit. The other classic miss is omitting `PARTITION BY region`: the SUM then accumulates globally across regions in date order, producing a single timeline that is meaningless to a per-region budget owner.

---

## Common follow-up questions

- How would you add `svc_name` as a second partitioning key so each service inside a region has its own running balance? _(Tests whether the candidate carries `svc_name` through the UNION (it exists in both tables) and adds it to `PARTITION BY region, svc_name` without re-aggregating.)_
- What would the query return if a region appears only in `cost_allocs` and never in `cloud_costs`? _(Tests UNION ALL semantics. Only inflow events show, the running balance climbs monotonically from zero, no negative dips. The candidate should also note that both tables carry 12 distinct regions, so this case is plausible at the edges.)_
- How would you add a deterministic tiebreaker for events that share the same `event_date` so the running balance is reproducible across runs? _(Tests window frame literacy. Add a secondary key like `acct_id` or a synthetic source-priority constant to the window's `ORDER BY`, so two rows on the same date always accumulate in the same order.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_regional_cost_reconciliation)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.