# Bargains and Budget-Busters

> Every region has both. Find them.

Canonical URL: <https://datadriven.io/problems/bargains_and_budget_busters>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

FinOps is auditing cloud spend region by region. Two systems record line items: the provider's cost ledger and the team's internal allocations. Treat them as one combined stream of line items. For each region, name the service behind the single biggest line item and the service behind the single smallest. If two line items tie, take the one whose service name comes first alphabetically. Return one row per region with the region, its biggest spender, and its cheapest line item.

## Worked solution and explanation

### Why this problem exists in real interviews

Two tables, one combined stream, two extreme picks per group. The interviewer is checking three things: that you UNION ALL the heterogeneous sources rather than treating one as canonical, that you encode the alphabetical tie-break the prompt specifies into the window's ORDER BY rather than letting the engine pick, and that you reach for FIRST_VALUE within a region partition instead of aggregating away the underlying line items.

> **Trick to Solving**
>
> **No GROUP BY, no SUM.** The prompt asks for the single most expensive and single cheapest line item per region, with ties broken alphabetically by service name.
> 
> 1. UNION ALL `(region, svc_name, amount)` from both tables
> 2. `FIRST_VALUE(svc_name) OVER (PARTITION BY region ORDER BY amount DESC, svc_name ASC)` for the most expensive
> 3. The same `FIRST_VALUE` with `ORDER BY amount ASC, svc_name ASC` for the cheapest
> 4. SELECT DISTINCT to collapse to one row per region

---

### Break down the requirements

#### Step 1: Combine the two sources

UNION ALL `cloud_costs` and `cost_allocs`, projecting just `region`, `svc_name`, `amount` from each. The two tables record line items from different sources (provider-billed vs. internally allocated), so a row in one is not a duplicate of the other; UNION ALL preserves every line item, while UNION would deduplicate identical (region, svc, amount) tuples and silently lose data.

#### Step 2: Compute the two extremes per region with FIRST_VALUE

Inside a CTE, attach two window-function columns: `FIRST_VALUE(svc_name) OVER (PARTITION BY region ORDER BY amount DESC, svc_name ASC)` and `FIRST_VALUE(svc_name) OVER (PARTITION BY region ORDER BY amount ASC, svc_name ASC)`. The secondary `svc_name ASC` makes the result deterministic when two line items in a region share the extreme amount; without it, FIRST_VALUE returns whichever row the engine sees first. Each row in the CTE now carries its region's most_expensive and cheapest service names.

#### Step 3: Collapse to one row per region

`SELECT DISTINCT region, most_expensive, cheapest` collapses the row count to one per region, since the two window-derived columns are constant within a region partition.

---

### The solution

**UNION ALL, FIRST_VALUE per region, then SELECT DISTINCT**

```sql
WITH combined AS (
    SELECT region, svc_name, amount FROM cloud_costs
    UNION ALL
    SELECT region, svc_name, amount FROM cost_allocs
),
ranked AS (
    SELECT
        region,
        svc_name,
        amount,
        FIRST_VALUE(svc_name) OVER (PARTITION BY region ORDER BY amount DESC, svc_name ASC) AS most_expensive,
        FIRST_VALUE(svc_name) OVER (PARTITION BY region ORDER BY amount ASC,  svc_name ASC) AS cheapest
    FROM combined
)
SELECT DISTINCT region, most_expensive, cheapest
FROM ranked
```

> **Time and Space Complexity**
>
> **Time:** O(n log n) where n = 12M + 25M = 37M combined rows. The dominant step is the partition sort by amount within each region (15 partitions). The final DISTINCT collapses to 15 rows.
> 
> **Space:** O(n) for the CTE materialization; planners that pipeline the window can reduce that.

> **Interviewers Watch For**
>
> Strong candidates resist the urge to SUM amounts per (region, svc_name). The prompt asks for the line item extremes, not the cumulative service spend. They also use UNION ALL, not UNION, to preserve every line item, and they encode the alphabetical tie-break as a secondary key in the window's ORDER BY rather than relying on the engine's row arrival order.

> **Common Pitfall**
>
> Aggregating amounts before ranking. `SUM(amount)` per (region, svc_name) finds the service with the largest cumulative spend, which is a different question. The grader compares against per-line-item extremes.

---

## Common follow-up questions

- How would you sanity-check that `cloud_costs` and `cost_allocs` aren't both recording the same underlying spend before treating them as one combined stream? _(Tests whether the candidate spots that the two sources cover overlapping spend. If allocations are a re-cut of the same dollars in the cost ledger, UNION ALL double-counts. Strong answer: confirm the source semantics with the data owner before unioning.)_
- How would the query change if the team wanted the most and least expensive services by total spend, not by single line item? _(Tests reframing to per-service totals.)_
- How would you rewrite this with ROW_NUMBER and a WHERE filter instead of FIRST_VALUE plus DISTINCT? _(Tests reusing the same CTE shape with ROW_NUMBER instead.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/bargains_and_budget_busters)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.