# Highest and Lowest Cloud Costs

> The extremes in cloud spending.

Canonical URL: <https://datadriven.io/problems/highest_and_lowest_cloud_costs>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Find the single highest-cost and single lowest-cost entries across the entire cloud cost dataset. For each, show the cost ID, amount, service name, and a label indicating whether it is the highest or lowest.

## Worked solution and explanation

### Why this problem exists in real interviews

Cloud cost anomaly reports always need both extremes side by side: the runaway charge and the suspiciously cheap line item. Interviewers use this prompt to see whether you can label two halves of a result set, preserve ties at each extreme, and stitch them together without losing either rows or column alignment.

---

### Break down the requirements

#### Step 1: Find every row tied at MAX(amount)

Use a scalar subquery WHERE amount = (SELECT MAX(amount) FROM cloud_costs). This returns every row at the max, so a single huge invoice and any duplicates all come back.

#### Step 2: Find every row tied at MIN(amount), then label both halves

Mirror the same pattern for MIN. Each half projects a literal label, 'Highest Cost' or 'Lowest Cost', as cost_type so the consumer can distinguish them after the union.

#### Step 3: Glue with UNION ALL, not UNION

UNION ALL preserves duplicates and skips the implicit DISTINCT sort. The two halves cannot collide because the cost_type literals differ, so UNION ALL is both correct and cheaper.

---

### The solution

**Two filtered selects unioned with literal labels**

```sql
SELECT cost_id, amount, svc_name, 'Highest Cost' AS cost_type
FROM cloud_costs
WHERE amount = (SELECT MAX(amount) FROM cloud_costs)
UNION ALL
SELECT cost_id, amount, svc_name, 'Lowest Cost' AS cost_type
FROM cloud_costs
WHERE amount = (SELECT MIN(amount) FROM cloud_costs)
```

> **Cost Analysis**
>
> cloud_costs has 12,000,000 rows. Each MIN and MAX subquery is a full scan unless an index on amount exists, so expect three scans (MIN, MAX, and the outer match). With an index on amount, the extremes are O(log n) lookups and the outer match is a range seek. UNION ALL avoids the duplicate-elimination sort that UNION would force across both halves.

> **Interviewers Watch For**
>
> They want UNION ALL (not UNION), they want every tied row preserved, and they want the literal label projected as a column rather than sneaked in as a comment. Strong candidates also propose a single-pass alternative using window functions: WHERE amount = MAX(amount) OVER () OR amount = MIN(amount) OVER () and a CASE for the label.

> **Common Pitfall**
>
> Using ORDER BY amount DESC LIMIT 1 (and similarly for MIN) silently drops ties and returns one row per extreme, which is wrong as soon as two providers post the same charge. The other classic miss is UNION instead of UNION ALL, which works here but pays for an unnecessary global sort across 12M rows in the worst case.

---

## Common follow-up questions

- How would you do this in a single scan of cloud_costs? _(Forces the candidate to reach for MIN(amount) OVER () and MAX(amount) OVER () in a subquery, then filter the outer query. Cuts three scans to one at the cost of materializing a wider intermediate.)_
- What if the spec said 'highest and lowest amount per provider' instead of overall? _(Tests partitioned window functions: ROW_NUMBER or DENSE_RANK partitioned by provider ordered by amount, then filtered to rank 1 from each end.)_
- How would you exclude rows where amount is negative (refunds)? _(Adds WHERE amount > 0 to all three places, including the MIN and MAX subqueries, and the candidate should call out that forgetting one of the three breaks the answer.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/highest_and_lowest_cloud_costs)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.