# Top and Bottom Cloud Spenders

> The extremes. Top and bottom.

Canonical URL: <https://datadriven.io/problems/top_and_bottom_cloud_spenders>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

For an executive review, pull the top 5 highest-spending and the top 5 lowest-spending cloud services by total amount in 2025, listed from lowest total to highest. Show each service name and its total amount.

## Worked solution and explanation

### Why this problem exists in real interviews

Finance wants top and bottom spenders in one report, sorted cheapest to most expensive. You cannot reach that shape with a single `ORDER BY` plus `LIMIT`. Two independent rankings of `cloud_costs` get glued together, and on 18M partitioned rows the date predicate has to actually prune.

---

### Break down the requirements

#### Step 1: Window the year correctly

Half-open range on `bill_date`: `>= '2025-01-01' AND < '2026-01-01'`. Wrapping `bill_date` in `YEAR()` blocks partition pruning.

#### Step 2: Aggregate per service

Group by `svc_name`, sum `amount`. `region`, `acct_id`, `provider` collapse away for an exec view.

#### Step 3: Two ranked slices, then stitch

Derive top 5 by `total_amount DESC LIMIT 5` and bottom 5 by `total_amount ASC LIMIT 5` separately. `UNION ALL`, then sort the union ascending.

---

### The solution

**TOP AND BOTTOM CLOUD SPENDERS**

```sql
SELECT svc_name, total_amount
FROM (
  SELECT svc_name, SUM(amount) AS total_amount
  FROM cloud_costs
  WHERE bill_date >= '2025-01-01'
    AND bill_date <  '2026-01-01'
  GROUP BY svc_name
  ORDER BY total_amount DESC
  LIMIT 5
) AS top_spenders
UNION ALL
SELECT svc_name, total_amount
FROM (
  SELECT svc_name, SUM(amount) AS total_amount
  FROM cloud_costs
  WHERE bill_date >= '2025-01-01'
    AND bill_date <  '2026-01-01'
  GROUP BY svc_name
  ORDER BY total_amount ASC
  LIMIT 5
) AS bottom_spenders
ORDER BY total_amount ASC
```

> **Cost Analysis**
>
> Both subqueries hit the same partition slice. The half-open `bill_date` predicate is what makes this cheap; a `YEAR(bill_date) = ...` rewrite would read every partition.

> **Interviewers Watch For**
>
> Whether you isolate the two rankings before unioning. A common wrong move: one `GROUP BY` with `... DESC LIMIT 5 UNION ALL ... LIMIT 5` where the outer order applies to the union.

> **Common Pitfall**
>
> With fewer than 10 distinct services, `UNION ALL` duplicates the overlap. Swap to `UNION` or add a tag column (`'top'` / `'bottom'`).

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you tag rows as top or bottom so a BI tool can color them? _(Probes adding a literal column inside each `UNION ALL` arm.)_
- If `cloud_costs` is partitioned monthly on `bill_date`, does this prune? _(Tests half-open ranges vs function-wrapped columns.)_
- How do you adapt this to top and bottom 5 per provider? _(Moves from `LIMIT` to `ROW_NUMBER() OVER (PARTITION BY provider)`.)_

> **Trick worth knowing**
>
> Some engines reject `ORDER BY` inside a `UNION ALL` arm without a wrapping subquery. Fix with the derived-table form above, not by dropping the inner sort.

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_and_bottom_cloud_spenders)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.