# Top 2 Cloud Services by Cost

> Two services eating most of the budget.

Canonical URL: <https://datadriven.io/problems/top_2_cloud_services_by_cost>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

For 2026 cloud spending, find the top 2 services by total cloud cost, but only for services that also have cost allocations on file. Exclude any records where the billing date is null (cancelled charges). Show the service name and total cloud cost amount.

## Worked solution and explanation

### Why this problem exists in real interviews

The interviewer wants to see if you ask about ties before reaching for `LIMIT 2`. 'Top 2 services' is ambiguous: if three services tie for second place, LIMIT silently drops one and your FinOps report is wrong. They also watch whether you notice `strftime('%Y', bill_date)` kills partition pruning on a 15M-row table partitioned by `bill_date`.

---

### Break down the requirements

#### Step 1: Restrict to services with allocations on file

`INNER JOIN cost_allocs ca ON cc.svc_name = ca.svc_name`. The inner join is the filter; no separate EXISTS needed. Confirm with the interviewer that allocations are deduped by `svc_name`, otherwise you fan out `cc.amount`.

#### Step 2: Drop cancelled charges

`cc.bill_date IS NOT NULL`. The prompt calls out nulls explicitly. Put it in WHERE alongside the year filter, not in the ON clause, since both sides are deterministic post-join.

#### Step 3: Aggregate to service grain

`SUM(cc.amount) AS total_amount GROUP BY cc.svc_name`. Sum only `cc.amount`, never `ca.amount`; the allocation rows are a join filter, not a metric source.

#### Step 4: Pick the top 2

`ORDER BY total_amount DESC LIMIT 2`. State out loud whether ties at rank 2 should all surface; if yes, switch to `DENSE_RANK() OVER (ORDER BY total_amount DESC) <= 2`.

---

### The solution

**TOP 2 SERVICES BY COST**

```sql
SELECT cc.svc_name,
       SUM(cc.amount) AS total_amount
FROM cloud_costs cc
INNER JOIN cost_allocs ca
  ON cc.svc_name = ca.svc_name
WHERE strftime('%Y', cc.bill_date) = '2026'
  AND cc.bill_date IS NOT NULL
GROUP BY cc.svc_name
ORDER BY total_amount DESC
LIMIT 2;
```

> **Cost Analysis**
>
> `cloud_costs` is 15M rows, `cost_allocs` is 18M, joined on `svc_name`. `strftime('%Y', cc.bill_date)` blocks partition pruning on `cc.bill_date`, forcing a full scan before the hash join. Replace with `cc.bill_date >= DATE('2026-01-01') AND cc.bill_date < DATE('2026+1-01-01')` and the planner prunes to one year of partitions.

> **Interviewers Watch For**
>
> Before writing, ask: 'If two services tie at rank 2, do you want both, or is one arbitrary winner fine?' That single question separates a useful FinOps report from a flaky one. Also flag that `cost_allocs` may have multiple rows per `svc_name`, fanning out `cc.amount` if you don't dedupe.

> **Common Pitfall**
>
> Joining on `svc_name` without checking `cost_allocs` cardinality. If `cost_allocs` has 3 rows per service, `SUM(cc.amount)` triples. Fix: `INNER JOIN (SELECT DISTINCT svc_name FROM cost_allocs) ca` or aggregate first. The fan-out bug looks right until you compare against finance's numbers.

**Wrong (ties dropped)**

`ORDER BY total_amount DESC LIMIT 2` silently drops a tied service at rank 2.

**Right (ties surface)**

`SELECT ... FROM (SELECT svc_name, SUM(amount) AS t, DENSE_RANK() OVER (ORDER BY SUM(amount) DESC) AS r ...) WHERE r <= 2`.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- Rewrite this so ties at rank 2 all appear in the output. _(Forces the DENSE_RANK refactor and tests whether you understand LIMIT's tie behavior.)_
- How would you confirm `cost_allocs` doesn't fan out `cc.amount`? _(Probes join-cardinality discipline: run a `COUNT(*) / COUNT(DISTINCT svc_name)` sanity check on `cost_allocs` first.)_
- Rewrite the date filter so the partition prune fires on `cc.bill_date`. _(Confirms you know function-on-column defeats partition elimination.)_
- What if allocations are scoped by `period` and you only want services allocated in the same year? _(Adds a second join predicate and tests whether you keep allocation scope aligned with the cost window.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_2_cloud_services_by_cost)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.