# Expensive AWS Services

> Some AWS services quietly drain the budget.

Canonical URL: <https://datadriven.io/problems/expensive_aws_services>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The FinOps team flags high-cost line items for review. Surface a count of unique AWS services that have at least one cost entry of 200 or more.

## Worked solution and explanation

### What this is really asking

Two filters on `cloud_costs` (8M rows): `provider = 'aws'` and `amount >= 200`, then a distinct count over `svc_name`. The word 'unique' is the whole prompt.

---

### Break down the requirements

#### Step 1: Filter to AWS and threshold

Match `provider = 'aws'` exactly, and use `amount >= 200` (the prompt says 'at least', so the boundary is inclusive).

#### Step 2: Count distinct services

`COUNT(DISTINCT svc_name)` collapses repeats. A service flagged a thousand times still counts once.

---

### The solution

**DISTINCT COUNT OVER FILTERED ROWS**

```sql
SELECT COUNT(DISTINCT svc_name) AS service_count
FROM cloud_costs
WHERE provider = 'aws'
  AND amount >= 200
```

> **Cost Analysis**
>
> 8M rows partitioned by `bill_date` with no date predicate, so every partition gets scanned. An index on `(provider, amount)` helps; COUNT DISTINCT still hashes the qualifying `svc_name` values.

> **Interviewers Watch For**
>
> Whether you ask if `provider` is case sensitive, whether 200 is inclusive, and whether 'cost entry' means a single row or an aggregated monthly total. All three change the answer.

> **Common Pitfall**
>
> Writing `COUNT(svc_name)` instead of `COUNT(DISTINCT svc_name)`. The plain count returns line-item volume in the millions, not the handful of unique services FinOps wants.

> **The False Start**
>
> First instinct is `SELECT svc_name, COUNT(*) ... GROUP BY svc_name` and eyeballing the row count. That returns N rows, not one number, and breaks any tile expecting a scalar. Pivot to `COUNT(DISTINCT svc_name)`.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you also return the total spend per qualifying service? _(Swap to a GROUP BY `svc_name` with SUM(amount), keeping the same WHERE clause.)_
- What changes if the threshold should apply to monthly totals rather than single rows? _(Pre-aggregate by `(svc_name, DATE_TRUNC('month', bill_date))` in a CTE, then filter the aggregate before counting distinct services.)_
- How would you scope this to a single billing month without scanning all partitions? _(Add `bill_date >= DATE '2024-01-01' AND bill_date < DATE '2024-02-01'` so partition pruning kicks in.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/expensive_aws_services)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.