# Above-Average Cloud Spend

> Some services quietly burn more than the rest.

Canonical URL: <https://datadriven.io/problems/above_average_cloud_spend>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The FinOps team flagged rising infrastructure costs ahead of budget planning. Identify every cloud service whose average cost per billing record exceeds the overall average cost across all records. Return the service names in alphabetical order.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge targets self-joining `cloud_costs` to compare rows within the same table combined with nested subqueries against `cloud_costs`. Getting the grouping wrong on `provider`, `svc_name`, `region` produces silently incorrect counts, which is exactly the trap interviewers set.

> **Trick to Solving**
>
> The phrase "above average" or "exceeds the overall" signals a **scalar subquery**. The average must be computed across all qualifying rows, then each row compared against that single value.
> 
> 1. Write the subquery to compute the benchmark first
> 2. Use it in the WHERE clause of the outer query
> 3. Verify the subquery is uncorrelated (runs once, not per row)

---

### Break down the requirements

#### Step 1: Compute the benchmark average of `amount`

A scalar subquery calculates `AVG(amount)` across all rows. This single value becomes the comparison threshold.

#### Step 2: Filter rows exceeding the benchmark

`WHERE amount > (subquery)` keeps only rows above the global average. The subquery executes once.

#### Step 3: Order by the comparison column

Sort descending by `amount` to show the most extreme outliers first.

---

### The solution

**Self-join for above-average cloud spend**

```sql
SELECT *,
    (SELECT AVG(amount) FROM cloud_costs) AS overall_avg
FROM cloud_costs
WHERE amount > (
    SELECT AVG(amount)
    FROM cloud_costs
)
ORDER BY amount DESC
```

> **Cost Analysis**
>
> The main table has 15M rows (4 GB). Partitioned on `bill_date`, so queries filtering on that column skip most partitions. The GROUP BY reduces the row count early, keeping downstream operations cheap.

> **Interviewers Watch For**
>
> Strong candidates state the correct `GROUP BY` grain before writing any SQL, showing they think about the output shape first. Candidates who explain correlated vs uncorrelated subqueries earn extra credibility.

> **Common Pitfall**
>
> Selecting a non-aggregated column without including it in `GROUP BY` is the most common error. Some engines reject it; others silently return arbitrary values.

---

## Common follow-up questions

- What happens to your results if `provider` in `cloud_costs` contains trailing whitespace or mixed casing? _(Tests awareness of text normalization issues that silently fragment GROUP BY results.)_
- Your self-join on `cloud_costs` could produce a quadratic number of row pairs. What index would you add to keep this fast? _(Tests awareness of self-join explosion and the importance of indexing the join key.)_
- `cost_id` in `cloud_costs` has ~15M distinct values. What index strategy keeps your query from doing a full table scan? _(Tests whether the candidate can design indexes for high-cardinality columns and understands selectivity.)_
- Could you express this same logic as a single query without CTEs or subqueries? What readability trade-off does that introduce? _(Tests whether the candidate can flatten nested logic and understands when decomposition aids maintainability.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/above_average_cloud_spend)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.