# Cost Density Extremes

> Some regions pack more cost per node than others.

Canonical URL: <https://datadriven.io/problems/cost_density_extremes>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Cost density is total cost divided by the number of services in that region, rounded to the nearest integer. Show region, provider, and density for only the minimum and maximum density regions.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests a multi-step aggregation with a derived metric, followed by filtering for min/max rows. Computing cost density (total cost / service count) and then selecting extremes probes CTE fluency and top/bottom-N patterns.

---

### Break down the requirements

#### Step 1: Compute cost density per region-provider

`SUM(amount) / COUNT(DISTINCT svc_name)` gives cost per service, rounded to nearest integer.

#### Step 2: Find min and max density

Use window functions `MIN() OVER ()` and `MAX() OVER ()` or a CTE to identify the extreme values.

#### Step 3: Filter to extremes only

Keep rows where density equals the global min or max.

---

### The solution

**Derived metric with extreme value selection**

```sql
WITH density AS (
    SELECT
        region,
        provider,
        ROUND(SUM(amount) / COUNT(DISTINCT svc_name)) AS cost_density
    FROM cloud_costs
    GROUP BY region, provider
)
SELECT region, provider, cost_density
FROM density
WHERE cost_density = (SELECT MIN(cost_density) FROM density)
   OR cost_density = (SELECT MAX(cost_density) FROM density)
```

> **Cost Analysis**
>
> Scan of 10M rows for the initial aggregation. The density CTE produces a small set (regions x providers). The min/max subqueries scan that small set twice.

> **Interviewers Watch For**
>
> Whether the candidate computes the derived metric cleanly in a CTE before filtering. Trying to do everything in one query level leads to complex, error-prone SQL.

> **Common Pitfall**
>
> Using `COUNT(svc_name)` instead of `COUNT(DISTINCT svc_name)` would count total cost records, not distinct services, giving a different (and incorrect) density metric.

---

## Common follow-up questions

- What if multiple regions tie at the minimum or maximum? _(The query returns all ties, which is correct for this problem.)_
- How would you also show the rank of each region by density? _(Add RANK() OVER (ORDER BY cost_density) to the density CTE.)_
- What if you needed the median density instead of min/max? _(Tests PERCENTILE_CONT for median computation.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/cost_density_extremes)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.