# Minimum Cost Per Provider

> The cheapest month from each provider.

Canonical URL: <https://datadriven.io/problems/minimum_cost_per_provider>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Ahead of reserved-instance negotiations, the FinOps team wants the floor cost historically paid to each cloud provider so they can anchor their bid. Show each provider alongside its lowest recorded amount.

## Worked solution and explanation

### Why this problem exists in real interviews

The central task is to group `cloud_costs` by `provider` and take the minimum `amount` within each group. It is used in mid-level screens to test whether you pick the right aggregate function and grouping column on the first attempt.

> **Trick to Solving**
>
> Read the prompt carefully for implicit constraints. The wording hints at the grain of the output, that is, what each row represents. Here, "each provider alongside its lowest recorded amount" means one row per provider.
>
> 1. Identify the output grain from the prompt (one row per what?)
> 2. Work backward from the desired output columns
> 3. Build the query inside-out: start with the innermost piece (here, the aggregate), then layer on filters and ordering

---

### Break down the requirements

#### Step 1: Aggregate with MIN

Group by the output grain (`provider`) and apply `MIN(amount)` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.
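
To make the grouping concrete, here is a minimal sketch on made-up data. The table definition and every row are assumptions for illustration only; the real `cloud_costs` schema may have different columns and types.

```sql
-- Hypothetical schema and rows; only provider and amount are used by the query.
CREATE TEMP TABLE cloud_costs (
    cost_id  integer PRIMARY KEY,
    provider text,
    svc_name text,
    amount   numeric(12, 2)
);

INSERT INTO cloud_costs (cost_id, provider, svc_name, amount) VALUES
    (1, 'aws',   'ec2',      1200.00),
    (2, 'aws',   's3',        310.50),
    (3, 'gcp',   'bigquery',  875.25),
    (4, 'gcp',   'gcs',       140.00),
    (5, 'azure', 'vm',        640.00);

-- GROUP BY sets the grain (one row per provider); MIN() picks the floor.
SELECT provider, MIN(amount) AS min_amount
FROM cloud_costs
GROUP BY provider;
```

On this sample the result has three rows: `aws` with 310.50, `gcp` with 140.00, and `azure` with 640.00. Without an `ORDER BY`, the row order is not guaranteed.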

#### Step 2: Order the final output

Apply `ORDER BY` to produce the expected row sequence. The solution sorts by `min_amount` ascending so the cheapest provider comes first. When tied values exist, add a secondary sort column such as `provider` for determinism.
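
A short sketch of the tiebreaker, assuming two providers share the same minimum. The inline rows are hypothetical and stand in for the real table through a CTE of the same name.

```sql
-- Hypothetical rows: aws and gcp tie on their lowest amount.
WITH cloud_costs (provider, amount) AS (
    VALUES
        ('aws',    310.50),
        ('aws',   1200.00),
        ('gcp',    310.50),
        ('gcp',    875.25),
        ('azure',  640.00)
)
SELECT provider, MIN(amount) AS min_amount
FROM cloud_costs
GROUP BY provider
-- provider breaks the tie, so the row order is repeatable.
ORDER BY min_amount ASC, provider ASC;
```

With the secondary key, `aws` is always listed before `gcp`, followed by `azure`. Without it, the two tied rows could appear in either order.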

---

### The solution

**MIN aggregate per provider**

```sql
SELECT provider, MIN(amount) AS min_amount
FROM cloud_costs
GROUP BY provider
ORDER BY min_amount ASC
```

> **Cost Analysis**
>
> The query scans 12M rows from `cloud_costs`.

> **Interviewers Watch For**
>
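> This query has no join, so interviewers expect you to articulate why `MIN` with `GROUP BY provider` matches the requested grain. If a follow-up adds a join, be ready to explain why you chose a specific join type and what happens to unmatched rows.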

> **Common Pitfall**
>
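> This query needs no JOIN, but a follow-up may add one. Do not forget that a JOIN can multiply rows when the relationship is one-to-many; always check whether the join key is unique on at least one side. `MIN` is unaffected by duplicated rows, but `SUM` and `COUNT` are not.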

---

## Common follow-up questions

- If cloud_costs.cost_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in cost_id.)_
- How would you verify that your aggregation on cloud_costs.cost_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in cloud_costs.cost_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like cost_id.)_
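
One way to explore these follow-ups is the sketch below. The demo table, its rows, and the index name are all hypothetical, and whether the planner uses the index depends on the data and the Postgres version, so treat it as a starting point rather than a tuned answer.

```sql
-- Hypothetical demo table; cost_id is deliberately left unconstrained
-- so that NULLs and duplicates can be shown.
CREATE TEMP TABLE cloud_costs_demo (
    cost_id  integer,
    provider text,
    amount   numeric(12, 2)
);

INSERT INTO cloud_costs_demo (cost_id, provider, amount) VALUES
    (1,    'aws',  310.50),
    (1,    'aws',  310.50),   -- exact duplicate row
    (NULL, 'aws', 1200.00),   -- NULL cost_id
    (2,    'gcp',    NULL),   -- NULL amount
    (3,    NULL,    99.00);   -- NULL provider

-- The solution never reads cost_id, so a NULL there changes nothing.
-- MIN() skips NULL amounts; a provider with only NULL amounts gets NULL.
-- Rows with a NULL provider would form their own group, so filter them out.
SELECT provider, MIN(amount) AS min_amount
FROM cloud_costs_demo
WHERE provider IS NOT NULL
GROUP BY provider
ORDER BY min_amount ASC, provider ASC;

-- Duplicate check: any cost_id that appears more than once.
SELECT cost_id, COUNT(*) AS copies
FROM cloud_costs_demo
WHERE cost_id IS NOT NULL
GROUP BY cost_id
HAVING COUNT(*) > 1;

-- Candidate index for the MIN-per-provider query (an assumption, not a
-- measured recommendation). It covers both columns the query reads.
CREATE INDEX idx_cloud_costs_demo_provider_amount
    ON cloud_costs_demo (provider, amount);
```

On this sample the first query returns `aws` with 310.50 and then `gcp` with a NULL minimum, because Postgres sorts NULLs last in ascending order by default. The duplicate check returns `cost_id` 1 with two copies. The duplicate does not change the `MIN` result, though it would inflate a `SUM` or `COUNT`. An index on `cost_id` alone would not help the solution query, since that query does not reference `cost_id`.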

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/minimum_cost_per_provider)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.