# Multi-Provider Cost Lookup

> AWS, GCP, Azure. Side by side.

Canonical URL: <https://datadriven.io/problems/multi_provider_cost_lookup>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The FinOps team is reconciling invoices from the three major cloud providers. Pull every cost amount associated with AWS, GCP, or Azure.

## Worked solution and explanation

### Why this problem exists in real interviews

Extracting insights from cloud_costs.provider grouped by svc_name via query construction is the central task. It is used as a fundamentals check to test whether you pick the right aggregation function and partition boundary on the first attempt.

---

### Break down the requirements

#### Step 1: Filter to the three providers

`WHERE provider IN ('AWS', 'GCP', 'Azure')` selects rows matching any of the three major cloud providers.

#### Step 2: Select the cost amount

Return the `amount` column. The prompt asks specifically for cost amounts associated with those providers.

---

### The solution

**IN filter on provider column**

```sql
SELECT amount
FROM cloud_costs
WHERE provider IN ('AWS', 'GCP', 'Azure')
```

> **Cost Analysis**
>
> The query scans `cloud_costs` (8,000,000 rows). A covering index on the filter and group columns would reduce I/O. At this scale, the full scan is acceptable but becomes costly if the table grows 10x.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If cloud_costs.cost_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in cost_id.)_
- How would you verify that your aggregation on cloud_costs.cost_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in cloud_costs.cost_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like cost_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/multi_provider_cost_lookup)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.