# Top Services Per Provider

> Within each cloud, two services rise above the rest.

Canonical URL: <https://datadriven.io/problems/top_services_per_provider>

Domain: SQL · Difficulty: medium · Seniority: L6

## Problem

The FinOps team needs to see the top 2 highest-spending services within each cloud provider. For each result row, include the provider, service name, and total spend. If two services tie on spend within a provider, include both.

## Worked solution and explanation

### Why this problem exists in real interviews

This is a classic partitioned top-N problem. Interviewers test whether you can combine `GROUP BY` with a partitioned window function and know which ranking function handles ties correctly.

> **Trick to Solving**
>
> "Top 2 within each provider" plus "include both if tied" signals a partitioned `DENSE_RANK`. The partition column is `provider`, and the ordering column is the aggregated spend.
> 
> 1. Aggregate to one row per (provider, svc_name)
> 2. Apply `DENSE_RANK() OVER (PARTITION BY provider ORDER BY total_spend DESC)`
> 3. Filter to rank `<= 2` in an outer query

---

### Break down the requirements

#### Step 1: Aggregate spend per provider and service

`GROUP BY provider, svc_name` with `SUM(amount)` collapses 15M rows to roughly 1,050 rows (3 providers x 350 services).

#### Step 2: Rank within each provider

`DENSE_RANK() OVER (PARTITION BY provider ORDER BY total_spend DESC)` ranks services independently within each provider, preserving ties.

#### Step 3: Filter to top 2 ranks

Wrap in a subquery, filter `WHERE rnk <= 2`, and order by `provider, total_spend DESC` for clean output.

---

### The solution

**Partitioned top-N with tie inclusion**

```sql
SELECT provider, svc_name, total_spend
FROM (
    SELECT
        provider,
        svc_name,
        SUM(amount) AS total_spend,
        DENSE_RANK() OVER (
            PARTITION BY provider
            ORDER BY SUM(amount) DESC
        ) AS rnk
    FROM cloud_costs
    GROUP BY provider, svc_name
) ranked
WHERE rnk <= 2
ORDER BY provider, total_spend DESC
```

> **Cost Analysis**
>
> The aggregation reduces 15M rows to ~1,050. The window function sorts 350 rows per partition (3 partitions). The dominant cost is the full table scan for the initial `GROUP BY`.

> **Interviewers Watch For**
>
> L6 candidates should mention that `DENSE_RANK` is specifically chosen over `ROW_NUMBER` for tie inclusion, and articulate why the aggregation must happen before the ranking step.

> **Common Pitfall**
>
> Using `RANK` instead of `DENSE_RANK` would produce gaps (1, 1, 3 instead of 1, 1, 2), meaning the filter `<= 2` could miss the third-place service entirely.

---

## Common follow-up questions

- What if you needed the top 2 per provider per region? _(Tests compound partitioning: `PARTITION BY provider, region`.)_
- How would you break ties deterministically? _(Add a secondary sort key like `svc_name ASC` to the window ORDER BY.)_
- What if the table had 1,000 providers instead of 3? _(The number of partitions grows, and the window function's memory footprint increases.)_
- Could you solve this without window functions? _(A correlated subquery approach is valid but typically less readable and harder to optimize.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_services_per_provider)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.