# Service Budget per Head

> Budget per head. Pipeline by pipeline.

Canonical URL: <https://datadriven.io/problems/service_budget_per_head>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

For each service that appears in both the cost tracking and budget allocation systems, count how many unique teams are allocated to it and divide total allocated amount by that count, rounded to the nearest integer. Show highest budget per head first.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge asks you to apply self-join to the `cloud_costs` and `cost_allocs` tables, simulating a real analytics workflow. Pay attention to columns like `svc_name`, `amount`, and `team_name` as they drive the aggregation and output.

---

### Break down the requirements

#### Step 1: Join `cloud_costs` to `cost_allocs`

The join connects the two tables on their shared key. This brings the columns needed for filtering and aggregation into a single row set.

#### Step 2: Deduplicate the result with DISTINCT

`SELECT DISTINCT` removes duplicate rows from the output. This is necessary when joins or subqueries can produce repeated combinations.

---

### The solution

**Join `cloud_costs` to `cost_allocs` to find service budget per head**

```sql
SELECT cc.svc_name, ROUND(SUM(ca.amount) / COUNT(DISTINCT ca.team_name)) AS budget_per_head
FROM cloud_costs cc
INNER JOIN cost_allocs ca ON cc.svc_name = ca.svc_name
GROUP BY cc.svc_name
ORDER BY budget_per_head DESC
```

> **Cost Analysis**
>
> With ~27M rows, the join cost depends on the smaller table's cardinality. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you know when DISTINCT is needed and when it masks a logic error.

> **Common Pitfall**
>
> Using `COUNT(*)` instead of `COUNT(DISTINCT col)` counts duplicates. If the prompt says 'unique', you need DISTINCT inside the aggregate.

---

## Common follow-up questions

- What would happen to your result if `cloud_costs.bill_date` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `bill_date` and uses DISTINCT or deduplication where needed.)_
- If `cost_allocs` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `amount`? _(Tests ability to identify performance hotspots related to `cost_allocs.amount` at scale.)_
- If `cloud_costs` contained late-arriving rows that were inserted after your query ran, how would you design an incremental update instead of re-aggregating? _(Tests understanding of incremental aggregation patterns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/service_budget_per_head)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.