# Team Cost Allocation Comparison

> Individual spend versus team average.

Canonical URL: <https://datadriven.io/problems/team_cost_allocation_comparison>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

The FinOps team wants each team member's cost allocation compared against their team lead's allocation and the team average. For each entry, show the member's amount, the highest amount in their team (as the lead's benchmark), and the team average, with the biggest spenders in each team first.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests whether a candidate can demonstrate structuring multi-step logic with CTEs, combining aggregation with window functions, and partitioning data correctly for per-group analysis. This is an advanced probe that separates senior candidates who can decompose complex logic from those who cannot.

> **Trick to Solving**
>
> Complex queries with multiple transformation stages are best solved by writing one CTE per stage.
> 
> 1. Identify the intermediate results needed
> 2. Write each as a named CTE
> 3. Chain them: each CTE reads from the previous one
> 4. The final SELECT assembles the output

---

### Break down the requirements

#### Step 1: Isolate the intermediate result in a CTE

The `managers` CTE computes the intermediate aggregation that the outer query builds on. This separation keeps each layer focused on a single task.

#### Step 2: Left join to preserve all base rows

A `LEFT JOIN` from `cost_allocs` ensures every row appears in the output even if there is no match in `managers`. Missing values become NULL.

#### Step 3: Filter out null values

Exclude rows where `amount` is NULL. This prevents nulls from polluting aggregations or creating phantom groups.

---

### The solution

**Window aggregates over team_name partition for member-vs-lead-vs-average comparison**

```sql
WITH managers AS (
    SELECT team_name, MAX(amount) AS mgr_amount
    FROM cost_allocs
    GROUP BY team_name
)
SELECT ca.team_name, ca.svc_name, ca.amount, m.mgr_amount AS manager_amount, AVG(ca.amount) OVER (PARTITION BY ca.team_name) AS dept_avg
FROM cost_allocs ca
LEFT
JOIN managers m ON ca.team_name = m.team_name
WHERE ca.amount IS NOT NULL
ORDER BY ca.team_name, ca.amount DESC
```

> **Cost Analysis**
>
> With ~25M rows, the GROUP BY reduces the working set before any downstream operations; the window function runs on the reduced set after filtering and grouping; the join cost depends on the smaller table's cardinality; CTEs materialize intermediate results, which can be beneficial or costly depending on the engine. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you decompose the problem into named, testable stages rather than nesting everything; whether you choose the correct join type to avoid silently dropping rows; how you handle NULL values and whether you account for them in filters and aggregations.

> **Common Pitfall**
>
> Forgetting to filter NULLs creates phantom groups or inflated counts. Always check `null_fraction` in the schema before assuming columns are clean.

---

## Common follow-up questions

- How do you identify which row belongs to the 'team lead' if there is no explicit role column in cost_allocs? _(Tests whether the candidate asks for clarification or assumes the highest amount equals the lead.)_
- If a team has only one entry, what values do the comparison columns show? _(Tests edge case: the member's amount equals both the max and the average, so all three columns are identical.)_
- How would you extend this to show whether each member's allocation is above or below the team median? _(Tests knowledge that median is not a built-in aggregate in most engines and requires PERCENTILE_CONT or a manual approach.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/team_cost_allocation_comparison)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.