# Lowest Cost Network-Heavy Team

> Networking costs versus compute. Which teams?

Canonical URL: <https://datadriven.io/problems/lowest_cost_network_heavy_team>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Among teams that spend more on networking than ML, find the single network allocation entry with the lowest amount. Show the team name and the amount.

## Worked solution and explanation

### Why this problem exists in real interviews

FinOps reviews routinely surface 'teams whose network spend exceeds their ML spend, then drill into the smallest line item to find quick wins.' Interviewers use this two-stage problem to test whether you can compute conditional sums in one pass to qualify teams, then re-query the base table filtered to those teams to pick the single cheapest row with a deterministic tiebreak.

---

### Break down the requirements

#### Step 1: Sum network and ML spend per team in one pass

`SUM(CASE WHEN category = 'network' THEN amount ELSE 0 END) AS net_total` and the same for `'ml'`, grouped by `team_name`. This conditional aggregation collapses 15M rows into 50 team-level rows holding both totals side by side, no self-join needed.

#### Step 2: Keep teams where network strictly exceeds ML

A second CTE filters `WHERE net_total > ml_total`. Strictly greater than (not `>=`) matches the prompt; teams where the two totals happen to match are excluded. Save the qualifying `team_name` values for the next stage.

#### Step 3: Pick the smallest network row, tiebreak by alloc_id

Re-query `cost_allocs` filtered to `category = 'network'` and `team_name IN (qualified)`. `ORDER BY amount ASC, alloc_id ASC` puts the cheapest row first and uses `alloc_id` as the deterministic tiebreak the prompt requires. `LIMIT 1` returns one row with the two required columns: `team_name` and `amount`.

---

### The solution

**Conditional aggregate, qualifier filter, smallest network row**

```sql
WITH totals AS (
  SELECT team_name,
    SUM(CASE WHEN category = 'network' THEN amount ELSE 0 END) AS net_total,
    SUM(CASE WHEN category = 'ml' THEN amount ELSE 0 END) AS ml_total
  FROM cost_allocs
  GROUP BY team_name
),
qualified_teams AS (
  SELECT team_name FROM totals WHERE net_total > ml_total
)
SELECT team_name, amount
FROM cost_allocs
WHERE category = 'network' AND team_name IN (SELECT team_name FROM qualified_teams)
ORDER BY amount ASC, alloc_id ASC
LIMIT 1
```

> **Cost Analysis**
>
> `cost_allocs` is 15M rows partitioned by `period` across 36 monthly partitions. The `totals` CTE does one full scan and hash-aggregates into 50 team rows. The second scan is also full but emits only the network slice (10 distinct categories, so roughly 1.5M rows survive the filter), then a top-1 sort over the IN-list members. Two scans of the same 15M table is the cost; a covering index on `(category, team_name, amount, alloc_id)` would let the engine satisfy the second stage as an index-only seek.

> **Interviewers Watch For**
>
> Interviewers watch whether you use conditional aggregation (one pass) versus two separate aggregates and a join (two passes plus a join), whether you remember strict `>` rather than `>=` for the qualification, and whether you include the `alloc_id ASC` tiebreak so two network rows tied at the same amount resolve deterministically. The literal `'network'` (not `'networking'`) is also worth a sanity check against the schema.

> **Common Pitfall**
>
> Hard-coding the wrong category literal is the classic miss here: the prompt says `'network'` and `'ml'`, not `'networking'` or `'machine_learning'`. Either of those typos returns zero rows from the conditional sums, every team's `net_total` is 0, and the qualifier CTE is empty. The other trap is forgetting `alloc_id ASC` in the final `ORDER BY`, which makes the tied-amount case non-deterministic.

---

## Common follow-up questions

- How would you return the top three smallest network rows for qualified teams, including ties on amount? _(Tests whether the candidate switches `LIMIT 1` to a windowed `RANK() OVER (ORDER BY amount ASC)` and filters `rank <= 3` so ties at the boundary stay in the result.)_
- How would you scope the qualification to a single `period` (one month) rather than across all history? _(Tests partition-key awareness. Add `WHERE period = '2026-01'` to both the totals CTE and the final SELECT so the qualifier and the picked row share the same time window. Also enables partition pruning on the partition key.)_
- If the categories list grew to include `'storage'` and the prompt asked 'network exceeds the sum of ml plus storage,' how would your CTE change? _(Tests whether the candidate adds a third conditional sum and updates the `WHERE net_total > ml_total + storage_total` predicate without rewriting the join structure.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/lowest_cost_network_heavy_team)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.