# Average GPU Node CPU Usage

> GPU nodes burning CPU. How much?

Canonical URL: <https://datadriven.io/problems/average_gpu_node_cpu_usage>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The ML training cluster has been throttling jobs and the infrastructure team suspects the GPU nodes are saturated. What is the average CPU percentage across all infrastructure nodes classified as 'gpu'?

## Worked solution and explanation

### Why this problem exists in real interviews

Looks like a one-liner: `AVG(cpu_pct) WHERE node_type = 'gpu'` on `infra_nodes`. The interviewer is scoring what you say before you type. Grain of the table, NULL behavior of AVG, and case sensitivity of the filter literal all live in this five line query.

---

### Break down the requirements

#### Step 1: Pin the grain

Ask: is `infra_nodes` one row per node, or one snapshot per node over time? If snapshots, your average mixes nodes and time. Three thousand rows hints at one row per node, confirm.

#### Step 2: Filter on node_type

`WHERE node_type = 'gpu'` is the cohort. Wrap with `LOWER(node_type) = 'gpu'` if the column is free text, since 'GPU' and 'Gpu' would silently drop out of the average.

#### Step 3: AVG handles NULL on its own

`AVG(cpu_pct)` skips NULL rows in both numerator and denominator. No `COALESCE`, no `WHERE cpu_pct IS NOT NULL`. Flag the assumption out loud: missing telemetry rows are excluded, not zeroed.

#### Step 4: Alias the scalar

Return `AVG(cpu_pct) AS avg_cpu_pct`. Single scalar, one row out. No `GROUP BY` needed when every selected expression is an aggregate.

---

### The solution

**GPU NODE AVERAGE CPU**

```sql
SELECT AVG(cpu_pct) AS avg_cpu_pct
FROM infra_nodes
WHERE node_type = 'gpu';
```

> **Cost Analysis**
>
> Three thousand rows is a single sequential scan, sub-millisecond. No index helps at this size; the planner will skip one anyway. If `infra_nodes` grew to tens of millions of snapshots, a partial index on `(node_type) WHERE node_type = 'gpu'` would be the move.

> **Interviewers Watch For**
>
> Before you write, say: 'Is `infra_nodes` current state or a snapshot history? And are NULL `cpu_pct` rows missing telemetry, or genuinely zero?' Those two sentences are the whole interview on a trivial query.

> **Common Pitfall**
>
> Wrapping `AVG` with `COALESCE(cpu_pct, 0)` to handle NULLs. That treats unreachable nodes as fully idle and drags the average down. AVG already excludes NULLs from the count. Adding `COALESCE` changes the answer.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would the query change if `infra_nodes` stored hourly snapshots instead of current state? _(Probes grain awareness and whether you per-node average first to avoid weighting noisy nodes.)_
- Return average CPU for every `node_type`, sorted highest first. _(Forces the jump from scalar aggregate to `GROUP BY node_type` with `ORDER BY avg_cpu_pct DESC`.)_
- Only count GPU nodes that are `status = 'healthy'`. What changes? _(Tests whether you add the predicate to `WHERE` rather than wrapping `AVG` in a `CASE` expression.)_
- How would you flag GPU nodes whose CPU is more than two standard deviations above the GPU mean? _(Opens window functions: `AVG` and `STDDEV` `OVER (PARTITION BY node_type)`.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/average_gpu_node_cpu_usage)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.