# Pod CPU to Memory Ratio

> CPU versus memory. Resource efficiency.

Canonical URL: <https://datadriven.io/problems/pod_cpu_to_memory_ratio>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

For pending pods in each namespace, what is the average ratio of CPU used to memory used? Rank from highest ratio to lowest.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use this analytics scenario to test grouped AVG aggregation against the `k8s_pods` table. The focus is on how you handle columns like `status`, `nspace`, and `cpu_used` when building the result.

---

### Break down the requirements

#### Step 1: Filter to pending pods

`WHERE status = 'Pending'` restricts to the target pod status.

#### Step 2: Group by namespace

`GROUP BY nspace` aggregates all pending pods within each namespace.

#### Step 3: Compute average CPU/memory ratio

`AVG(cpu_used / mem_used)` calculates the average ratio. Guard against zero `mem_used` if needed.

#### Step 4: Order descending

`ORDER BY avg_ratio DESC` surfaces the most CPU-skewed namespaces first.

---

### The solution

**Filter to pending pods to find pod cpu to memory ratio**

```sql
SELECT
    nspace,
    AVG(cpu_used / mem_used) AS avg_cpu_mem_ratio
FROM k8s_pods
WHERE status = 'Pending'
  AND mem_used > 0
GROUP BY nspace
ORDER BY avg_cpu_mem_ratio DESC
```

> **Cost Analysis**
>
> The query scans `k8s_pods` (2,000,000 rows). A covering index on the filter and group columns would reduce I/O. At this scale, the full scan is acceptable but becomes costly if the table grows 10x.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What result would you get if every value in `k8s_pods.mem_used` were NULL? Would your query return an empty set or something unexpected? _(Tests extreme NULL scenarios and whether the candidate guards against edge cases in `mem_used`.)_
- `k8s_pods.pod_id` has roughly 2,000,000 distinct values. What index strategy would you use to avoid a full scan on `k8s_pods`? _(Tests indexing knowledge specific to the high-cardinality `pod_id` column in `k8s_pods`.)_
- `k8s_pods.status` only has 5 distinct values. If a new category were added, would your query automatically include it? _(Tests whether the query hard-codes values or dynamically adapts to `status` changes.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/pod_cpu_to_memory_ratio)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.