# Lowest CPU Pods per Namespace

> The five lightest pods per namespace.

Canonical URL: <https://datadriven.io/problems/lowest_cpu_pods_per_namespace>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

For each Kubernetes namespace, surface the 5 pods with the lowest CPU usage. If pods are tied, they should share the same rank without creating gaps. Show each pod's name, namespace, CPU, and memory usage.

## Worked solution and explanation

### Why this problem exists in real interviews

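This is a per-group top-N problem with ties: rank the rows of `k8s_pods` within each `nspace` by `cpu_used` and keep the five lowest ranks. It appears in senior-level rounds to probe whether you pick the window function that matches the stated tie semantics, and whether you confirm the table's grain (the solution assumes one row per pod) before writing any window or `GROUP BY` clause.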

> **Trick to Solving**
>
> Look for language about ties, such as "share the same rank" or "include all at position N." Ties rule out `ROW_NUMBER` and `LIMIT`, and "without creating gaps" picks `DENSE_RANK` over `RANK`.
>
> 1. Identify tie-inclusion language in the prompt
> 2. Use `DENSE_RANK()` instead of `ROW_NUMBER()` or `LIMIT`
> 3. Check the grain before ranking; aggregate first only if the table holds more than one row per pod

---

### Break down the requirements

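#### Step 1: Decide where the filter goes

The prompt names no row-level condition, so every row of `k8s_pods` feeds the ranking and there is no `WHERE` clause before it. The only filter is `rnk <= 5`, and it has to sit in an outer query because window functions are evaluated after `WHERE` at the same query level.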
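
One point about filter placement is worth a sketch. A window function cannot appear in the `WHERE` clause of the query that computes it, so a filter on the rank has to live one level up. The version below uses a CTE instead of a derived table purely as an illustration; the column names are the ones used in the solution.

```sql
-- Invalid: window functions are not allowed in WHERE.
-- SELECT pod_name FROM k8s_pods
-- WHERE DENSE_RANK() OVER (PARTITION BY nspace ORDER BY cpu_used) <= 5;

-- Valid: compute the rank first, then filter on it one level up.
WITH ranked AS (
    SELECT pod_name, nspace, cpu_used, mem_used,
        DENSE_RANK() OVER (PARTITION BY nspace ORDER BY cpu_used ASC) AS rnk
    FROM k8s_pods
)
SELECT pod_name, nspace, cpu_used, mem_used
FROM ranked
WHERE rnk <= 5;
```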

#### Step 2: Rank with DENSE_RANK for tie inclusion

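`DENSE_RANK()` assigns the same rank to tied values and never skips numbers. `PARTITION BY nspace` restarts the ranking in each namespace, and `ORDER BY cpu_used ASC` gives rank 1 to the lowest usage. Because tied pods share a rank, `rnk <= 5` keeps the five lowest distinct CPU values per namespace, which can be more than five rows.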
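
To see how the three ranking functions differ on a tie, here is a small sketch over invented rows (the pod names and CPU values are made up for illustration, and the syntax assumes PostgreSQL):

```sql
-- Hypothetical sample rows; 'api-1' and 'api-2' tie on cpu_used.
WITH sample (pod_name, nspace, cpu_used) AS (
    VALUES
        ('api-1',   'prod', 0.10),
        ('api-2',   'prod', 0.10),
        ('cache-1', 'prod', 0.25),
        ('batch-1', 'prod', 0.40)
)
SELECT pod_name, cpu_used,
    ROW_NUMBER() OVER w AS row_num,    -- unique numbers; order within the tie is arbitrary
    RANK()       OVER w AS rnk,        -- tied rows share rank 1, next rank is 3
    DENSE_RANK() OVER w AS dense_rnk   -- tied rows share rank 1, next rank is 2
FROM sample
WINDOW w AS (PARTITION BY nspace ORDER BY cpu_used ASC)
ORDER BY cpu_used, pod_name;
```

Only the `dense_rnk` column matches the prompt's "same rank without creating gaps" wording.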

#### Step 3: Order the final output

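The solution sorts by `nspace`, then `rnk`, so each namespace's pods appear together from lowest to highest CPU. Pods that share a rank come back in arbitrary order; add a secondary sort column such as `pod_name` if the output must be deterministic.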

---

### The solution

**DENSE_RANK per namespace for tie-inclusive top-N**

```sql
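-- Outer query: keep ranks 1 through 5 in each namespace.
SELECT pod_name, nspace, cpu_used, mem_used
FROM (
    SELECT pod_name, nspace, cpu_used, mem_used,
        -- Tied cpu_used values share a rank, with no gap after the tie.
        DENSE_RANK() OVER (PARTITION BY nspace ORDER BY cpu_used ASC) AS rnk
    FROM k8s_pods
) ranked
WHERE rnk <= 5
ORDER BY nspace, rnk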
```
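
To try the query locally, a minimal fixture might look like the following. The column types and every row are assumptions made for illustration; the real `k8s_pods` table may have more columns and different types.

```sql
-- Hypothetical schema and rows for local testing only.
-- In PostgreSQL a temp table shadows a permanent table of the same name
-- for the current session, so run this in a scratch database.
CREATE TEMP TABLE k8s_pods (
    pod_name text,
    nspace   text,
    cpu_used numeric,
    mem_used numeric
);

INSERT INTO k8s_pods (pod_name, nspace, cpu_used, mem_used) VALUES
    ('web-1', 'prod',  0.10, 128),
    ('web-2', 'prod',  0.10, 256),  -- ties with web-1 at rank 1
    ('web-3', 'prod',  0.20, 256),
    ('web-4', 'prod',  0.30, 512),
    ('web-5', 'prod',  0.40, 512),
    ('web-6', 'prod',  0.50, 512),
    ('web-7', 'prod',  0.60, 512),
    ('job-1', 'batch', 0.05,  64);
```

Against these rows the solution query above should return seven rows: `job-1` for `batch`, and `web-1` through `web-6` for `prod`. The tie at 0.10 shares rank 1, so six `prod` pods fit within ranks 1 to 5, while `web-7` lands at rank 6 and is dropped.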

> **Cost Analysis**
>
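> The query scans 3M rows from `k8s_pods`. The window function needs the rows ordered by `nspace`, then `cpu_used`, which costs O(n log n) unless an index already supplies that order. There is no aggregation step to shrink the sort input, and the rank filter can only run after ranking. The subquery here is a derived table, not a CTE; if you rewrite it as a CTE, PostgreSQL 12 and later inline a non-recursive, side-effect-free CTE that is referenced once, while earlier PostgreSQL versions always materialized CTEs as optimization fences. For production workloads, consider materializing the ranked result if it is read repeatedly.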

> **Interviewers Watch For**
>
> Strong candidates explain their choice of window function (`ROW_NUMBER` vs `RANK` vs `DENSE_RANK`) and why it matches the tie semantics. Walking through comparison logic step by step, rather than writing it in one pass, demonstrates structured thinking.

> **Common Pitfall**
>
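> `ROWS` and `RANGE` window frames produce different results when ties exist, but only for frame-dependent functions such as a running `SUM` or `LAST_VALUE`. Ranking functions such as `DENSE_RANK` operate on the whole partition regardless of the frame, so the choice does not change this query's result. For running aggregates, default to `ROWS` unless you specifically need tie grouping.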

---

## Common follow-up questions

- What happens to your result if k8s_pods.pod_name contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on pod_name.)_
- If two rows in k8s_pods have identical values in the ORDER BY columns, how does your ranking handle the tie? _(Tests understanding of RANK vs DENSE_RANK vs ROW_NUMBER tie-breaking behavior.)_
- With millions of distinct values in k8s_pods.pod_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like pod_id.)_
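
For the indexing follow-up, one possible starting point is sketched below, assuming PostgreSQL. The solution query never references `pod_id`, so an index on that column alone would not help it; a composite index whose column order matches the window's `PARTITION BY` and `ORDER BY` is the more plausible candidate. The index name is hypothetical, and whether the planner uses the index instead of sorting depends on table statistics.

```sql
-- Assumed index name; columns follow PARTITION BY (nspace), then ORDER BY (cpu_used).
CREATE INDEX idx_k8s_pods_nspace_cpu ON k8s_pods (nspace, cpu_used);

-- Inspect the plan to see whether the sort step is replaced by an index scan.
EXPLAIN
SELECT pod_name, nspace, cpu_used, mem_used,
    DENSE_RANK() OVER (PARTITION BY nspace ORDER BY cpu_used ASC) AS rnk
FROM k8s_pods;
```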

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/lowest_cpu_pods_per_namespace)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.