# Heavy Namespaces

> Kubernetes has favorites. Some namespaces carry more weight.

Canonical URL: <https://datadriven.io/problems/heavy_namespaces>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The SRE team suspects a few Kubernetes namespaces are absorbing most of the cluster's pod capacity. Show namespaces that have more than 3 pods along with their pod count.

## Worked solution and explanation

### What this is really asking

`nspace` is the grouping key and `COUNT(*)` over 2M `k8s_pods` rows is the metric. The threshold `> 3` is strict, so a namespace with exactly 3 pods does not appear in the result.

---

### Break down the requirements

#### Step 1: Group by namespace

Aggregate `k8s_pods` by `nspace`. Every row is a pod, so `COUNT(*)` equals pod count per namespace.

#### Step 2: Filter and project

`HAVING COUNT(*) > 3` keeps heavy namespaces; `WHERE` cannot see the aggregate. Alias the count as `pod_count` so the result is consumable.

---

### The solution

**HEAVY NAMESPACES BY POD COUNT**

```sql
SELECT nspace,
       COUNT(*) AS pod_count
FROM k8s_pods
GROUP BY nspace
HAVING COUNT(*) > 3
```

> **Cost Analysis**
>
> One full scan of 2M rows with a hash aggregate. An index on `nspace` lets the planner stream a sorted aggregate and cut memory, but the scan itself is unavoidable without a precomputed counts table.

> **Interviewers Watch For**
>
> Placement of the count predicate in `HAVING` not `WHERE`, and `COUNT(*)` over `COUNT(pod_id)` when the column is non-null.

> **Common Pitfall**
>
> `HAVING COUNT(*) >= 3` returns namespaces with exactly 3 pods, which the prompt excludes. `more than 3` is strict inequality.

> **The False Start**
>
> First instinct is `WHERE COUNT(*) > 3`. Postgres throws `aggregate functions are not allowed in WHERE`. Pivot to `HAVING`, which runs after `GROUP BY` and is the only place aggregates can be filtered.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you exclude pods in a terminal status from the count? _(Add `WHERE status NOT IN ('Failed','Succeeded')` before the group so completed workloads do not inflate capacity numbers.)_
- How would you rank namespaces by pod share of the cluster? _(Wrap the aggregate in a CTE and divide each `pod_count` by `SUM(pod_count) OVER ()`.)_
- What changes if `nspace` can be NULL? _(`GROUP BY` collapses all NULLs into one bucket; decide whether it is a real namespace or a data quality issue.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/heavy_namespaces)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.