# Memory-Heavy Pods

> Memory-hungry workloads.

Canonical URL: <https://datadriven.io/problems/memory_heavy_pods>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The SRE team is profiling mid-range memory consumers across the Kubernetes cluster. Pull all unique pod names where memory usage falls between 100 and 500.

## Worked solution and explanation

### Why this problem exists in real interviews

The interviewer wants to see you combine a range filter with deduplication on `k8s_pods`: filter rows on `mem_used`, then return each `pod_name` once. It works as a fundamentals check because small logic errors, such as dropping the range endpoints or omitting `DISTINCT`, produce results that look correct at a glance.

---

### Break down the requirements

#### Step 1: Read from `k8s_pods`

The query reads from `k8s_pods`, which has 7 columns. Only two of them matter here: `pod_name` for the output and `mem_used` for the filter.

#### Step 2: Filter to the target rows

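Use `BETWEEN` in the `WHERE` clause to select the target range. `BETWEEN` is inclusive on both ends, so `mem_used BETWEEN 100 AND 500` means `mem_used >= 100 AND mem_used <= 500`. The form is readable, and the query planner can use an index on `mem_used` for it if one exists.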
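
The endpoint behavior is easy to check on a handful of rows. The sketch below uses a hypothetical demo table (`k8s_pods_demo`, with assumed column types and made-up values) rather than the real `k8s_pods`, so it can be run on its own.

```sql
-- Hypothetical sample data: table name, column types, and values are assumptions.
CREATE TEMP TABLE k8s_pods_demo (
    pod_id   INTEGER,
    pod_name TEXT,
    mem_used INTEGER
);

INSERT INTO k8s_pods_demo (pod_id, pod_name, mem_used) VALUES
    (1, 'api-0',    99),   -- just below the range
    (2, 'api-1',   100),   -- lower endpoint, included by BETWEEN
    (3, 'cache-0', 300),   -- inside the range
    (4, 'db-0',    500),   -- upper endpoint, included by BETWEEN
    (5, 'db-1',    501);   -- just above the range

-- Inclusive range: matches api-1, cache-0, and db-0.
SELECT pod_name
FROM k8s_pods_demo
WHERE mem_used BETWEEN 100 AND 500;

-- Equivalent explicit form: returns the same three rows.
SELECT pod_name
FROM k8s_pods_demo
WHERE mem_used >= 100 AND mem_used <= 500;
```

If the requirement were an exclusive range, the explicit form with `>` and `<` would be the one to use, since `BETWEEN` cannot express it.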

#### Step 3: Return the result set

Select `pod_name` with `DISTINCT` so each pod name appears once, even when several rows for the same pod fall inside the range. No aliasing or formatting is needed.
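
To see why the problem asks for unique names, compare the result with and without `DISTINCT`. This is a minimal sketch on a hypothetical demo table; whether the real `k8s_pods` holds several rows per pod name is an assumption.

```sql
-- Hypothetical sample data: table name, column types, and values are assumptions.
CREATE TEMP TABLE k8s_pods_distinct_demo (
    pod_id   INTEGER,
    pod_name TEXT,
    mem_used INTEGER
);

INSERT INTO k8s_pods_distinct_demo (pod_id, pod_name, mem_used) VALUES
    (1, 'api-0',   150),
    (2, 'api-0',   220),   -- same pod name, second in-range row
    (3, 'cache-0', 400),
    (4, 'db-0',    900);   -- out of range

-- Without DISTINCT: three rows, with api-0 listed twice.
SELECT pod_name
FROM k8s_pods_distinct_demo
WHERE mem_used BETWEEN 100 AND 500;

-- With DISTINCT: two rows, api-0 and cache-0.
SELECT DISTINCT pod_name
FROM k8s_pods_distinct_demo
WHERE mem_used BETWEEN 100 AND 500;
```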

---

### The solution

**BETWEEN range filter with DISTINCT**

```sql
SELECT DISTINCT pod_name
FROM k8s_pods
WHERE mem_used BETWEEN 100 AND 500
```

> **Cost Analysis**
>
> The query scans 2M rows from `k8s_pods`. There is no aggregation here: the `WHERE` filter discards out-of-range rows first, so `DISTINCT` only has to deduplicate the rows that match. Filtering before deduplicating is the key performance lever.
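
One way to check how the query is executed is to ask the planner. The sketch below assumes Postgres, which is the engine named in the site's sandbox description, and uses a small hypothetical demo table so it runs on its own. Plan shape, row estimates, and timings depend on table size, statistics, and indexes, so no expected output is shown.

```sql
-- Hypothetical sample data: table name, column types, and values are assumptions.
CREATE TEMP TABLE k8s_pods_plan_demo (
    pod_id   INTEGER,
    pod_name TEXT,
    mem_used INTEGER
);

INSERT INTO k8s_pods_plan_demo (pod_id, pod_name, mem_used) VALUES
    (1, 'api-0',   150),
    (2, 'cache-0', 400),
    (3, 'db-0',    900);

-- Show the estimated plan without running the query.
EXPLAIN
SELECT DISTINCT pod_name
FROM k8s_pods_plan_demo
WHERE mem_used BETWEEN 100 AND 500;

-- Run the query and report actual row counts and timing per plan node.
EXPLAIN ANALYZE
SELECT DISTINCT pod_name
FROM k8s_pods_plan_demo
WHERE mem_used BETWEEN 100 AND 500;
```

On the real table, the same two statements can be pointed at `k8s_pods` to see whether the filter is applied during a sequential scan or through an index.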

> **Interviewers Watch For**
>
> Naming the output grain ("one row per pod name") before writing the query shows you think about data shape, not just syntax. Explaining why `DISTINCT` is enough here, and when `ROW_NUMBER` would be needed instead (to keep one full row per pod rather than one value), shows you understand the difference between collapsing and selecting.
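
The difference between collapsing and selecting can be sketched side by side. The demo table and the tie-break rule (keep the row with the highest `mem_used`) are assumptions chosen for illustration, not part of the original problem.

```sql
-- Hypothetical sample data: table name, column types, and values are assumptions.
CREATE TEMP TABLE k8s_pods_dedup_demo (
    pod_id   INTEGER,
    pod_name TEXT,
    mem_used INTEGER
);

INSERT INTO k8s_pods_dedup_demo (pod_id, pod_name, mem_used) VALUES
    (1, 'api-0',   150),
    (2, 'api-0',   220),
    (3, 'cache-0', 400);

-- Collapsing: DISTINCT returns each pod_name value once (api-0, cache-0).
SELECT DISTINCT pod_name
FROM k8s_pods_dedup_demo
WHERE mem_used BETWEEN 100 AND 500;

-- Selecting: ROW_NUMBER keeps one full row per pod_name,
-- here the in-range row with the highest mem_used.
-- Returns (2, api-0, 220) and (3, cache-0, 400).
SELECT pod_id, pod_name, mem_used
FROM (
    SELECT pod_id, pod_name, mem_used,
           ROW_NUMBER() OVER (
               PARTITION BY pod_name
               ORDER BY mem_used DESC
           ) AS rn
    FROM k8s_pods_dedup_demo
    WHERE mem_used BETWEEN 100 AND 500
) AS ranked
WHERE rn = 1;
```

For this problem only the name is requested, so the `DISTINCT` form is the simpler fit.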

> **Common Pitfall**
>
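> `BETWEEN` is inclusive on both ends, so rows with `mem_used` of exactly 100 or 500 are returned; rewriting the filter as `> 100 AND < 500` silently drops them. The filter also belongs in `WHERE`, not `HAVING`, because it applies to individual rows: `WHERE` filters rows before aggregation; `HAVING` filters groups after.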
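
The `WHERE` versus `HAVING` distinction changes the question being answered, not just the syntax. A minimal sketch on a hypothetical demo table, where the second query uses `AVG` purely as an illustrative aggregate:

```sql
-- Hypothetical sample data: table name, column types, and values are assumptions.
CREATE TEMP TABLE k8s_pods_having_demo (
    pod_id   INTEGER,
    pod_name TEXT,
    mem_used INTEGER
);

INSERT INTO k8s_pods_having_demo (pod_id, pod_name, mem_used) VALUES
    (1, 'api-0',   150),
    (2, 'api-0',   900),
    (3, 'cache-0', 400);

-- WHERE filters individual rows: api-0 qualifies through its 150 row.
-- Returns api-0 and cache-0.
SELECT DISTINCT pod_name
FROM k8s_pods_having_demo
WHERE mem_used BETWEEN 100 AND 500;

-- HAVING filters groups after aggregation: api-0 averages 525 and is excluded.
-- Returns cache-0 only.
SELECT pod_name
FROM k8s_pods_having_demo
GROUP BY pod_name
HAVING AVG(mem_used) BETWEEN 100 AND 500;
```

The problem statement asks about individual memory readings, so the first form matches it.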

---

## Common follow-up questions

- What happens to your result if `k8s_pods.pod_name` contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in `DISTINCT` and in comparisons on `pod_name`.)_
- How would you verify that duplicate rows in `k8s_pods`, for example repeated `pod_id` values, are not distorting your result? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in k8s_pods.pod_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like pod_id.)_
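
The follow-ups above can be explored with a short sketch. Everything here is an assumption made for illustration: the demo table, its values, and the index name. Whether a planner actually uses the index depends on how selective the range is.

```sql
-- Hypothetical sample data: table name, column types, and values are assumptions.
CREATE TEMP TABLE k8s_pods_null_demo (
    pod_id   INTEGER,
    pod_name TEXT,
    mem_used INTEGER
);

INSERT INTO k8s_pods_null_demo (pod_id, pod_name, mem_used) VALUES
    (1, NULL,    200),
    (2, NULL,    300),
    (3, 'api-0', 250),
    (4, 'db-0',  NULL);  -- NULL mem_used: BETWEEN yields NULL, so the row is filtered out

-- DISTINCT treats NULLs as one value: the result is a single NULL row plus api-0.
SELECT DISTINCT pod_name
FROM k8s_pods_null_demo
WHERE mem_used BETWEEN 100 AND 500;

-- Excluding unnamed pods explicitly: returns api-0 only.
SELECT DISTINCT pod_name
FROM k8s_pods_null_demo
WHERE mem_used BETWEEN 100 AND 500
  AND pod_name IS NOT NULL;

-- Duplicate check: pod_id values that appear more than once (none in this sample).
SELECT pod_id, COUNT(*) AS row_count
FROM k8s_pods_null_demo
GROUP BY pod_id
HAVING COUNT(*) > 1;

-- Index sketch: filter column first, output column second,
-- so the range filter and the selected column can both come from the index.
CREATE INDEX idx_k8s_pods_null_demo_mem_name
    ON k8s_pods_null_demo (mem_used, pod_name);
```

Note that the query filters on `mem_used`, so an index on `pod_id` alone would not help this particular query.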

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/memory_heavy_pods)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.