# The Stable and the Restless

> Some pods never restart. That could mean anything.

Canonical URL: <https://datadriven.io/problems/pod_distribution_by_restart_count>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The reliability team is investigating low-restart pods. Show a count of pods for each restart count value, but only where the restart count is below 10, ranked from highest restart count down.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem targets grouped COUNT aggregation across the `k8s_pods` table. You need to work with the `restarts` column to satisfy the requirements.

---

### Break down the requirements

#### Step 1: Filter to low restart counts

`WHERE restarts < 10` restricts to the range of interest.

#### Step 2: Group by restart count

`GROUP BY restarts` produces one row per distinct restart value.

#### Step 3: Count pods per restart count

`COUNT(*)` tallies how many pods share each restart count.

#### Step 4: Order descending

`ORDER BY restarts DESC` lists from highest restart count down.

---

### The solution

**Filter to low restart counts to find pod distribution by restart count**

```sql
SELECT restarts, COUNT(*) AS pod_count
FROM k8s_pods
WHERE restarts < 10
GROUP BY restarts
ORDER BY restarts DESC
```

> **Cost Analysis**
>
> The query scans `k8s_pods` (1,500,000 rows). A covering index on the filter and group columns would reduce I/O. At this scale, the full scan is acceptable but becomes costly if the table grows 10x.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- The `mem_used` column in `k8s_pods` has a 4% null rate. How does your query handle rows where `mem_used` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `k8s_pods.mem_used` and understands how aggregates skip NULL values.)_
- `k8s_pods.pod_name` has roughly 139,280 distinct values. What index strategy would you use to avoid a full scan on `k8s_pods`? _(Tests indexing knowledge specific to the high-cardinality `pod_name` column in `k8s_pods`.)_
- `k8s_pods.status` only has 5 distinct values. If a new category were added, would your query automatically include it? _(Tests whether the query hard-codes values or dynamically adapts to `status` changes.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/pod_distribution_by_restart_count)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.