# Storage Node Lookup

> The storage nodes hold the critical data.

Canonical URL: <https://datadriven.io/problems/storage_node_lookup>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

Find all nodes with a storage-type designation, along with each node's hostname and CPU percentage.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests whether a candidate can demonstrate writing clean, correct queries under time pressure. This is a foundational check that interviewers use early in a round to verify baseline proficiency.

---

### Break down the requirements

#### Step 1: Apply the WHERE filter

Filter rows before any aggregation. This ensures only qualifying data enters the computation, keeping the result correct and the scan minimal.

#### Step 2: Select the target columns

The SELECT clause picks exactly the columns the prompt asks for. Returning extra columns or missing a required alias would fail the grading check.

---

### The solution

**Filter infra_nodes by node_type and project two columns**

```sql
SELECT hostname, cpu_pct
FROM infra_nodes
WHERE LOWER(node_type) = 'storage'
```

> **Cost Analysis**
>
> With ~8,000 rows, the query performs a single sequential scan. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether the query returns exactly the columns and ordering the prompt specifies; how quickly you identify the core operation and write clean, minimal code.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- If node_type values are inconsistent in casing ('Storage' vs 'storage'), how would you handle that in the WHERE clause? _(Tests awareness of case sensitivity and defensive filtering with LOWER() or ILIKE.)_
- Which index on infra_nodes would speed this query when the table grows to hundreds of millions of health checks? _(Tests whether candidate proposes a composite index on (node_type) covering (hostname, cpu_pct).)_
- If the prompt also asked for memory percentage, how would you confirm you are not accidentally filtering out nodes with NULL mem_pct? _(Tests awareness that WHERE conditions on one column do not implicitly drop NULLs in other projected columns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/storage_node_lookup)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.