# Nodes in Key Regions

> Six regions. How many nodes in each?

Canonical URL: <https://datadriven.io/problems/nodes_in_key_regions>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The capacity planning team is auditing node distribution across six key regions: us-east-1, us-west-2, eu-west-1, eu-central-1, ap-southeast-1, and ap-northeast-1. What is the total node count across those regions?

## Worked solution and explanation

### Why this problem exists in real interviews

Against the infra_nodes table, counting on hostname values is the key operation. Interviewers favor this in mid-level screens because it exposes whether candidates handle ties, NULLs, and ordering edge cases correctly.

---

### Break down the requirements

#### Step 1: Filter to the six target regions

`WHERE region IN ('us-east-1', 'us-west-2', 'eu-west-1', 'eu-central-1', 'ap-southeast-1', 'ap-northeast-1')` restricts to the specified regions.

#### Step 2: Count total nodes

`SELECT COUNT(*) FROM infra_nodes WHERE ...` produces the single total count across all six regions.

---

### The solution

**IN filter with COUNT**

```sql
SELECT COUNT(*) AS total_nodes
FROM infra_nodes
WHERE region IN ('us-east-1', 'us-west-2', 'eu-west-1', 'eu-central-1', 'ap-southeast-1', 'ap-northeast-1')
```

> **Cost Analysis**
>
> With `infra_nodes` (10,000 rows), this query scans a small dataset. No indexing is needed for this volume. At production scale, an index on the primary filter column would improve performance.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If infra_nodes.node_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in node_id.)_
- How would you verify that your aggregation on infra_nodes.node_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- What index would you add to infra_nodes to avoid a full table scan when filtering or sorting by node_id? _(Tests practical indexing decisions for numeric filter columns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/nodes_in_key_regions)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.