# Nodes by Region and Type

> Broken down by region. Broken down by type.

Canonical URL: <https://datadriven.io/problems/nodes_by_region_and_type>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The capacity team is mapping fleet composition and needs node counts broken down by region and node type, alphabetical by region.

## Worked solution and explanation

### Why this problem exists in real interviews

Working against infra_nodes, this problem tests grouping on the hostname and region columns. Interviewers use it in mid-level screens because a subtle mis-grouping or filter placement changes the output without raising an error.

---

### Break down the requirements

#### Step 1: Group by region and node type

`GROUP BY region, node_type` creates one row per unique combination.

#### Step 2: Count nodes per group

`COUNT(*)` tallies the number of nodes in each region-type combination.

---

### The solution

**Multi-column GROUP BY with count**

```sql
SELECT region, node_type, COUNT(*) AS node_count
FROM infra_nodes
GROUP BY region, node_type
```

> **Cost Analysis**
>
> With `infra_nodes` (8,000 rows), this query scans a small dataset. No indexing is needed for this volume. At production scale, an index on the primary filter column would improve performance.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If infra_nodes.node_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in node_id.)_
- How would you verify that your aggregation on infra_nodes.node_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- What index would you add to infra_nodes to avoid a full table scan when filtering or sorting by node_id? _(Tests practical indexing decisions for numeric filter columns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/nodes_by_region_and_type)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.