# Regional Status

> The full regional breakdown.

Canonical URL: <https://datadriven.io/problems/regional_status>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The operations team is diagnosing fulfillment bottlenecks across regions. For each region and order status combination that has valid data on both fields, show the number of orders. Only surface combinations that account for at least three orders, and rank the results from the most common combination to the least.

## Worked solution and explanation

### Why this problem exists in real interviews

Group by two columns, filter the groups by size, sort with a multi-key tiebreaker: this is the HAVING + ORDER BY interview question, and it covers three specific SQL skills in one problem. Every BI dashboard is built on queries of this shape. Interviewers grade whether you know HAVING vs WHERE (WHERE filters rows, HAVING filters groups), whether you handle NULLs explicitly, and whether you spell out the tiebreak columns in the exact order the spec dictates.

---

### Break down the requirements

#### Step 1: Filter out NULL region or status before grouping

WHERE region IS NOT NULL AND status IS NOT NULL applies to rows, not groups, so it belongs in WHERE. Doing it in HAVING would still work because COUNT excludes NULLs per-column, but the NULL group itself would appear in output unless filtered. WHERE is the honest expression: 'drop bad rows first.'

#### Step 2: Group by (region, status)

GROUP BY region, status produces one row per distinct pair. COUNT(*) in the SELECT list tallies rows per group.

#### Step 3: Filter groups with HAVING

HAVING COUNT(*) >= 3 drops pairs with fewer than 3 rows. HAVING is the group-level filter; WHERE cannot reference aggregate values, so HAVING is the only place this filter can live.

#### Step 4: Sort by count DESC, then region ASC, then status ASC

ORDER BY order_count DESC, region ASC, status ASC. The spec is explicit about the tiebreak chain. ASC is default but spelling it out makes the intent self-documenting and survives a copy-paste into a different engine with different defaults.

---

### The solution

**WHERE for rows, HAVING for groups, explicit tiebreak**

```sql
SELECT
    region,
    status,
    COUNT(*) AS order_count
FROM orders
WHERE region IS NOT NULL AND status IS NOT NULL
GROUP BY region, status
HAVING COUNT(*) >= 3
ORDER BY order_count DESC, region ASC, status ASC
```

> **Cost Analysis**
>
> One table scan, O(n). GROUP BY uses hash aggregation (O(n) average, O(n log n) with sort-based group by on no index). With an index on (region, status), the engine can stream sorted rows and aggregate in one pass. HAVING is a simple filter on the grouped rows, negligible cost relative to the aggregation.

> **Interviewers Watch For**
>
> Whether you use WHERE vs HAVING correctly (WHERE for rows, HAVING for groups), whether you handle NULLs per spec, and whether you nail the tiebreak order without asking twice. Strong candidates note that COUNT(*) counts rows including those with NULL in any column, while COUNT(column) skips NULLs in that column, and explain why COUNT(*) is the right call here.

> **Common Pitfall**
>
> Putting the COUNT filter in WHERE: 'WHERE COUNT(*) >= 3' raises a syntax error because WHERE runs before aggregation. Another classic: swapping ASC and DESC in the tiebreak and getting rows in the wrong order when counts tie. And the subtle one: filtering NULLs with HAVING instead of WHERE, which works for COUNT but breaks if you later add SUM or AVG (which include NULLs in the group).

---

## Common follow-up questions

- What changes if the spec wanted only the single highest-count row (with ties broken by region, status)? _(add LIMIT 1; or use a window function with DENSE_RANK and filter to rank = 1. Discuss the difference when ties should all survive.)_
- How would you also return the total revenue per group? _(add SUM(profit) AS total_profit; discuss NULL handling in SUM (skipped, not treated as zero).)_
- How would you rewrite this without HAVING? _(a subquery that computes counts first, then a WHERE on the outer query; show why this is logically identical but sometimes optimized differently.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/regional_status)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.