# Regional Profits

> P&L by region. Before the board meeting.

Canonical URL: <https://datadriven.io/problems/regional_profits>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

Headquarters is preparing the quarterly P&L deck and needs a regional breakdown. For every region that has order data, show the total profit and the number of orders. Some orders have missing region information and should be left out. Regions should appear from highest total profit to lowest.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use this order processing scenario to test grouped SUM aggregation against the `orders` table. The focus is on how you handle the `region` column when building the result.

---

### Break down the requirements

#### Step 1: Filter out null values

Exclude rows where `region` is NULL. This prevents nulls from polluting aggregations or creating phantom groups.

#### Step 2: Aggregate by `region`

`GROUP BY region` collapses rows to one per group. The aggregate functions (`SUM`, `COUNT`, `AVG`, etc.) compute the metric for each group.

#### Step 3: Sort the final output

The `ORDER BY` clause ensures the result appears in the expected sequence. Interviewers check that the sort direction matches the prompt.

---

### The solution

**Filter out null values to find regional profits**

```sql
SELECT region, SUM(profit) AS total_profit, COUNT(*) AS order_count
FROM orders
WHERE region IS NOT NULL
GROUP BY region
ORDER BY total_profit DESC
```

> **Cost Analysis**
>
> With ~500K rows, the GROUP BY reduces the working set before any downstream operations. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for how you handle NULL values and whether you account for them in filters and aggregations.

> **Common Pitfall**
>
> Forgetting to filter NULLs creates phantom groups or inflated counts. Always check `null_fraction` in the schema before assuming columns are clean.

---

## Common follow-up questions

- If `status` in `orders` is NULL for some rows, how would your aggregation or join logic be affected? _(Probes understanding of NULL propagation through joins and aggregate functions on `orders.status`.)_
- `orders.order_id` has roughly 500,000 distinct values. What index strategy would you use to avoid a full scan on `orders`? _(Tests indexing knowledge specific to the high-cardinality `order_id` column in `orders`.)_
- If `orders` contained late-arriving rows that were inserted after your query ran, how would you design an incremental update instead of re-aggregating? _(Tests understanding of incremental aggregation patterns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/regional_profits)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.