# Most Profitable Region Month

> One region, one month. Peak profit.

Canonical URL: <https://datadriven.io/problems/most_profitable_region_month>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The finance team needs the single most profitable region-month combination in 2026. For each region, sum profit by transaction month, matching orders to transactions by their ID modulo 100. Return whichever region and month came out on top.

## Worked solution and explanation

### What this is really asking

`o.order_id % 100 = t.transaction_id % 100` is not a real key. It is a synthetic bucket join across 30M and 100M rows that fans into roughly a 30-trillion-row pre-aggregate. Every shape decision flows from that.

---

### Break down the requirements

#### Step 1: Join on the modulo bucket

Match each order to every transaction sharing the same (id mod 100). This is a many-to-many bucket join, not a referential lookup.

#### Step 2: Group by region and transaction month

Bucket the transaction_date with strftime('%m', ...) so December across years collapses together, then SUM profit per (region, month).

#### Step 3: Order by total_profit and LIMIT 1

The prompt asks for one winner. Sort descending on the aggregate and take the top row. Ties resolve arbitrarily without a tiebreaker column.

---

### The solution

**TOP REGION-MONTH BY PROFIT**

```sql
SELECT
    o.region,
    strftime('%m', t.transaction_date) AS txn_month,
    SUM(o.profit) AS total_profit
FROM orders o
INNER JOIN transactions t
    ON (o.order_id % 100) = (t.transaction_id % 100)
GROUP BY o.region, strftime('%m', t.transaction_date)
ORDER BY total_profit DESC
LIMIT 1;
```

> **Cost Analysis**
>
> Mod-100 join produces ~300M rows per bucket on average, 30T total intermediate. No index helps because the predicate is a derived expression. In real life you would materialize the bucket as a computed column or rewrite the join key.

> **Interviewers Watch For**
>
> Whether you flag the modulo join as a synthetic bucket rather than treating it as a foreign key. Also whether you notice strftime('%m', ...) ignores the year, which is fine here because the prompt scopes a single year.

> **Common Pitfall**
>
> Filtering profit > 0 in WHERE to chase performance. The prompt says most profitable, which can include net-positive regions composed of losses and gains. Dropping negative profit rows changes the answer.

> **The False Start**
>
> First instinct is `ORDER BY SUM(o.profit) DESC LIMIT 1` without a GROUP BY, expecting one row back. That returns a single grand total across all regions and months. Pivot to grouping by (region, txn_month) first, then ordering the aggregate.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you break a tie if two region-months had identical total_profit? _(Add a deterministic secondary key like region ASC or the count of contributing orders to the ORDER BY.)_
- How would you rewrite this so the optimizer can actually use an index? _(Persist `order_id % 100` as a generated column with an index, or pre-bucket both tables into a join-helper table.)_
- What if finance wants the top region-month per year, not overall? _(Group by region, year, and month, then use RANK() OVER (PARTITION BY year ORDER BY SUM(profit) DESC) and filter rnk = 1.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/most_profitable_region_month)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.