# Monthly Revenue Change

> Revenue, month over month.

Canonical URL: <https://datadriven.io/problems/monthly_revenue_change>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

For each month, compute total transaction revenue, then calculate the percentage change from the prior month, rounded to 2 decimal places. Results should appear chronologically. The first month has no prior month, so show null for its percentage change.

## Worked solution and explanation

### Why this problem exists in real interviews

Against the transactions table, lag/lead comparison and grouping on quantity values is the key operation. Interviewers favor this in senior-level rounds because it exposes whether candidates handle ties, NULLs, and ordering edge cases correctly.

> **Trick to Solving**
>
> Period-over-period comparison is cleanest with `LAG`.
> 
> 1. Use `LAG(metric) OVER (ORDER BY period)` to get the prior value
> 2. Compute the difference or ratio in the outer query
> 3. Handle the first period where LAG returns NULL

---

### Break down the requirements

#### Step 1: Set up a CTE for the intermediate result

Wrap the first transformation in a `WITH` clause. This names the intermediate result set and keeps the outer query clean.

#### Step 2: Access the prior row with LAG

`LAG(metric) OVER (ORDER BY ...)` pulls the previous row's value into the current row without a self-join. The first row returns NULL.

#### Step 3: Aggregate with SUM

Group by the output grain and apply `SUM()` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.

#### Step 4: Order the final output

Apply `ORDER BY` as specified to produce the expected row sequence. When tied values exist, add a secondary sort column for determinism.

---

### The solution

**LAG for month-over-month percentage change**

```sql
WITH monthly AS (
    SELECT STRFTIME('%Y-%m', transaction_date) AS month,
        SUM(total_amount) AS revenue
    FROM transactions
    GROUP BY STRFTIME('%Y-%m', transaction_date)
)
SELECT month, revenue,
    ROUND((revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0 /
        LAG(revenue) OVER (ORDER BY month), 2) AS pct_change
FROM monthly
ORDER BY month
```

> **Cost Analysis**
>
> The query scans 250M rows from `transactions`. The aggregation reduces the row count before any downstream processing, which is the key performance lever.

> **Interviewers Watch For**
>
> Explicitly mentioning NULL handling before being asked signals production awareness. Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax.

> **Common Pitfall**
>
> NULL values are silently excluded from `COUNT(column)` but included in `COUNT(*)`. Mixing these up produces incorrect totals.

---

## Common follow-up questions

- If transactions.transaction_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in transaction_id.)_
- What does your LAG or LEAD return for the first or last row in each partition of transactions? _(Tests awareness of NULL defaults at partition boundaries and COALESCE usage.)_
- With millions of distinct values in transactions.transaction_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like transaction_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/monthly_revenue_change)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.