# Rolling Revenue Average

> Smooth out the revenue bumps. The trend matters more.

Canonical URL: <https://datadriven.io/problems/rolling_revenue_average>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

Compute a 3-month rolling average of total revenue from transactions, excluding refunds (negative amounts). For each month, the average uses the current month and the two preceding months. Show year-month in YYYY-MM format and the rolling average, sorted chronologically. The first two months will not be true 3-month averages.

## Worked solution and explanation

### Why this problem exists in real interviews

Drawn from a revenue analysis domain, this question centers on custom window frame specification over the `transactions` table. The tricky part is handling the `total_amount` and `transaction_date` columns correctly under the given constraints.

> **Trick to Solving**
>
> Rolling or sliding window problems require an explicit frame clause. The default frame is rarely what you want.
> 
> 1. Identify the window size from the prompt (e.g., '3-month rolling')
> 2. Use `ROWS BETWEEN N PRECEDING AND CURRENT ROW`
> 3. Partition by the grouping key, order by the time column

---

### Break down the requirements

#### Step 1: Isolate the intermediate result in a CTE

The `monthly_rev` CTE computes the intermediate aggregation that the outer query builds on. This separation keeps each layer focused on a single task.

#### Step 2: Apply the range filter

The WHERE clause restricts rows to the target range. Applying this filter early reduces the volume flowing into downstream operations.

---

### The solution

**Sliding-window for rolling revenue average**

```sql
WITH monthly_rev AS (
    SELECT strftime('%Y-%m', transaction_date) AS ym, SUM(total_amount) AS revenue
    FROM transactions
    WHERE total_amount >= 0
    GROUP BY strftime('%Y-%m', transaction_date)
)
SELECT ym, AVG(revenue) OVER (ORDER BY ym ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_avg
FROM monthly_rev
ORDER BY ym
```

> **Cost Analysis**
>
> With ~200M rows, the GROUP BY reduces the working set before any downstream operations; the window function runs on the reduced set after filtering and grouping; CTEs materialize intermediate results, which can be beneficial or costly depending on the engine. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you decompose the problem into named, testable stages rather than nesting everything; whether you explicitly define the window frame or rely on defaults that may not match the requirement; how you handle date arithmetic and whether you account for edge cases like month boundaries.

> **Common Pitfall**
>
> Omitting the explicit frame clause (`ROWS BETWEEN ...`) relies on the default, which is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` and may not match the intent.

---

## Common follow-up questions

- What would happen to your result if `transactions.transaction_date` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `transaction_date` and uses DISTINCT or deduplication where needed.)_
- If `transactions` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `total_amount`? _(Tests ability to identify performance hotspots related to `transactions.total_amount` at scale.)_
- Does your database engine materialize the CTE or inline it? How would that affect repeated scans of `transactions`? _(Tests understanding of CTE materialization semantics.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/rolling_revenue_average)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.