# The Slow Build

> Month over month, the number grows. Track how the average moves with it.

Canonical URL: <https://datadriven.io/problems/cumulative_monthly_revenue_avg>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

For each month in 2026, show the total revenue and a running average of all months up to and including the current one, rounded to the nearest whole number.

## Worked solution and explanation

### Why this problem exists in real interviews

Finance dashboards almost always pair monthly revenue with a running average so the chart shows both the spike and the trend line. Interviewers use this prompt to see whether you can aggregate by year-month, layer a window function over the aggregate, and use the right ROWS frame so 'cumulative' truly means 'all months up to and including this one'.

---

### Break down the requirements

#### Step 1: Aggregate revenue per year-month

Inside a subquery, GROUP BY strftime('%Y-%m', transaction_date) and project SUM(total_amount) AS monthly_revenue. The strftime year filter scopes to 2026 so the cumulative average resets each year.

#### Step 2: Compute the running average with an explicit ROWS frame

AVG(monthly_revenue) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) is the cumulative average. The default frame for ORDER BY without ROWS is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which behaves differently when there are duplicate ORDER BY values; spelling out ROWS removes any ambiguity.

#### Step 3: Round to a whole number, kept as DOUBLE

CAST(ROUND(AVG(...)) AS DOUBLE) rounds to the nearest integer but preserves the DOUBLE type so downstream consumers do not have to handle a sudden type switch. ORDER BY month ascending makes the chart left-to-right.

---

### The solution

**Year-month aggregate plus a ROWS-framed cumulative AVG**

```sql
SELECT month, monthly_revenue,
  CAST(ROUND(AVG(monthly_revenue) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS DOUBLE) AS cumulative_avg
FROM (
  SELECT strftime('%Y-%m', transaction_date) AS month, SUM(total_amount) AS monthly_revenue
  FROM transactions
  WHERE strftime('%Y', transaction_date) = '2026'
  GROUP BY strftime('%Y-%m', transaction_date)
)
ORDER BY month
```

> **Cost Analysis**
>
> transactions has 120,000,000 rows. The strftime filter on transaction_date is non-sargable, so without an index this is a full scan. After aggregation, the result is at most 12 rows per year, so the window function is essentially free. The dominant cost is reading and grouping 120M rows; production systems would precompute monthly_revenue into a small rollup table.

> **Interviewers Watch For**
>
> They want strftime (not Postgres TO_CHAR or DATE_TRUNC), they want the explicit ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame, and they want the inner subquery handling the GROUP BY. A candidate who tries to nest SUM and AVG in a single SELECT without a subquery (or who reaches for a self-join sum-of-prior-months pattern) is signaling weak window-function fluency.

> **Common Pitfall**
>
> Omitting ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW falls back to the RANGE default, which can include rows tied on the ORDER BY value. With unique year-month strings the result happens to match here, but the habit is dangerous and SQLite older versions had bugs around RANGE frames. Forgetting to wrap in CAST(... AS DOUBLE) returns an INTEGER and downstream Pydantic schemas may reject it.

---

## Common follow-up questions

- How would the answer change if you needed a 3-month rolling average instead of cumulative? _(Switches the frame to ROWS BETWEEN 2 PRECEDING AND CURRENT ROW. The candidate should note that the first two months are computed over fewer values and decide whether to NULL them or accept the partial window.)_
- What if the spec said 'across all years in the table' rather than just one year? _(Drops the year filter and makes the cumulative average span every month in transactions. The candidate should mention adding PARTITION BY strftime('%Y', month) if the average should reset each year.)_
- Why ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW instead of RANGE? _(Tests frame semantics. ROWS counts physical rows; RANGE counts logical values tied on the ORDER BY key. With unique month strings they match, but the candidate should explain why ROWS is the safer default for cumulative aggregates.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/cumulative_monthly_revenue_avg)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.