# The Accumulator

> A total that builds row by row. Structure the query to match.

Canonical URL: <https://datadriven.io/problems/running_total_with_cte>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

The FinOps dashboard needs a daily cloud cost time series with a running cumulative total so stakeholders can see how spending accumulates through the billing period. Show each billing date, its daily spend, and the cumulative spend through that date.

## Worked solution and explanation

### Why this problem exists in real interviews

This cloud cost problem uses the `cloud_costs` table to evaluate grouped SUM aggregation. Watch how the `bill_date` and `amount` columns interact in the grouping and filtering logic.

> **Trick to Solving**
>
> Complex queries with multiple transformation stages are best solved by writing one CTE per stage.
> 
> 1. Identify the intermediate results needed
> 2. Write each as a named CTE
> 3. Chain them: each CTE reads from the previous one
> 4. The final SELECT assembles the output

---

### Break down the requirements

#### Step 1: Isolate the intermediate result in a CTE

The `daily` CTE computes the intermediate aggregation that the outer query builds on. This separation keeps each layer focused on a single task.

#### Step 2: Select the target columns

The SELECT clause picks exactly the columns the prompt asks for. Returning extra columns or missing a required alias would fail the grading check.

---

### The solution

**Isolate the intermediate result in a cte to find running total with...**

```sql
WITH daily AS (
    SELECT bill_date, SUM(amount) AS daily_spend
    FROM cloud_costs
    GROUP BY bill_date
)
SELECT bill_date, daily_spend, SUM(daily_spend) OVER (ORDER BY bill_date) AS cumulative_spend
FROM daily
ORDER BY bill_date
```

> **Cost Analysis**
>
> With ~15M rows, the GROUP BY reduces the working set before any downstream operations; the window function runs on the reduced set after filtering and grouping; CTEs materialize intermediate results, which can be beneficial or costly depending on the engine. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you decompose the problem into named, testable stages rather than nesting everything; whether you use a subquery or self-join, and can explain the tradeoffs.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- What would happen to your result if `cloud_costs.cost_id` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `cost_id` and uses DISTINCT or deduplication where needed.)_
- If `cloud_costs` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `cost_id`? _(Tests ability to identify performance hotspots related to `cloud_costs.cost_id` at scale.)_
- If `cloud_costs` contained late-arriving rows that were inserted after your query ran, how would you design an incremental update instead of re-aggregating? _(Tests understanding of incremental aggregation patterns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/running_total_with_cte)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.