# Peak Spending Month

> One month, the bill was unforgettable.

Canonical URL: <https://datadriven.io/problems/peak_spending_month>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The CFO flagged a spike in the cloud invoice and wants to know which billing month was responsible. Show the month with the highest total cloud spend alongside the total amount.

## Worked solution and explanation

### Why this problem exists in real interviews

This cloud cost problem uses the `cloud_costs` table to evaluate date extraction for time bucketing. Watch how the `bill_date` and `amount` columns interact in the grouping and filtering logic.

---

### Break down the requirements

#### Step 1: Extract month from bill_date

`strftime('%Y-%m', bill_date)` (or just month) normalizes each cost to its billing month.

#### Step 2: Group and sum

`GROUP BY month` with `SUM(amount)` produces total spend per month.

#### Step 3: Select the peak

`ORDER BY total_amount DESC LIMIT 1` returns the single highest-spend month.

---

### The solution

**Extract month from bill_date to find peak spending month**

```sql
SELECT strftime('%Y-%m', bill_date) AS billing_month, SUM(amount) AS total_spend
FROM cloud_costs
GROUP BY billing_month
ORDER BY total_spend DESC
LIMIT 1
```

> **Cost Analysis**
>
> The query scans `cloud_costs` (8,000,000 rows). A covering index on the filter and group columns would reduce I/O. At this scale, the full scan is acceptable but becomes costly if the table grows 10x.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What would happen to your result if `cloud_costs.amount` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `amount` and uses DISTINCT or deduplication where needed.)_
- If `cloud_costs` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `amount`? _(Tests ability to identify performance hotspots related to `cloud_costs.amount` at scale.)_
- If `cloud_costs` contained late-arriving rows that were inserted after your query ran, how would you design an incremental update instead of re-aggregating? _(Tests understanding of incremental aggregation patterns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/peak_spending_month)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.