# Quarterly Consolidated Cloud Costs

> Quarterly cloud spend, weighted.

Canonical URL: <https://datadriven.io/problems/quarterly_consolidated_cloud_costs>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Finance weights each cloud service's raw cost by its matching allocation amount from cost_allocs. For Q1 and Q2 of 2026, compute the total weighted spend per quarter. Return the quarter label (Q1/Q2) and total converted spend.

## Worked solution and explanation

### Why this problem exists in real interviews

The core skill being tested is self-join, applied to the `cloud_costs` and `cost_allocs` tables in a cloud cost context. Getting columns like `svc_name`, `bill_date`, and `amount` right is where most candidates slip. The problem layers in conditional aggregation via CASE, date extraction for time bucketing as well.

---

### Break down the requirements

#### Step 1: Join cloud_costs to cost_allocs on service

`JOIN cost_allocs ON cloud_costs.svc_name = cost_allocs.svc_name` matches costs to allocations.

#### Step 2: Filter to Q1 and Q2

Apply date filters on `bill_date` for the first two quarters.

#### Step 3: Compute weighted spend per quarter

`SUM(cloud_costs.amount * cost_allocs.amount)` grouped by quarter label.

---

### The solution

**Case pivot for quarterly consolidated cloud costs**

```sql
SELECT
    CASE
        WHEN CAST(strftime('%m', cc.bill_date) AS INTEGER) BETWEEN 1 AND 3 THEN 'Q1'
        ELSE 'Q2'
    END AS quarter,
    SUM(cc.amount * ca.amount) AS weighted_spend
FROM cloud_costs cc
JOIN cost_allocs ca
  ON cc.svc_name = ca.svc_name
WHERE strftime('%Y', cc.bill_date) = '2026'
  AND CAST(strftime('%m', cc.bill_date) AS INTEGER) BETWEEN 1 AND 6
GROUP BY quarter
```

> **Cost Analysis**
>
> With `cloud_costs` (12,000,000 rows), `cost_allocs` (15,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What would happen to your result if `cost_allocs.svc_name` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `svc_name` and uses DISTINCT or deduplication where needed.)_
- `cost_allocs.amount` has roughly 2,500,000 distinct values. What index strategy would you use to avoid a full scan on `cost_allocs`? _(Tests indexing knowledge specific to the high-cardinality `amount` column in `cost_allocs`.)_
- If the date column in `cloud_costs` spans multiple years, does your date extraction logic still produce correct time buckets? _(Tests whether the candidate accounts for year boundaries in date bucketing.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/quarterly_consolidated_cloud_costs)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.