# Spending Tiers

> High rollers, mid-spenders, and the frugal. Everyone gets a tier.

Canonical URL: <https://datadriven.io/problems/spending_tiers>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The loyalty program wants to bucket every customer into a spending tier. Compute each user's total transaction spending and label them 'high' if the total is above 500, 'medium' if it is between 200 and 500 inclusive, and 'low' if it is below 200. Return the user_id and that tier label.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem targets conditional aggregation via CASE across the `transactions` table. You need to work with the `user_id` and `total_amount` columns to satisfy the requirements.

---

### Break down the requirements

#### Step 1: Isolate the intermediate result in a CTE

The `totals` CTE computes the intermediate aggregation that the outer query builds on. This separation keeps each layer focused on a single task.

#### Step 2: Select the target columns

The SELECT clause picks exactly the columns the prompt asks for. Returning extra columns or missing a required alias would fail the grading check.

---

### The solution

**Case pivot for spending tiers**

```sql
WITH totals AS (
    SELECT user_id, SUM(total_amount) AS total
    FROM transactions
    GROUP BY user_id
)
SELECT user_id, CASE WHEN total > 500 THEN 'high' WHEN total >= 200 THEN 'medium' ELSE 'low' END
FROM totals
```

> **Cost Analysis**
>
> With ~100M rows, the GROUP BY reduces the working set before any downstream operations; CTEs materialize intermediate results, which can be beneficial or costly depending on the engine. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you decompose the problem into named, testable stages rather than nesting everything; whether you use a subquery or self-join, and can explain the tradeoffs.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- What would happen to your result if `transactions.transaction_date` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `transaction_date` and uses DISTINCT or deduplication where needed.)_
- With 5,000,000 distinct values in `transactions.user_id`, how would a composite index on the GROUP BY columns change the execution plan? _(Probes understanding of how cardinality in `user_id` affects grouping and sort operations.)_
- Your conditional CASE logic assumes the categories are exhaustive. What happens if a row in `transactions` falls into none of the branches? _(Tests awareness of the implicit ELSE NULL in CASE expressions.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/spending_tiers)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.