# Spending by Account Status

> Segment user spending and activity by account status across the platform

Canonical URL: <https://datadriven.io/problems/spending_by_account_status>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Show each account status with its total number of transactions, the count of unique users, and the total revenue. Include users who have not made any transactions. Only show account statuses with at least 5 transactions.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge asks you to apply self-join to the `users` and `transactions` tables, simulating a real revenue analysis workflow. Pay attention to the `account_status` column as they drive the aggregation and output.

---

### Break down the requirements

#### Step 1: Left join to preserve all base rows

A `LEFT JOIN` from `users` ensures every row appears in the output even if there is no match in `transactions`. Missing values become NULL.

#### Step 2: Aggregate by `u.account_status`

`GROUP BY u.account_status` collapses rows to one per group. The aggregate functions (`SUM`, `COUNT`, `AVG`, etc.) compute the metric for each group.

#### Step 3: Filter groups with HAVING

HAVING applies after GROUP BY, filtering out groups that do not meet the threshold. This cannot be done in WHERE because the aggregate has not been computed yet.

#### Step 4: Deduplicate the result with DISTINCT

`SELECT DISTINCT` removes duplicate rows from the output. This is necessary when joins or subqueries can produce repeated combinations.

#### Step 5: Sort the final output

The `ORDER BY` clause ensures the result appears in the expected sequence. Interviewers check that the sort direction matches the prompt.

---

### The solution

**Having filter for spending by account status**

```sql
SELECT u.account_status, COUNT(t.transaction_id) AS transaction_count, COUNT(DISTINCT u.user_id) AS user_count, SUM(t.total_amount) AS total_revenue
FROM users u
LEFT
JOIN transactions t ON u.user_id = t.user_id
GROUP BY u.account_status
HAVING COUNT(t.transaction_id) >= 5
ORDER BY total_revenue DESC
```

> **Cost Analysis**
>
> With ~200 rows, the GROUP BY reduces the working set before any downstream operations; the join cost depends on the smaller table's cardinality. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you use HAVING (not WHERE) to filter after aggregation; whether you choose the correct join type to avoid silently dropping rows; whether you know when DISTINCT is needed and when it masks a logic error.

> **Common Pitfall**
>
> Putting the aggregate condition in WHERE instead of HAVING causes a syntax error. WHERE runs before GROUP BY; HAVING runs after.

---

## Common follow-up questions

- The `email` column in `users` has a 5% null rate. How does your query handle rows where `email` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `users.email` and understands how aggregates skip NULL values.)_
- The `account_status` column in `users` is heavily skewed toward a few popular values. How would data skew affect parallel execution of your query? _(Tests understanding of skew in `users.account_status` and its impact on distributed query performance.)_
- If `users` contained late-arriving rows that were inserted after your query ran, how would you design an incremental update instead of re-aggregating? _(Tests understanding of incremental aggregation patterns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/spending_by_account_status)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.