# Follow the Money

> Every dollar leaves a trail. Find the accounts moving the most of it.

Canonical URL: <https://datadriven.io/problems/follow-the-money-yearly-account-volume>

Domain: SQL · Difficulty: easy · Seniority: mid

## Problem

The account analytics team is closing out this year's books and wants a read on who is moving the most money through the platform. Add up each account's transaction amounts for 2026 and list them from the biggest total down.

## Worked solution and explanation

### What this is really asking

Under the banking costume this is a one-table rollup: collapse a year of transactions to one row per account and sum the money. It looks trivial, and that is exactly why interviewers use it as a screen. Two small decisions separate a clean answer from a wrong one: scoping the date filter to 2026 on a date column without breaking on string formats, and summing amounts per account instead of counting rows. Get the grain wrong and you report transaction counts dressed up as dollars.

---

### Break down the requirements

#### Step 1: Scope to this year

The ask is 2026 only, and the table spans several years. strftime('%Y', transaction_date) = '2026' pulls the year out of each date and compares it as text, which is exact and avoids off-by-one boundary mistakes around January 1.

#### Step 2: Roll up to the account grain

GROUP BY user_id collapses many transactions per account into one row. This is the unit the team cares about: the account, not the individual transaction.

#### Step 3: Sum the money, then sort

SUM(total_amount) is the metric, not COUNT(*). Name the column total_spent so the output reads as dollars, then ORDER BY it descending so the heaviest accounts sit at the top. A user_id tiebreak keeps the ordering stable when two accounts tie.

---

### The solution

**Yearly volume per account**

```sql
SELECT user_id,
       SUM(total_amount) AS total_spent
FROM transactions
WHERE strftime('%Y', transaction_date) = '2026'
GROUP BY user_id
ORDER BY total_spent DESC, user_id
```

> **Cost Analysis**
>
> On a real ledger this table is hundreds of millions of rows partitioned by transaction_date. The catch: wrapping the column in strftime() defeats partition pruning and any index on transaction_date, forcing a full scan. In production rewrite the filter as a sargable range, transaction_date >= '2026-01-01' AND transaction_date < '2026-12-31' plus a day, so the planner range-seeks one year instead of scanning all of history. The aggregate itself is a single cheap hash-group pass.

> **Interviewers Watch For**
>
> Whether you say out loud that the metric is SUM(total_amount), not a row count, and whether you notice the year filter is the only thing standing between this answer and an all-time total. Strong candidates also flag the strftime-versus-range tradeoff before being asked.

> **Common Pitfall**
>
> Dropping the GROUP BY and reporting raw transaction rows, or quietly omitting the year filter and summing every year at once. Both produce a plausible-looking number that is simply the wrong question answered.

---

## Common follow-up questions

- Show only the top 10 accounts by yearly volume. _(Tests LIMIT after the sort and whether they reason about ties at the cutoff.)_
- Add a column for each account's average transaction size alongside the total. _(Probes mixing AVG(total_amount) with SUM in the same grouped query.)_
- Make this run incrementally each night instead of scanning the whole year. _(Pushes toward the sargable date range and partition pruning on transaction_date.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/follow-the-money-yearly-account-volume)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.