# Back Again

> Acquisition is expensive. These customers didn't need convincing twice.

Canonical URL: <https://datadriven.io/problems/top_buyers_by_transaction_count>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Rank users by the number of transactions they have completed, using a dense ranking so that users with equal counts share the same rank. Return the username, transaction count, and rank for users in the top 5 tiers.

## Worked solution and explanation

### Why this problem exists in real interviews

They want to see if you pick the right rank function for the spoken requirement. The prompt says `top 5 tiers`, not `top 5 rows`. If you reach for `LIMIT 5` or `ROW_NUMBER`, you've quietly redefined the metric. They're also checking whether you ask about users with zero `transactions` rows before joining.

---

### Break down the requirements

#### Step 1: Confirm the grain out loud

One row per `user_id` after aggregation. Say it: `username` is the label, `COUNT(transaction_id)` is the measure. Ask whether dropped accounts (`account_status` = 'inactive') with zero transactions need to appear. If yes, `LEFT JOIN`.

#### Step 2: Join users to transactions

`INNER JOIN transactions t ON u.user_id = t.user_id`. Inner is correct here since the prompt scopes to users who have completed transactions. Don't count distinct, `transaction_id` is the PK so dupes aren't possible.

#### Step 3: Aggregate per user

`GROUP BY u.username` and `COUNT(t.transaction_id) AS txn_count`. Wrap in a CTE so the rank function reads cleanly downstream. Aggregating before ranking keeps the window's input set tiny.

#### Step 4: Rank with DENSE_RANK

`DENSE_RANK() OVER (ORDER BY txn_count DESC)`. `RANK` would skip numbers after ties, so `top 5 tiers` could return fewer than 5 distinct rank values. `ROW_NUMBER` would arbitrarily break ties and drop legitimate co-leaders.

#### Step 5: Filter to top 5 tiers

Outer `WHERE rnk <= 5`. You can't reference the window alias in the same `SELECT`'s `WHERE`, that's the whole reason for the second CTE.

---

### The solution

**TOP 5 TIERS BY TRANSACTION COUNT**

```sql
WITH user_spend AS (
  SELECT u.username,
         COUNT(t.transaction_id) AS txn_count
  FROM users u
  INNER JOIN transactions t ON u.user_id = t.user_id
  GROUP BY u.username
),
ranked AS (
  SELECT username,
         txn_count,
         DENSE_RANK() OVER (ORDER BY txn_count DESC) AS rnk
  FROM user_spend
)
SELECT username, txn_count, rnk
FROM ranked
WHERE rnk <= 5;
```

> **Cost Analysis**
>
> `transactions` is 200M rows, `users` is 10M. The join hashes `users` and streams `transactions` on `user_id`. Aggregation collapses to at most 10M groups before the window runs over a tiny set. Avoid `ORDER BY` on the outer query, the `rnk <= 5` filter is the only thing standing between you and a 10M row sort.

> **Interviewers Watch For**
>
> Before you write, ask: `should users with zero transactions appear, ranked last?` and `does account_status filter the user pool?`. If they say yes to zeros, switch to `LEFT JOIN` and `COUNT(t.transaction_id)` (not `COUNT(*)`) so nulls don't inflate to 1.

> **Common Pitfall**
>
> Using `RANK()` instead of `DENSE_RANK()`. With ties at rank 1, `RANK` jumps to 3, then 4, then maybe stops at 5 having returned only 3 tiers. `DENSE_RANK` returns 1,2,3,4,5 contiguously, which is what `top 5 tiers` means.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you return only the single top tier, ties included? _(Tests whether they reach for `WHERE rnk = 1` or fumble with `MAX(COUNT(*))` subqueries.)_
- Rewrite this to include users with zero transactions ranked last. _(Probes `LEFT JOIN` plus `COUNT(t.transaction_id)` vs `COUNT(*)` and the null-vs-zero distinction.)_
- Scope to transactions in the last 90 days. What changes? _(Adds a `WHERE t.transaction_date >= ...` predicate and forces a conversation about whether zero-transaction users in the window still count.)_
- Now rank by `SUM(total_amount)` instead of count. Same query shape? _(Checks they swap the measure cleanly and don't reintroduce `COUNT(DISTINCT)` confusion.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_buyers_by_transaction_count)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.