# Top Percentile Spenders

> Top 1% of users by total spend via percentile bucketing.

Canonical URL: <https://datadriven.io/problems/top_percentile_spenders>

Domain: SQL · Difficulty: medium · Seniority: L7

## Problem

Return the user IDs and total spend for customers who fall in the top 1% by total transaction amount over the last 7 days.

## Worked solution and explanation

### What this is really asking

`NTILE(100)` over the per-user SUM(total_amount), filter to bucket 1. The 150M-row table is partitioned by `transaction_date`, so the 7-day filter is the only thing keeping this query alive.

---

### Break down the requirements

#### Step 1: Aggregate first, then rank

Sum `total_amount` per `user_id` inside the last 7 days. Ranking input is one row per user.

#### Step 2: Bucket users into 100 groups

`NTILE(100) OVER (ORDER BY SUM(total_amount) DESC)` splits users into percentiles; bucket 1 is the top.

#### Step 3: Filter outside the window

NTILE cannot appear in WHERE at the same level it is defined. Wrap in a subquery, filter `pctl = 1` outside.

---

**TOP 1% SPENDERS BY NTILE**

```sql
SELECT user_id, total_spend
FROM (
  SELECT
    user_id,
    SUM(total_amount) AS total_spend,
    NTILE(100) OVER (ORDER BY SUM(total_amount) DESC) AS pctl
  FROM transactions
  WHERE transaction_date >= DATE('now', '-7 days')
  GROUP BY user_id
) ranked
WHERE pctl = 1
```

> **Cost Analysis**
>
> Partition pruning on `transaction_date` is load-bearing; 150M rows collapse to ~7 partitions before the GROUP BY. NTILE then sorts distinct users, not raw transactions.

> **Interviewers Watch For**
>
> Pruning before aggregation, awareness that NTILE distributes by rank position (ties land in adjacent buckets), and wrapping the window before filtering on it.

> **Common Pitfall**
>
> `PERCENT_RANK() <= 0.01` looks equivalent but gates on a continuous value, so a small user pool can return zero rows. NTILE always returns one bucket worth.

> **The False Start**
>
> First instinct is `NTILE(100) OVER (ORDER BY total_amount DESC)` on raw rows. That ranks transactions, so one whale with a single big order crowds out a steady spender. Pivot to ranking after `GROUP BY user_id`.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you return the top 1% by row count, not NTILE bucket? _(NTILE rounds bucket sizes; `ROW_NUMBER` plus `COUNT(*) * 0.01` gives an exact slice.)_
- What if two users tie at the percentile boundary? _(NTILE breaks ties arbitrarily; switch to `CUME_DIST()` if ties must stay together.)_
- How would you make this cheap to run hourly? _(Pre-aggregate to a `user_spend_7d` rollup so the window runs over thousands of users.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_percentile_spenders)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.