# The High Rollers

> Not every gambler bets the same - some wager far more than others.

Canonical URL: <https://datadriven.io/problems/the_high_rollers>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given a list of wager dicts (each with 'user' and 'amount') and threshold (in 0..100), sum amount per user. Compute the given percentile cutoff of the per-user totals using linear interpolation. Return the sorted list of usernames whose total is strictly greater than that cutoff.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **aggregation combined with percentile filtering**, a common analytics pattern. It probes whether a candidate can compute per-user totals, determine a percentile threshold, and filter accordingly.

---

### Break down the requirements

#### Step 1: Aggregate total spend per user

Sum all wager amounts grouped by username.

#### Step 2: Compute the percentile threshold

Sort the totals, then find the value at the specified percentile position.

#### Step 3: Filter users above the threshold

Return usernames whose total exceeds the percentile value, sorted alphabetically.

---

### The solution

**Aggregate, percentile, and filter**

```python
def high_rollers(wagers: list, percentile: float) -> list:
    totals = {}
    for w in wagers:
        user = w['username']
        if user in totals:
            totals[user] += w['amount']
        else:
            totals[user] = w['amount']
    sorted_totals = sorted(totals.values())
    n = len(sorted_totals)
    idx = int(percentile / 100.0 * (n - 1))
    threshold = sorted_totals[idx]
    result = []
    for user, total in totals.items():
        if total > threshold:
            result.append(user)
    result = sorted(result)
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n + k log k) where n is the number of wagers and k is the number of unique users.
> 
> **Space:** O(k) for the totals dict and sorted values.

> **Interviewers Watch For**
>
> How you compute the percentile index. There are multiple conventions (nearest-rank, interpolation). Clarify with the interviewer which method to use.

> **Common Pitfall**
>
> Using `>=` instead of `>` for the threshold filter. The problem asks for users above the percentile, not at or above.

---

## Common follow-up questions

- What percentile method does numpy use by default? _(Tests knowledge of linear interpolation in `np.percentile`.)_
- How would you do this in SQL? _(Tests `PERCENT_RANK()` or `NTILE()` window functions.)_
- What if the percentile changes frequently? _(Tests precomputing sorted totals and using binary search for different thresholds.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_high_rollers)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.