# The Email Ranker

> Some inboxes see more action.

Canonical URL: <https://datadriven.io/problems/the_email_ranker>

Domain: Python · Difficulty: medium · Seniority: L3

## Problem

Given a list of activity records (each a dict with 'user_id' and 'action'), count activities per user_id. Rank users using dense ranking by activity count descending, with ties broken by user_id alphabetically ascending. Return a list of dicts with keys 'user_id', 'activity_count', 'rank' in rank order.

## Worked solution and explanation

### Why this problem exists in real interviews

Dense ranking with deterministic tie-breakers shows up constantly in leaderboards, attribution, and analytics. Interviewers use this prompt to see whether you can count, sort with a composite key, and assign ranks correctly when multiple users tie.

---

### Break down the requirements

#### Step 1: Count activities per user_id

One pass over `activities` builds `{user_id: count}`. The `action` field is irrelevant; only the count matters for ranking.

#### Step 2: Sort by count desc, then user_id asc

Use a composite key `(-count, user_id)`. Negating the count flips the sort to descending while keeping the secondary sort on `user_id` ascending in a single `sorted` call.

#### Step 3: Assign DENSE rank, not row number

Dense rank means tied users share the same rank and the next distinct count gets `rank + 1` (no gap). Increment `rank` only when the current count differs from the previous count, not on every row.

---

### The solution

**Count, composite sort, dense rank assignment**

```python
def rank_by_email_activity(activities: list[dict]) -> list[dict]:
    counts = {}
    for a in activities:
        uid = a['user_id']
        counts[uid] = counts.get(uid, 0) + 1

    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    result = []
    rank = 0
    prev_count = None
    for uid, c in ordered:
        if c != prev_count:
            rank += 1
            prev_count = c
        result.append({'user_id': uid, 'activity_count': c, 'rank': rank})
    return result
```

> **Cost Analysis**
>
> Time: O(N) to count plus O(U log U) to sort, where N is the number of activities and U is the number of distinct users. Space: O(U) for the counts dict and the result list.

> **Interviewers Watch For**
>
> Whether you implement DENSE rank (no gaps on ties) instead of row number or RANK (gaps after ties), whether your tie-breaker on `user_id` is alphabetical ascending, and whether the output dicts have exactly the three keys `user_id`, `activity_count`, `rank`. Strong candidates clarify the ranking flavor before coding.

> **Common Pitfall**
>
> Using `enumerate(ordered, 1)` as the rank. That gives row number, not dense rank: two tied users would receive ranks 1 and 2 instead of both being rank 1. Always increment the rank counter only when the count changes.

---

## Common follow-up questions

- What is the difference between RANK, DENSE_RANK, and ROW_NUMBER, and which one does this prompt ask for? _(Tests SQL-window vocabulary. The prompt asks for DENSE_RANK: ties share the same rank and the next group is rank + 1 with no gap.)_
- How would you change the tie-breaker to most recent activity instead of alphabetical user_id? _(Tests sort key extension. The candidate must carry a `last_seen` timestamp through the count step and sort by `(-count, -last_seen)`.)_
- If `activities` arrived as a streaming feed of millions of events per minute, how would you maintain the leaderboard incrementally? _(Tests online algorithms. Options include a heap of top K, a sorted structure keyed on count, or periodic re-sort of just the changed users.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_email_ranker)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.