# The Category Ranker

> Categories have standing. Rows get theirs.

Canonical URL: <https://datadriven.io/problems/the_category_ranker>

Domain: Python · Difficulty: medium · Seniority: L5

## Problem

Given a list of records (dicts each with 'category' and 'value') and integer n, return a dict mapping each category to the top n records by value descending. Break ties by preserving original insertion order. If a category has fewer than n records, include all of them.

## Worked solution and explanation

### Why this problem exists in real interviews

Top-N per group is a common analytics query (SQL's `ROW_NUMBER() OVER (PARTITION BY ...)`) implemented in Python. It tests **grouping**, **sorting within groups**, and **stable ordering** for tiebreaking.

---

### Break down the requirements

#### Step 1: Group records by category

Partition the records into groups by their category field.

#### Step 2: Sort each group by value descending

Within each category, rank records by their score/value.

#### Step 3: Take the top N per group

Slice each sorted group to keep only the first N records.

#### Step 4: Preserve insertion order for ties

Use a stable sort so equal-valued records maintain their original order.

---

### The solution

**Group, sort descending, and slice top-N per category**

```python
def top_n_per_category(records, category_key, value_key, n):
    groups = {}
    for record in records:
        cat = record[category_key]
        if cat not in groups:
            groups[cat] = []
        groups[cat].append(record)
    result = {}
    for cat in groups:
        sorted_group = sorted(groups[cat], key=lambda r: r[value_key], reverse=True)
        result[cat] = sorted_group[:n]
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n log n) in the worst case when all records are in one category. Per-group sorting is O(g log g) for each group of size g.
> 
> **Space:** O(n) for the grouped records.

> **Interviewers Watch For**
>
> Using Python's stable sort (`sorted()`) to preserve insertion order for ties. Also, parameterizing the function with key names rather than hardcoding field names.

> **Common Pitfall**
>
> Using a heap to find top-N. While asymptotically better, it does not preserve insertion order for ties, which the prompt requires.

---

## Common follow-up questions

- How would you do this in SQL? _(Tests `ROW_NUMBER() OVER (PARTITION BY category ORDER BY value DESC)` with a WHERE clause on the rank.)_
- What if N is very large relative to the group size? _(Tests that slicing beyond the group size just returns all elements, no error.)_
- What if you needed dense ranking instead of top-N? _(Tests assigning ranks where ties share the same rank number.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_category_ranker)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.