# The Pay Ladder

> Climb the ladder the hard way. No shortcuts allowed.

Canonical URL: <https://datadriven.io/problems/the_pay_ladder>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given a list of employee dicts (each with a 'salary' key) and integer n, return the top n distinct salaries in descending order. Do NOT use sorted(), sort(), nlargest, or any similar sort helpers - implement the selection yourself. If fewer than n distinct salaries exist, return all of them.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **selection without sorting built-ins**, forcing candidates to implement their own top-N extraction. It probes understanding of heap-based selection or manual iteration as alternatives to sorting.

> **Trick to Solving**
>
> Use a min-heap of size N. As you scan salaries, push each unique value. When the heap exceeds size N, pop the smallest. At the end, the heap contains the top N values. Extract and reverse for descending order.

---

### Break down the requirements

#### Step 1: Extract unique salary values

Convert the salary field of each record into a set to deduplicate.

#### Step 2: Find the top N using a min-heap

Maintain a heap of size N. For each unique salary, push it. If the heap exceeds N, pop the minimum.

#### Step 3: Return in descending order

Extract all values from the heap, sort them descending, and return.

---

### The solution

**Min-heap bounded at size N for top selection**

```python
import heapq
def top_n_salaries(employees, n):
    unique_salaries = set()
    for emp in employees:
        unique_salaries.add(emp['salary'])
    heap = []
    for salary in unique_salaries:
        heapq.heappush(heap, salary)
        if len(heap) > n:
            heapq.heappop(heap)
    result = []
    while heap:
        result.append(heapq.heappop(heap))
    result.reverse()
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(k log N) where k is the number of unique salaries. Each heap operation is O(log N).
> 
> **Space:** O(N) for the heap plus O(k) for the unique salary set.

> **Interviewers Watch For**
>
> Do you use a heap instead of manually tracking N maximums? The heap approach is clean and generalizes to any N, while manual tracking becomes unwieldy for N greater than 2.

> **Common Pitfall**
>
> Forgetting to deduplicate salaries before selection. Without deduplication, a salary appearing 100 times could fill the heap with duplicates.

---

## Common follow-up questions

- What if N is very close to the total number of unique salaries? _(Tests that the heap approach still works but a full sort might be simpler.)_
- How would you also return the employee names associated with each salary? _(Tests maintaining a mapping from salary to employee names alongside the heap.)_
- What if the data were streaming from an API? _(Tests that the heap approach naturally supports streaming since each element is processed once.)_
- What is the time complexity of using sorted() compared to the heap? _(Tests O(k log k) for full sort vs O(k log N) for heap-based selection.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_pay_ladder)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.