# The Tallest Stacks

> Every terminal has its records. Find the two that stand tallest.

Canonical URL: <https://datadriven.io/problems/the-tallest-stacks>

Domain: Python · Difficulty: medium · Seniority: mid

## Problem

A terminal operations export logs one row per vessel call, each row naming the `terminal` that handled the call and the `teu` (container volume) it moved. For each terminal, report its two highest unique `teu` values, largest first, so a volume hit on several calls still counts only once. A terminal with only a single volume reports just that value, and an empty export reports nothing.

## Worked solution and explanation

### Why this problem exists in real interviews

This is the 'second highest salary' chestnut wearing a shipping manifest, generalized one notch to top-two-distinct per group. The skill being probed: can you compute a per-key runner-up without letting duplicate maxima fool you? Anyone can group the rows and sort them. The trap is that the highest volume usually shows up on more than one vessel call, so the moment you sort a terminal's raw volumes largest-first and slice the top two, you get the same number twice. Get that wrong and Rotterdam reports [1200, 1200] when its real second-place volume was 950.

---

### Break down the requirements

#### Step 1: Bucket the volumes by terminal

One pass over the rows, accumulating each terminal's volumes. Because the very next step needs uniqueness, accumulate straight into a set per terminal rather than a list. defaultdict(set) gives you an empty set on first touch so you can .add() without an existence check.

#### Step 2: Dedup is the whole point, not an afterthought

The spec says a volume hit on several calls counts once. Collapsing to distinct values is what separates the real second-highest from a repeated max. Doing it as you bucket (the set in step 1) means the ordering step never sees a duplicate.

#### Step 3: Take the two largest, and let short groups self-handle

Sort each terminal's distinct values from largest to smallest and slice the first two. The slice is what makes one-value terminals fall out for free: [v][:2] is just [v], no special branch needed. An empty export never enters the loop, so it returns an empty dict naturally.

---

### The solution

**Bucket into sets, sort distinct, slice two**

```python
from collections import defaultdict

def top_two_by_terminal(readings):
    volumes = defaultdict(set)
    for row in readings:
        volumes[row["terminal"]].add(row["teu"])
    return {
        terminal: sorted(distinct, reverse=True)[:2]
        for terminal, distinct in volumes.items()
    }
```

> **Complexity**
>
> Time is O(n) to bucket the n rows into sets, plus O(d log d) per terminal to sort that terminal's d distinct volumes. Across all terminals the sorting is bounded by O(k log k) where k is the total number of distinct values, which is at most n. Space is O(k) for the buckets. A port export is a handful of terminals against thousands of calls, so the distinct sets stay tiny and the sort is effectively free.

> **Interviewers Watch For**
>
> Whether you dedup BEFORE ordering rather than after, and whether you reach for a set during bucketing instead of building lists and calling set() later. Strong candidates also call out that the slice [:2] makes the single-distinct-value case work with no branch, and confirm out loud what should happen when a terminal has only one volume before writing code.

> **Common Pitfall**
>
> sorted(raw_values, reverse=True)[:2] on the un-deduplicated list. When the max volume appears on two calls it comes back twice, so the runner-up is silently dropped. The mirror-image mistake is using max() twice (max, then max of everything below it) and forgetting that the second max() must also exclude every copy of the first value, not just one.

**Naive: sort then slice**

sorted([1200, 1200, 950, 800], reverse=True)[:2] returns [1200, 1200]. The duplicate maximum eats both slots and the true second place, 950, never surfaces.

**Correct: distinct, then slice**

sorted({1200, 950, 800}, reverse=True)[:2] returns [1200, 950]. Collapsing to a set first guarantees the two slots hold two different volumes.

---

## Common follow-up questions

- What changes if you need the top K distinct volumes per terminal instead of two, with K passed in? _(Tests generalizing the slice to [:k] and noticing that heapq.nlargest(k, distinct) is the cleaner tool once K is small relative to the distinct count.)_
- The export now streams in at billions of rows and will not fit in memory. How do you keep this per terminal? _(Tests maintaining a bounded structure per key (a small top-2 tracker or a per-key heap) in a single streaming pass, or a distributed group-by-then-reduce, instead of materializing every value.)_
- How would you return the second-highest only, and what should happen for a terminal with a single distinct volume? _(Tests the explicit absent-runner-up decision: return None, omit the terminal, or raise, and pinning that contract before coding.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the-tallest-stacks)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.