# The Trip Aggregator

> Travel records hold patterns waiting to be surfaced.

Canonical URL: <https://datadriven.io/problems/the_trip_aggregator>

Domain: Python · Difficulty: medium · Seniority: L3

## Problem

Given a list of trip dicts (each with user_id, destination, duration_hours), per user_id return user_id, total_hours = SUM(duration_hours), unique_destinations = count of distinct destinations. Return a list of dicts sorted alphabetically by user_id.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **multi-field aggregation** from structured records. Computing multiple metrics per group (total hours and unique destinations) in a single pass mirrors GROUP BY with multiple aggregate functions in SQL.

---

### Break down the requirements

#### Step 1: Group trips by user_id

Maintain a dict keyed by user_id. Each value tracks total hours and a set of destinations.

#### Step 2: Accumulate hours and destinations per user

For each trip, add duration to the running total and add the destination to the set.

#### Step 3: Format and sort the output

Convert each user's aggregated data to the required dict format and sort by user_id.

---

### The solution

**Dict-based grouping with set for unique counting**

```python
def aggregate_trips(trips: list) -> list:
    users = {}
    for trip in trips:
        uid = trip["user_id"]
        if uid not in users:
            users[uid] = {"total_hours": 0, "destinations": set()}
        users[uid]["total_hours"] += trip["duration_hours"]
        users[uid]["destinations"].add(trip["destination"])
    result = []
    for uid in sorted(users.keys()):
        entry = {
            "user_id": uid,
            "total_hours": users[uid]["total_hours"],
            "unique_destinations": len(users[uid]["destinations"])
        }
        result.append(entry)
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(t + u log u) where t is the number of trips and u is the number of unique users. Aggregation is O(t), sorting users is O(u log u).
> 
> **Space:** O(t) in the worst case for destination sets.

> **Interviewers Watch For**
>
> Using a set for destinations to get unique counts is the key insight. Using a list and deduplicating later is messier and slower.

> **Common Pitfall**
>
> Counting all destinations instead of unique ones. Without a set, duplicate destinations inflate the count.

---

## Common follow-up questions

- What if trips had a cost field and you also needed total spend? _(Tests adding another accumulator to the per-user dict.)_
- How would you express this as a SQL query? _(Tests `GROUP BY user_id` with `SUM(duration_hours)` and `COUNT(DISTINCT destination)`.)_
- What if the data was too large for memory? _(Tests external aggregation or database-based grouping.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_trip_aggregator)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.