# The Response Aggregator

> Multiple result pages. One clean summary.

Canonical URL: <https://datadriven.io/problems/the_response_aggregator>

Domain: Python · Difficulty: medium · Seniority: L3

## Problem

Given pages (a list of pages, each a list of dicts with 'category' and 'amount'), sum amounts per category across all pages. Return a list of [category, total] pairs sorted by total descending, tie-break alphabetically by category.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **flattening paginated data and aggregating by category**, a common ETL pattern when consuming paginated API responses. It checks whether candidates can handle two levels of iteration and produce sorted output.

---

### Break down the requirements

#### Step 1: Flatten all pages into a single stream of records

Iterate through each page, then each record within the page.
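
As a rough sketch of this step, assuming the `pages` shape from the problem statement, the nested loop can also be written with `itertools.chain.from_iterable`. The helper name `iter_records` and the sample data are hypothetical, used only for illustration.

```python
from itertools import chain

def iter_records(pages):
    """Return a lazy iterator over every record on every page, in page order."""
    return chain.from_iterable(pages)

# Made-up sample data in the shape the problem describes.
pages = [
    [{'category': 'food', 'amount': 10}, {'category': 'fuel', 'amount': 5}],
    [{'category': 'food', 'amount': 7}],
]
flat = list(iter_records(pages))  # three records, page 1 first
```

Either form visits each record exactly once; the explicit nested loop is often easier to narrate in an interview.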

#### Step 2: Accumulate totals by category

Use a dictionary to sum amounts per category across all pages.
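
One possible way to write the accumulation, assuming amounts are numeric, is with `collections.defaultdict`, which removes the "is the key already present" branch. The function name `sum_by_category` and the sample records are hypothetical.

```python
from collections import defaultdict

def sum_by_category(records):
    """Sum the 'amount' of each record under its 'category'."""
    totals = defaultdict(int)  # missing categories start at 0
    for record in records:
        totals[record['category']] += record['amount']
    return dict(totals)  # plain dict, so later lookups cannot create keys by accident

# Made-up records, already flattened across pages.
records = [
    {'category': 'food', 'amount': 10},
    {'category': 'fuel', 'amount': 5},
    {'category': 'food', 'amount': 7},
]
totals = sum_by_category(records)
```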

#### Step 3: Sort by total descending and return

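Convert the dict to a list of [category, total] pairs and sort it by total descending, breaking ties alphabetically by category. Return the list itself, since the problem asks for pairs rather than a dict.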
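
The problem statement asks for [category, total] pairs ordered by total descending with an alphabetical tie-break. A minimal sketch of that ordering, assuming numeric totals so they can be negated in the sort key, might look like this. The name `sort_totals` and the sample totals are hypothetical.

```python
def sort_totals(totals):
    """Return [category, total] pairs: largest total first, ties by category A-Z."""
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [[cat, total] for cat, total in ordered]

# Made-up totals with a tie between 'food' and 'fuel'.
ranked = sort_totals({'fuel': 17, 'food': 17, 'rent': 40})
```

Negating the total lets a single ascending sort handle both criteria: larger totals come first, and equal totals fall back to ascending category names.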

---

### The solution

**Page flattening with category aggregation**

```python
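def aggregate_pages(pages):
    # Sum amounts per category across every record on every page.
    totals = {}
    for page in pages:
        for record in page:
            cat = record['category']
            amt = record['amount']
            if cat in totals:
                totals[cat] += amt
            else:
                totals[cat] = amt
    # Build [category, total] pairs, as the problem statement requires.
    pairs = []
    for cat in totals:
        pairs.append([cat, totals[cat]])
    # Total descending; negating the total keeps the alphabetical tie-break ascending.
    pairs.sort(key=lambda pair: (-pair[1], pair[0]))
    return pairs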
```

> **Time and Space Complexity**
>
> **Time:** O(n + k log k) where n is the total number of records and k is the number of distinct categories.
> 
> **Space:** O(k) for the totals dictionary and the sorted list of pairs.

> **Interviewers Watch For**
>
> Do you iterate through pages cleanly? The nested loop (page, then record) mirrors how paginated API data is typically consumed.

> **Common Pitfall**
>
> Sorting by category name instead of total amount. The problem asks for descending total, so the accumulated amount must be the primary sort key, with the category name used only to break ties.
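
A related trap is the tie-break direction. Sorting `(total, category)` tuples with `reverse=True` reverses the alphabetical order along with the totals. The sketch below uses made-up totals to compare that approach with a key that negates only the total.

```python
# Made-up totals with a tie between 'apples' and 'bananas'.
totals = {'apples': 5, 'bananas': 5, 'cherries': 9}

# Reversing the whole tuple sort also reverses the tie-break.
naive = sorted(((total, cat) for cat, total in totals.items()), reverse=True)
naive_order = [cat for _, cat in naive]

# Negating only the total keeps category names in ascending order.
correct = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
correct_order = [cat for cat, _ in correct]
```

Here `naive_order` puts `'bananas'` ahead of `'apples'`, while `correct_order` keeps the tied categories alphabetical.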

---

## Common follow-up questions

- What if pages were fetched lazily from an API? _(Tests async iteration or generator-based page fetching.)_
- What if some records had missing 'amount' fields? _(Tests using `.get('amount', 0)` for defensive access.)_
- How would you handle this with pandas? _(Tests pd.concat + groupby + sort_values.)_
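
The first two follow-ups can be sketched together. The generator `fetch_pages` below is a hypothetical stand-in for a paginated API client, and treating a missing `'amount'` as 0 is an assumption; a real pipeline might prefer to skip or log such records.

```python
def fetch_pages():
    """Hypothetical page source: yields one page at a time instead of a full list."""
    yield [{'category': 'food', 'amount': 10}, {'category': 'fuel'}]  # 'fuel' lacks an amount
    yield [{'category': 'food', 'amount': 7}, {'category': 'fuel', 'amount': 3}]

def aggregate_lazy(pages):
    """Aggregate any iterable of pages, treating a missing 'amount' as 0."""
    totals = {}
    for page in pages:  # works for lists and generators alike
        for record in page:
            cat = record['category']
            totals[cat] = totals.get(cat, 0) + record.get('amount', 0)
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [[cat, total] for cat, total in ordered]

result = aggregate_lazy(fetch_pages())
```

Because the function only iterates over `pages`, each page can be discarded once it has been processed, which keeps memory at O(k) regardless of how many pages arrive.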

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.