# Batch With Metadata

> The list gets chopped.

Canonical URL: <https://datadriven.io/problems/batch_with_metadata>

Domain: Python · Difficulty: easy · Seniority: L3

## Problem

Given a list of records and a batch size n, split the list into consecutive chunks of n records each. Return a list of dicts, one per chunk, each containing `'batch_index'` (0-based), `'records'` (the chunk), and `'is_last'` (True only for the final chunk). The last chunk may be shorter than n.

## Worked solution and explanation

### Why this problem exists in real interviews

This extends basic batching by requiring **metadata enrichment**: each chunk needs its index and an `is_last` flag. It tests whether you can track position state while iterating, a pattern common in paginated API calls and ETL progress reporting.

---

### Break down the requirements

#### Step 1: Chunk the list into batches of size n

Use `range(0, len(records), n)` to generate the start offset of each chunk, then slice `records[i:i + n]` at each offset. This is the standard chunking pattern.
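As a minimal sketch of the chunking pattern on its own, assuming a small sample list and a batch size of 2 (both chosen purely for illustration):

```python
records = ["a", "b", "c", "d", "e"]
n = 2  # batch size

# range(0, len(records), n) yields the start offset of each chunk: 0, 2, 4
chunks = [records[i:i + n] for i in range(0, len(records), n)]
print(chunks)  # [['a', 'b'], ['c', 'd'], ['e']]
```

Slicing past the end of a list is safe in Python, which is why the final chunk comes out shorter instead of raising an error.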

#### Step 2: Wrap each batch with its index and is_last flag

Track the batch index as you iterate. The last batch is the one where `i + n` reaches or exceeds the total length.
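The `is_last` condition can be checked in isolation. The helper below, `last_flags`, is a hypothetical name introduced only to illustrate the comparison; it is not part of the required solution.

```python
def last_flags(total, n):
    """Return the is_last flag for each chunk of a list of length total."""
    flags = []
    for i in range(0, total, n):
        # The chunk starting at i is last when its end reaches or passes total.
        flags.append(i + n >= total)
    return flags


print(last_flags(5, 2))  # [False, False, True]  (final chunk is short)
print(last_flags(4, 2))  # [False, True]         (final chunk is full)
```

The second call shows why the comparison is `>=` rather than `>`: when the length is an exact multiple of n, the end of the final chunk equals the total length.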

#### Step 3: Handle empty input

An empty records list should return an empty list of batches, not a single empty batch.
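A short sketch of why the range-based loop handles this case without a special branch, contrasted with an illustrative naive approach that slices unconditionally:

```python
records = []
n = 3

# With no records, range(0, 0, n) is empty, so no chunk is ever produced.
starts = list(range(0, len(records), n))
batches = [records[i:i + n] for i in starts]
print(starts, batches)  # [] []

# By contrast, slicing unconditionally would report one empty batch.
naive = [records[0:n]]
print(naive)  # [[]]
```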

---

### The solution

**Chunking with index and terminal flag**

```python
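def batch_with_metadata(records, n):
    """Split records into chunks of size n, each wrapped with metadata."""
    result = []
    total = len(records)
    # enumerate numbers the batches; range yields each chunk's start offset.
    # An empty list produces an empty range, so no batch is emitted.
    for batch_index, i in enumerate(range(0, total, n)):
        result.append({
            'batch_index': batch_index,
            'records': records[i:i + n],  # final slice may be shorter than n
            # True once this chunk reaches or passes the end of the list
            'is_last': i + n >= total,
        })
    return result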
```

> **Time and Space Complexity**
>
> **Time:** O(N), where N is the number of records (not the batch size n). Slicing copies each record reference exactly once.
>
> **Space:** O(N) for the output, since every record appears in exactly one batch's list.

> **Interviewers Watch For**
>
> Clean separation between the chunking logic and the metadata wrapping. Candidates who conflate these into one tangled loop are harder to trust with production code.

> **Common Pitfall**
>
> Returning a single batch with an empty list when records is empty. The correct behavior is to return an empty list, since there are no batches to report.

---

## Common follow-up questions

- What if the caller also needs a total_batches count in each wrapper? _(Tests whether you can pre-compute the count using ceiling division before iterating.)_
- How would you stream batches to an API with retry logic? _(Tests knowledge of generator-based batching with error handling per batch.)_
- What if records arrive as a stream rather than a list? _(Tests iterator-based approaches where you cannot call len() upfront.)_
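
The follow-ups above could be approached along the following lines. This is a sketch under stated assumptions, not a reference answer: the names `batch_with_total`, `iter_batches`, and `send_with_retry`, the `send` callable, and the `max_attempts` and `delay_seconds` parameters are all hypothetical, and n is assumed to be a positive integer.

```python
import time
from itertools import islice


def batch_with_total(records, n):
    """List-based variant that adds a precomputed total_batches field."""
    total_batches = -(-len(records) // n)  # ceiling division without floats
    return [
        {
            'batch_index': idx,
            'records': records[idx * n:(idx + 1) * n],
            'is_last': idx == total_batches - 1,
            'total_batches': total_batches,
        }
        for idx in range(total_batches)
    ]


def iter_batches(stream, n):
    """Yield batch wrappers lazily from any iterable, without calling len()."""
    it = iter(stream)
    current = list(islice(it, n))
    batch_index = 0
    while current:
        # Read one chunk ahead: an empty lookahead means current is the last.
        upcoming = list(islice(it, n))
        yield {
            'batch_index': batch_index,
            'records': current,
            'is_last': not upcoming,
        }
        current = upcoming
        batch_index += 1


def send_with_retry(batch, send, max_attempts=3, delay_seconds=0.0):
    """Call the hypothetical send(batch) until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(batch)
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error to the caller
            time.sleep(delay_seconds * attempt)  # simple linear backoff
```

The streaming version has to hold one extra chunk in memory, because `is_last` cannot be known until the next read comes back empty. A caller might combine the two generators as `for b in iter_batches(source, 100): send_with_retry(b, send)`, where `source` and `send` are whatever iterable and upload function the caller provides.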

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.