# The Column Transformer

> Each column gets its function.

Canonical URL: <https://datadriven.io/problems/the_column_transformer>

Domain: Python · Difficulty: easy · Seniority: L3

## Problem

Given rows (list of dicts) and transforms (dict mapping column name to a string Python lambda), apply each transform to every row's corresponding column (in place). Return the list of transformed rows. Use eval to evaluate the lambda source strings.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **higher-order function application** in a data processing context. It probes whether a candidate can iterate over rows, apply per-column transforms looked up from a dict, and leave columns without a transform intact. The same pattern shows up in ETL and feature engineering pipelines.

---

### Break down the requirements

#### Step 1: Iterate through each row

Process rows one at a time, modifying columns that have transforms.

#### Step 2: For each column in the transforms dict, apply the function

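Each transform arrives as lambda source text, so evaluate it with `eval` to get a callable. Then check if the column exists in the row and, if it does, overwrite its value with the transform result.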
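
A minimal sketch of this step in isolation, assuming the transforms arrive as lambda source strings as the problem states. The sample `row` and `transforms` values are made up for illustration. Note that `eval` executes arbitrary code, so it only suits trusted input.

```python
# Hypothetical sample data, not taken from the problem's test cases.
row = {"name": "ada", "age": "36"}
transforms = {"name": "lambda s: s.upper()", "email": "lambda s: s.lower()"}

for col, src in transforms.items():
    fn = eval(src)   # turn the source string into a callable
    if col in row:   # "email" is absent from this row, so it is skipped
        row[col] = fn(row[col])

print(row)  # {'name': 'ADA', 'age': '36'}
```

The `age` column has no transform and keeps its original value.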

#### Step 3: Return the modified rows

Columns not in the transforms dict remain unchanged.

---

### The solution

**Row-wise transform application**

```python
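def apply_transforms(rows: list, transforms: dict) -> list:
    # Transforms arrive as lambda source strings; evaluate each one once.
    # eval runs arbitrary code, so only use it on trusted input.
    compiled = {
        col: eval(src) if isinstance(src, str) else src
        for col, src in transforms.items()
    }
    result = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the caller's dict is not mutated
        for col, fn in compiled.items():
            if col in new_row:  # skip columns this row does not have
                new_row[col] = fn(new_row[col])
        result.append(new_row)
    return result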
```
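
A concrete run can make the behavior easier to see. The sketch below is a compact, comprehension-based variant with sample data; the name `apply_transforms_from_source` and the sample rows are illustrative assumptions, not part of the original solution.

```python
def apply_transforms_from_source(rows: list, transforms: dict) -> list:
    # Evaluate each lambda string once, outside the row loop.
    # eval executes arbitrary code, so only pass trusted strings.
    compiled = {col: eval(src) for col, src in transforms.items()}
    return [
        # Transform a value only when its column has a compiled function.
        {col: compiled[col](val) if col in compiled else val
         for col, val in row.items()}
        for row in rows
    ]


# Hypothetical sample data; the second row has no "price" column.
rows = [{"id": 1, "price": "9.50"}, {"id": 2}]
transforms = {"price": "lambda p: float(p)", "id": "lambda i: i * 10"}

out = apply_transforms_from_source(rows, transforms)
print(out)   # [{'id': 10, 'price': 9.5}, {'id': 20}]
print(rows)  # [{'id': 1, 'price': '9.50'}, {'id': 2}]
```

This variant walks each row's columns rather than the transforms dict, so its work scales with the number of columns per row instead of the number of transforms.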

> **Time and Space Complexity**
>
> **Time:** O(n * t), where n is the number of rows and t is the number of transforms, assuming each transform runs in O(1) on a single value.
> 
> **Space:** O(n * c) where c is the number of columns per row, for the copied output.

> **Interviewers Watch For**
>
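> Whether you copy each row before modifying it. The problem statement says "in place", but mutating the caller's dicts is a common source of bugs in real pipelines, so the solution here returns copies. `dict(row)` creates a shallow copy: reassigning a key in the copy leaves the original row untouched, while nested mutable values (lists, dicts) are still shared with it.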
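
To illustrate what a shallow copy does and does not protect, here is a small sketch. The sample `row` is made up, and `copy.deepcopy` is shown only as one possible option when rows hold nested mutable values.

```python
import copy

row = {"id": 1, "tags": ["a", "b"]}

shallow = dict(row)
shallow["id"] = 99            # rebinding a key: the original is unaffected
shallow["tags"].append("c")   # mutating a shared list: the original changes too

print(row)  # {'id': 1, 'tags': ['a', 'b', 'c']}

deep = copy.deepcopy(row)
deep["tags"].append("d")      # the deep copy owns its own list
print(row["tags"])  # ['a', 'b', 'c']
```

For transforms that return new values, as in this problem, the shallow copy is enough because each column is reassigned rather than mutated.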

> **Common Pitfall**
>
> Applying transforms to columns that do not exist in a given row. Always check `if col in new_row` before applying the function to avoid `KeyError`.
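
The difference is easy to reproduce with sparse rows. In this sketch the sample rows and the `title` function are hypothetical; `title` stands in for a transform that has already been evaluated into a callable.

```python
# Hypothetical sample data: the second row lacks "city".
rows = [{"name": "ada", "city": "london"}, {"name": "bob"}]
title = lambda s: s.title()  # stands in for an already-evaluated transform

# Unguarded: indexing a missing column raises KeyError.
unguarded_error = None
try:
    for row in rows:
        row_copy = dict(row)
        row_copy["city"] = title(row_copy["city"])
except KeyError as exc:
    unguarded_error = exc

# Guarded: rows without the column pass through unchanged.
guarded = []
for row in rows:
    row_copy = dict(row)
    if "city" in row_copy:
        row_copy["city"] = title(row_copy["city"])
    guarded.append(row_copy)

print(repr(unguarded_error))  # KeyError('city')
print(guarded)  # [{'name': 'ada', 'city': 'London'}, {'name': 'bob'}]
```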

---

## Common follow-up questions

- What if transforms could raise exceptions? _(Tests error handling: wrapping each transform in try/except to skip or log failures.)_
- How would you apply transforms lazily for a large dataset? _(Tests generator-based approach: yield each transformed row instead of collecting all.)_
- What if a transform depends on multiple columns? _(Tests API design: the transform function would need the entire row, not just one value.)_
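
One way these three follow-ups could fit together is sketched below. It assumes the transforms have already been turned into callables. The function name `iter_transformed`, the `row_transforms` parameter, and the "keep the original value on failure" policy are all illustrative choices rather than a prescribed answer.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional


def iter_transformed(
    rows: Iterable[dict],
    transforms: Dict[str, Callable],
    row_transforms: Optional[Dict[str, Callable]] = None,
) -> Iterator[dict]:
    """Yield transformed copies of rows one at a time (lazy)."""
    for row in rows:
        new_row = dict(row)
        for col, fn in transforms.items():
            if col not in new_row:
                continue
            try:
                new_row[col] = fn(new_row[col])
            except Exception:
                # Assumed policy: keep the original value when a transform fails.
                pass
        # Row-level transforms receive the whole row and may add new columns.
        for col, fn in (row_transforms or {}).items():
            new_row[col] = fn(new_row)
        yield new_row


# Hypothetical sample data: "n/a" makes int() raise ValueError.
rows = [{"qty": "3", "price": 2.5}, {"qty": "n/a", "price": 4.0}]
col_fns = {"qty": int}
row_fns = {
    "total": lambda r: r["qty"] * r["price"] if isinstance(r["qty"], int) else None
}

result = list(iter_transformed(rows, col_fns, row_fns))
print(result[0])  # {'qty': 3, 'price': 2.5, 'total': 7.5}
print(result[1])  # {'qty': 'n/a', 'price': 4.0, 'total': None}
```

Because the function is a generator, nothing runs until the caller iterates, and only one transformed row is built at a time. Row-level transforms run after the column transforms, so they see the already-converted values.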

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.