# Left Join

> Keep the left side. Match what you can.

Canonical URL: <https://datadriven.io/problems/left_join>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given two lists of dicts (left, right) and a shared key name, left-join the lists: for each left record, find the matching right record (the first one whose key matches) and merge its fields into the left record. If no right-side match exists, keep the left record as-is with no extra keys. Return the list.

## Worked solution and explanation

### Why this problem exists in real interviews

Implementing a left join in Python tests **hash-based lookup** and **record merging**. It mirrors the most common SQL join type and reveals whether you build an index for the right side or use nested loops.

---

### Break down the requirements

#### Step 1: Index the right list by the join key

Build a dict mapping each right record's key value to the full record for O(1) lookup.

#### Step 2: Iterate through left records and merge

For each left record, look up the matching right record by key. If found, merge the fields.

#### Step 3: Omit right-side fields when no match exists

If no right record matches, include only the left record's fields.

---

### The solution

**Hash-indexed left join with dict merge**

```python
def left_join(left, right, key):
    right_index = {}
    for record in right:
        right_index[record[key]] = record
    result = []
    for left_record in left:
        joined = {}
        for k in left_record:
            joined[k] = left_record[k]
        match = right_index.get(left_record[key])
        if match is not None:
            for k in match:
                if k != key:
                    joined[k] = match[k]
        result.append(joined)
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n + m) where n is the left size and m is the right size. Building the index is O(m), and the join pass is O(n).
> 
> **Space:** O(n + m) for the index and the result.

> **Interviewers Watch For**
>
> Building a hash index on the right side before joining. Candidates who use nested loops get O(n*m) time, which is a red flag for data engineering roles.

> **Common Pitfall**
>
> Adding None-valued keys for unmatched right fields. The prompt says to omit those fields entirely, not set them to None.

---

## Common follow-up questions

- How would you handle duplicate keys on the right side? _(Tests whether you keep the last match, the first, or produce multiple output rows per left record.)_
- How would you implement an inner join instead? _(Tests adding a filter: only include left records that have a right match.)_
- What if both sides have millions of records? _(Tests knowledge of sort-merge join as an alternative when hash tables exceed memory.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/left_join)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.