# Sequential Word Pairs

> Everything has a neighbor.

Canonical URL: <https://datadriven.io/problems/sequential_word_pairs>

Domain: Python · Difficulty: easy · Seniority: L3

## Problem

Given a text string, split it on whitespace and return a list of consecutive two-word pairs, each of the form [word_i, word_{i+1}]. Return an empty list if the text contains fewer than two words.

## Worked solution and explanation

### Why this problem exists in real interviews

Generating bigrams is a fundamental NLP preprocessing step. As an interview question, it tests **sliding window** logic over a tokenized string and whether you handle boundary conditions correctly: empty input, a single word, and the final index.

---

### Break down the requirements

#### Step 1: Split the string into words

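Call `text.split()` with no arguments to tokenize by whitespace. In this form it splits on any run of whitespace and ignores leading and trailing whitespace, so no empty tokens are produced.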
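
A quick sketch of that tokenizing behavior, using made-up sample strings. It also contrasts the no-argument form with an explicit separator, which is a common source of stray empty tokens.

```python
# Illustrative input (hypothetical): mixed spaces, a tab, and a newline.
text = "  the quick\tbrown\n fox  "

# With no separator argument, any run of whitespace counts as one delimiter
# and leading/trailing whitespace is dropped.
words = text.split()
print(words)  # ['the', 'quick', 'brown', 'fox']

# An explicit separator keeps empty strings between repeated delimiters.
print("a  b".split(" "))  # ['a', '', 'b']

# Empty and whitespace-only strings tokenize to an empty list.
print("   ".split())  # []
```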

#### Step 2: Pair consecutive words

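For each index `i` from 0 to `len(words) - 2` inclusive, create a tuple `(words[i], words[i + 1])`. The problem statement writes each pair in list notation; if the expected output must contain lists rather than tuples, build `[words[i], words[i + 1]]` instead.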
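
The same pairing can also be written by zipping the word list against a copy of itself shifted by one position. This is an alternative sketch rather than the reference solution, and `pair_with_zip` is a hypothetical name chosen for illustration.

```python
def pair_with_zip(words):
    """Pair each word with its successor (hypothetical helper for illustration)."""
    # zip() stops at the shorter input, so the last word is never paired
    # with anything past the end of the list.
    return list(zip(words, words[1:]))


print(pair_with_zip(["a", "b", "c"]))  # [('a', 'b'), ('b', 'c')]
```

The trade-off is that `words[1:]` makes a shallow copy of the list, while the index loop does not.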

#### Step 3: Handle short inputs

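If the string has fewer than 2 words, return an empty list. No special case is needed for this: with zero or one word, `range(len(words) - 1)` is empty, so the pairing loop never runs.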
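
A small check of the short-input case, using made-up inputs. It only inspects which indices a pairing loop over `range(len(words) - 1)` would visit.

```python
for text in ["", "hello", "hello world"]:
    words = text.split()
    # range(-1) and range(0) are both empty, so a loop over them never runs.
    indices = list(range(len(words) - 1))
    print(repr(text), len(words), indices)

# Prints:
# '' 0 []
# 'hello' 1 []
# 'hello world' 2 [0]
```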

---

### The solution

**Sliding window of size 2 over tokenized words**

```python
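def find_bigrams(text):
    """Return consecutive word pairs from text as a list of tuples."""
    # split() with no arguments splits on any run of whitespace.
    words = text.split()
    result = []
    # Stop one short of the end so words[i + 1] is always in range;
    # the loop is skipped entirely when there are fewer than 2 words.
    for i in range(len(words) - 1):
        pair = (words[i], words[i + 1])
        result.append(pair)
    return result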
```
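
A few sample calls, with inputs invented for illustration. The function is restated here, condensed into a list comprehension, only so that the snippet runs on its own.

```python
def find_bigrams(text):
    # Same logic as the solution, condensed so this snippet is self-contained.
    words = text.split()
    return [(words[i], words[i + 1]) for i in range(len(words) - 1)]


print(find_bigrams("the quick brown fox"))
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]

print(find_bigrams("hello"))  # []
print(find_bigrams(""))       # []
```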

> **Time and Space Complexity**
>
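> **Time:** O(n) where n is the number of words. Each word appears in at most two pairs. Strictly, `split()` scans every character, so the total work is linear in the length of the text.
>
> **Space:** O(n) for the token list and the result list of bigram tuples.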

> **Interviewers Watch For**
>
> Using `range(len(words) - 1)` to avoid an off-by-one error. For any non-empty input, the number of bigrams is one less than the number of words; for empty input it is zero.

> **Common Pitfall**
>
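> Using `range(len(words))`, which raises an IndexError on the last iteration when accessing `words[i + 1]`. This happens for any input with at least one word; empty input hides the bug because the loop never runs.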
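
To make the pitfall concrete, here is a deliberately broken variant. The name `buggy_bigrams` is hypothetical and exists only for this demonstration.

```python
def buggy_bigrams(text):
    """Deliberately wrong version that iterates one index too far."""
    words = text.split()
    result = []
    for i in range(len(words)):  # BUG: should be range(len(words) - 1)
        result.append((words[i], words[i + 1]))
    return result


try:
    buggy_bigrams("one two three")
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range

# Empty input does not expose the bug, because the loop body never executes.
print(buggy_bigrams(""))  # []
```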

---

## Common follow-up questions

- How would you generalize to n-grams? _(Tests parameterizing the window size: `range(len(words) - n + 1)` with slicing.)_
- What if you needed unique bigrams with counts? _(Tests using a `collections.Counter` keyed by the bigram tuple.)_
- How would you handle punctuation attached to words? _(Tests stripping punctuation before tokenizing.)_
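
One possible way to approach these follow-ups, as a sketch under stated assumptions. `find_ngrams` and `tokenize` are hypothetical helper names. The punctuation policy shown (lowercase, then strip ASCII punctuation from the ends of each token after splitting) is only one reasonable choice; an interviewer may want different rules, for example keeping case or handling non-ASCII punctuation.

```python
import string
from collections import Counter


def find_ngrams(words, n):
    """Return all windows of n consecutive words as tuples (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # range() is empty when there are fewer than n words, so the result is [].
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def tokenize(text):
    """Lowercase, split on whitespace, strip ASCII punctuation from token ends."""
    # Assumption: punctuation inside a token (e.g. "don't") is left alone.
    stripped = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in stripped if w]  # drop tokens that were only punctuation


words = tokenize("The cat sat. The cat ran!")
print(words)  # ['the', 'cat', 'sat', 'the', 'cat', 'ran']

# Tuples are hashable, so they can be used directly as Counter keys.
bigram_counts = Counter(find_ngrams(words, 2))
print(bigram_counts[("the", "cat")])  # 2

print(find_ngrams(words, 3)[0])  # ('the', 'cat', 'sat')
```

Returning tuples rather than lists matters here: lists are unhashable and cannot be used as `Counter` keys.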

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/sequential_word_pairs)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.