# The Line Splitter

> Comma-separated truths, one at a time.

Canonical URL: <https://datadriven.io/problems/the_line_splitter>

Domain: Python · Difficulty: easy · Seniority: L4

## Problem

Given a list of CSV-formatted strings in which the first string is the comma-separated header row, return a list of dicts, one per data row, mapping header names to values (all strings). Do not parse quoted fields; simply split each line on `,`.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **string parsing and dict construction**, two skills central to ETL work. Interviewers want to see if you can parse structured text into Python objects without reaching for a CSV library.

---

### Break down the requirements

#### Step 1: Extract the header row

Split the first string on commas to get column names. These become the dictionary keys for every subsequent row.

#### Step 2: Parse each data row into a dict

For each remaining string, split on commas and zip the values with the header names to build a dictionary.
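
As a minimal sketch of steps 1 and 2 on a single row, the snippet below splits a header line and a data line, then zips them into a dict. The sample strings and variable names are made up for illustration.

```python
# Hypothetical sample input: one header line and one data line.
header_line = "id,name,city"
data_line = "1,Ada,London"

headers = header_line.split(",")   # ['id', 'name', 'city']
values = data_line.split(",")      # ['1', 'Ada', 'London']

# zip pairs each header with the value at the same position;
# dict() turns those pairs into a mapping.
row = dict(zip(headers, values))
print(row)  # {'id': '1', 'name': 'Ada', 'city': 'London'}
```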

#### Step 3: Return all row dicts in order

Collect each dictionary into a result list, preserving the original row order.

---

### The solution

**Header extraction with zip-based row mapping**

```python
def parse_csv(lines):
    # The first line holds the column names.
    headers = lines[0].split(',')
    result = []
    # Every remaining line is a data row.
    for line in lines[1:]:
        values = line.split(',')
        # Pair each header with the value in the same position.
        result.append(dict(zip(headers, values)))
    return result
```

> **Time and Space Complexity**
>
> **Time:** O(n * m) where n is the number of rows and m is the number of columns, treating the length of each cell as bounded by a constant. Each cell is visited once.
> 
> **Space:** O(n * m) for the output list of dictionaries.

> **Interviewers Watch For**
>
> Do you handle the header row separately and cleanly? Candidates who try to process all rows uniformly and then special-case the first one often produce messier code.

> **Common Pitfall**
>
> Assuming values need type conversion. The problem explicitly states all values remain as strings, so casting to int or float would be incorrect.
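
To make the pitfall concrete, here is a small sketch showing that values come back exactly as they appear in the input. The helper name `parse_rows` and the sample data are hypothetical, and the helper is just a compact zip-based variant of the approach described above.

```python
def parse_rows(lines):
    """Compact zip-based parser; every value stays a string."""
    headers = lines[0].split(",")
    return [dict(zip(headers, line.split(","))) for line in lines[1:]]


rows = parse_rows(["id,score", "1,9.5", "2,007"])

# Values are returned exactly as they appear in the input.
assert rows[0]["id"] == "1"        # the string "1", not the int 1
assert rows[1]["score"] == "007"   # leading zeros survive; int("007") would give 7
```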

---

## Common follow-up questions

- What if fields could contain commas inside quoted strings? _(Tests awareness of CSV edge cases and quoting rules.)_
- How would you handle rows with fewer columns than the header? _(Tests defensive coding: pad with empty strings or raise an error.)_
- What if the input could be millions of lines? _(Tests whether you would use a generator instead of building the full list in memory.)_
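
One possible way to approach the three follow-ups is sketched below. The function names (`parse_quoted`, `parse_padded`, `iter_rows`) are hypothetical, and the policy choices are assumptions rather than required answers: quoted fields are delegated to the standard library's `csv.DictReader`, short rows are padded with an empty string, and the streaming version strips a trailing newline on the assumption that lines may come straight from a file.

```python
import csv
from typing import Dict, Iterable, Iterator, List


def parse_quoted(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Follow-up 1: let the csv module handle commas inside quoted fields."""
    return list(csv.DictReader(lines))


def parse_padded(lines: List[str], fill: str = "") -> List[Dict[str, str]]:
    """Follow-up 2: pad rows that have fewer values than headers."""
    headers = lines[0].split(",")
    result = []
    for line in lines[1:]:
        values = line.split(",")
        # Assumption: missing trailing columns become `fill`.
        # Extra values, if any, are dropped by zip.
        values += [fill] * (len(headers) - len(values))
        result.append(dict(zip(headers, values)))
    return result


def iter_rows(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Follow-up 3: yield one dict at a time instead of building a list."""
    it = iter(lines)
    try:
        headers = next(it).rstrip("\n").split(",")
    except StopIteration:
        return  # empty input: yield nothing
    for line in it:
        # Assumption: lines may end with a newline when read from a file.
        yield dict(zip(headers, line.rstrip("\n").split(",")))
```

Because `iter_rows` is a generator, it holds only the header list and the current row in memory, so it can be fed a file object or any other lazy iterable of lines.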

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_line_splitter)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.