# The Log Pulse

> Some lines repeat themselves.

Canonical URL: <https://datadriven.io/problems/the_log_pulse>

Domain: Python · Difficulty: easy · Seniority: L3

## Problem

Given a list of log lines in the format '[LEVEL] message', return a 2-tuple (counts_dict, most_common_level). counts_dict maps each LEVEL to its count. most_common_level is the LEVEL with the highest count (if tied, any tied level is acceptable).

## Worked solution and explanation

### Why this problem exists in real interviews

Log triage is the canonical 'parse a string, count something, pick the winner' interview prompt. It tests whether you reach for the right standard-library tool (collections.Counter), whether you parse defensively, and whether you can specify a tie-break rule out loud before code locks one in by accident.

---

### Break down the requirements

#### Step 1: Extract the level token from each line

Each line looks like '[LEVEL] message'. Taking the text between the leading '[' and the first ']' (by splitting on ']' once or by finding its index) is robust to messages that contain brackets later. Skip malformed lines that do not start with '[' or that have no closing ']' rather than crashing the whole batch.
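
The extraction step can be sketched on its own. This is a minimal illustration, not the reference solution; `extract_level` is a hypothetical helper name and the sample lines are made up.

```python
from typing import Optional

def extract_level(line: str) -> Optional[str]:
    """Return the text between the leading '[' and the first ']', or None if malformed."""
    if not line.startswith('['):
        return None
    # partition splits on the first ']' only, so later brackets stay in the message.
    level, closing, _message = line[1:].partition(']')
    if not closing:  # no ']' anywhere in the line
        return None
    return level

print(extract_level('[INFO] [retry] connecting'))  # INFO
print(extract_level('no bracket here'))            # None
print(extract_level('[WARN missing close'))        # None
```

Returning None for malformed lines leaves the skip-versus-raise decision to the caller.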

#### Step 2: Tally with Counter

collections.Counter is built for this: O(n) construction, average O(1) lookup, and a most_common helper. It also returns 0 for missing keys when you index it, without inserting them, which keeps downstream code clean.
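
A short sketch of the Counter behaviors mentioned above, using a made-up list of levels:

```python
from collections import Counter

levels = ['INFO', 'ERROR', 'INFO', 'WARN', 'INFO', 'ERROR']  # made-up sample
counts = Counter(levels)

print(counts['INFO'])          # 3
print(counts['FATAL'])         # 0, no KeyError
print('FATAL' in counts)       # False: indexing a missing key does not insert it
print(counts.most_common(2))   # [('INFO', 3), ('ERROR', 2)]
```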

#### Step 3: Pick the most common level deterministically

Counter.most_common(1) returns the highest-count item; on Python 3.7+ elements with equal counts come back in the order first encountered. State the tie-break rule explicitly. The spec here accepts any tied level, so insertion order (first-seen wins) is a valid choice; mention it before the interviewer asks.
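
To see the first-seen tie-break in action, here is a small sketch with two made-up inputs that contain the same levels in a different order. It assumes Python 3.7 or later, where Counter preserves insertion order.

```python
from collections import Counter

warn_first = Counter(['WARN', 'INFO', 'INFO', 'WARN'])  # WARN is seen first
info_first = Counter(['INFO', 'WARN', 'WARN', 'INFO'])  # INFO is seen first

# Both levels have a count of 2; the one encountered first wins the tie.
print(warn_first.most_common(1))  # [('WARN', 2)]
print(info_first.most_common(1))  # [('INFO', 2)]
```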

---

### The solution

**Counter plus most_common does the work**

```python
from collections import Counter

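def count_log_levels(log_lines: list[str]) -> tuple[dict[str, int], str]:
    counts = Counter()
    for line in log_lines:
        # Skip malformed lines instead of failing the whole batch.
        if not line.startswith('['):
            continue
        end = line.find(']')  # the first ']' closes the level token
        if end == -1:
            continue
        level = line[1:end]
        counts[level] += 1
    # No parseable lines: return an empty dict and an empty level string.
    if not counts:
        return {}, ''
    # Ties resolve to the level seen first (Counter keeps insertion order).
    most_common_level = counts.most_common(1)[0][0]
    # Return a plain dict rather than the Counter subclass.
    return dict(counts), most_common_level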
```

> **Cost Analysis**
>
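> Time is O(n * k) where n is the number of log lines and k is the average number of characters scanned to find the closing bracket (at most the line length). Space is O(u) for u unique levels, typically tiny (DEBUG, INFO, WARN, ERROR, FATAL). Counter is a dict subclass written in Python; CPython uses a C helper for counting only when you pass an iterable to Counter() or update(), so the explicit loop here runs at ordinary Python-loop speed, which is fine at this scale.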
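
An alternative formulation, sketched under the assumption that you are free to restructure the loop: a generator yields the parsed levels and Counter consumes them directly, which in CPython hands the tallying to its built-in counting helper. `iter_levels` is a hypothetical name and the sample lines are made up.

```python
from collections import Counter
from typing import Iterable, Iterator

def iter_levels(log_lines: Iterable[str]) -> Iterator[str]:
    """Yield the level token of each well-formed line, skipping malformed ones."""
    for line in log_lines:
        if not line.startswith('['):
            continue
        end = line.find(']')
        if end != -1:
            yield line[1:end]

sample = ['[INFO] start', 'garbage', '[ERROR] boom', '[INFO] done']
counts = Counter(iter_levels(sample))
print(dict(counts))  # {'INFO': 2, 'ERROR': 1}
```

The asymptotic cost is unchanged; only where the counting loop runs differs.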

> **Interviewers Watch For**
>
> Whether you use Counter instead of a hand-rolled dict with default-zero, whether you state the tie-break rule out loud, and whether you handle empty input. Strong candidates also flag the malformed-line decision (skip vs raise) as a product question, not a coding one.

> **Common Pitfall**
>
> Splitting on ' ' (space) and taking index 0, which works on the happy path but breaks on '[INFO]message' (no space after the bracket) or on any level that contains a space, and still leaves the brackets to strip. Splitting on ']' once or finding the first ']' anchors on the structure of the prefix, not whitespace luck. Returning the Counter directly instead of a plain dict can also trip tests that check the exact type or printed form, even though a Counter compares equal to a dict with the same items.
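
A small sketch of where whitespace splitting goes wrong, compared with anchoring on the first ']'. Both helper names are hypothetical, the lines are made up, and `level_by_bracket` assumes a well-formed line for brevity.

```python
def level_by_space(line: str) -> str:
    # Fragile: depends on a space following the closing bracket.
    return line.split(' ')[0]

def level_by_bracket(line: str) -> str:
    # Anchors on the prefix structure; assumes the line starts with '[' and contains ']'.
    return line[1:line.find(']')]

for line in ['[INFO] started', '[INFO]started', '[ERROR] disk [sda] full']:
    print(repr(level_by_space(line)), repr(level_by_bracket(line)))
# '[INFO]' 'INFO'
# '[INFO]started' 'INFO'
# '[ERROR]' 'ERROR'
```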

---

## Common follow-up questions

- How would you tie-break alphabetically instead of by insertion order? _(show min with key=lambda kv: (-kv[1], kv[0]) over counts.items(); max with the same key would pick the wrong end.)_
- What changes if the log format includes a timestamp prefix before the bracket? _(discuss a regex anchored to '\[(\w+)\]' or stripping a fixed-width timestamp first.)_
- How would you scale this to a 1 TB log file? _(stream line by line and keep the Counter in memory, which stays tiny, or shard by hash of level for parallel workers.)_
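
The first two follow-ups can be sketched together. This is one possible approach, not a definitive answer: `LEVEL_RE`, `level_after_prefix`, and `most_common_alphabetical` are hypothetical names, and the timestamped lines are made-up sample input.

```python
import re
from collections import Counter
from typing import Optional

# First bracketed word anywhere in the line, so a leading timestamp is ignored.
LEVEL_RE = re.compile(r'\[(\w+)\]')

def level_after_prefix(line: str) -> Optional[str]:
    match = LEVEL_RE.search(line)
    return match.group(1) if match else None

def most_common_alphabetical(counts: Counter) -> str:
    # Highest count first (negated), then the alphabetically smallest level.
    level, _count = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return level

lines = [
    '2024-01-01T00:00:00Z [WARN] disk at 91%',
    '2024-01-01T00:00:01Z [INFO] heartbeat',
    '2024-01-01T00:00:02Z [INFO] heartbeat',
    '2024-01-01T00:00:03Z [WARN] disk at 92%',
]
counts = Counter()
for line in lines:
    level = level_after_prefix(line)
    if level is not None:
        counts[level] += 1

print(counts.most_common(1)[0][0])       # WARN (first seen)
print(most_common_alphabetical(counts))  # INFO (alphabetical tie-break)
```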

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.