# Word Count Per Message

> How wordy are the messages?

Canonical URL: <https://datadriven.io/problems/word_count_per_message>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

For each chat message, calculate the word count based on spaces in the content. Show the message ID and its word count.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests string manipulation for metrics extraction. Interviewers check whether you can compute a derived metric (word count) from raw text content using SQL string functions.

---

### Break down the requirements

#### Step 1: Compute word count from spaces

Word count based on spaces equals `LENGTH(content) - LENGTH(REPLACE(content, ' ', ")) + 1`. This counts the number of spaces and adds 1.

#### Step 2: Return message ID and word count

Select `msg_id` and the computed word count for each message.

---

### The solution

**String-based word count via space counting**

```sql
SELECT
    msg_id,
    LENGTH(content) - LENGTH(REPLACE(content, ' ', '')) + 1 AS word_count
FROM chat_msgs
```

> **Cost Analysis**
>
> Full scan of 30M rows. LENGTH and REPLACE are O(n) per string where n is the content length. For long messages, this could be significant. No index helps since it is a full-table transformation.

> **Interviewers Watch For**
>
> The `LENGTH - LENGTH(REPLACE(...)) + 1` pattern is a classic SQL trick for counting occurrences of a character. Strong candidates know this pattern immediately.

> **Common Pitfall**
>
> This formula returns 1 for an empty string (since LENGTH(") = 0, REPLACE returns ", and 0 - 0 + 1 = 1). Handle empty content with a CASE expression if needed.

---

## Common follow-up questions

- What if content has multiple consecutive spaces? _(Each extra space inflates the word count by 1. Normalize whitespace first with a recursive REPLACE.)_
- How would you find the average word count per channel? _(GROUP BY channel with AVG of the word count expression.)_
- What if content is NULL for some messages? _(LENGTH(NULL) returns NULL, so the expression returns NULL. Use COALESCE to handle it.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/word_count_per_message)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.