# Messages From Specific Users

> Specific users. What did they say?

Canonical URL: <https://datadriven.io/problems/messages_from_specific_users>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The trust and safety team received a complaint about users 2 and 3. Pull all chat messages where either user sent the message, or where the content mentions their user IDs.

## Worked solution and explanation

### Why this problem exists in real interviews

The chat_msgs table contains channel and content values that must be processed with pattern matching. This appears as a fundamentals check to probe whether you reason about the correct aggregation grain before writing any window or GROUP BY clause.  Getting the column references right on the first try is what interviewers watch for.

---

### Break down the requirements

#### Step 1: Read from `chat_msgs`

The query targets `chat_msgs` with 8 columns. Identify which columns are needed for the output.

#### Step 2: Filter to the target rows

Apply the `LIKE` pattern match in the `WHERE` clause. This narrows the dataset before any grouping or aggregation.

#### Step 3: Return the result set

Select the required columns with any necessary aliasing or formatting.

---

### The solution

**OR-based multi-condition filter**

```sql
SELECT msg_id, channel, sender_id, content, msg_type, sent_at, edited, reply_to
FROM chat_msgs
WHERE sender_id IN (2, 3)
    OR content LIKE '%2%'
    OR content LIKE '%3%'
```

> **Cost Analysis**
>
> The query scans 25M rows from `chat_msgs`.

> **Interviewers Watch For**
>
> Candidates who verbalize their approach before typing, naming the output columns and expected row count, consistently perform better.

> **Common Pitfall**
>
> Placing a filter in `WHERE` instead of `HAVING` (or vice versa) is a common mistake. `WHERE` filters rows before aggregation; `HAVING` filters groups after.

---

## Common follow-up questions

- What happens to your result if chat_msgs.reply_to contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on reply_to.)_
- How would you verify that your aggregation on chat_msgs.msg_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in chat_msgs.msg_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like msg_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/messages_from_specific_users)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.