# Custom Message Type Counts

> Not all messages are created equal.

Canonical URL: <https://datadriven.io/problems/custom_message_type_counts>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The trust and safety team is tracking custom message types sent to users. Count how many times each user was on the receiving end of a message with a non-standard type (anything other than 'text' or 'image'). Show each recipient, message type, and count.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests NOT IN filtering combined with GROUP BY. The challenge is identifying non-standard message types and counting them per recipient, which requires understanding the `reply_to` field as the recipient indicator.

---

### Break down the requirements

#### Step 1: Filter to non-standard message types

`WHERE msg_type NOT IN ('text', 'image')` removes standard types.

#### Step 2: Identify recipients

The `reply_to` field or channel membership determines recipients. Since the schema has `sender_id` and `reply_to`, the recipient context depends on the data model.

#### Step 3: Group and count

`GROUP BY sender_id, msg_type` (or reply_to as recipient) with `COUNT(*)`.

---

### The solution

**Filtered aggregation by recipient and type**

```sql
SELECT
    reply_to AS recipient_id,
    msg_type,
    COUNT(*) AS msg_count
FROM chat_msgs
WHERE msg_type NOT IN ('text', 'image')
  AND reply_to IS NOT NULL
GROUP BY reply_to, msg_type
```

> **Cost Analysis**
>
> Scan of 40M rows with NOT IN filter. The GROUP BY reduces to (recipients x custom types). The NOT IN check is O(1) per row since the exclusion list is small.

> **Interviewers Watch For**
>
> How the candidate interprets "receiving end." In this schema, `reply_to` is the most likely indicator of message recipient. Strong candidates ask for clarification.

> **Common Pitfall**
>
> Using `sender_id` as the recipient would invert the logic: you would count messages *sent* by a user, not received. Read the schema carefully to identify the recipient column.

---

## Common follow-up questions

- What if reply_to is NULL for non-reply messages? _(Those messages have no explicit recipient; filter them out or use channel membership.)_
- How would you also include the message content for context? _(Requires removing the GROUP BY or using STRING_AGG for concatenation.)_
- What if the exclusion list were dynamic? _(Tests subquery-based NOT IN: WHERE msg_type NOT IN (SELECT type FROM standard_types).)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/custom_message_type_counts)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.