# Unique Reporters per Content

> How many people flagged each item?

Canonical URL: <https://datadriven.io/problems/unique_reporters_per_content>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

In the chat data, each message can be a reply to content. For each content reference, count unique reporters identified by combining sender and channel. Only count messages that are replies (reply_to is not null). Return the content ID and the reporter count.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests compound uniqueness logic. Interviewers check whether you can define a unique reporter as a (sender, channel) pair and apply NULL filtering before counting.

---

### Break down the requirements

#### Step 1: Filter to replies only

`WHERE reply_to IS NOT NULL` restricts to messages that are replies (30% of 35M rows, since reply_to has 70% null fraction).

#### Step 2: Group by content reference

`GROUP BY reply_to` treats each reply_to value as the content ID being reported on.

#### Step 3: Count unique reporters

`COUNT(DISTINCT sender_id || '-' || channel)` counts unique (sender, channel) combinations as distinct reporters.

---

### The solution

**Compound distinct count with NULL filter**

```sql
SELECT
    reply_to AS content_id,
    COUNT(DISTINCT sender_id || '-' || channel) AS reporter_count
FROM chat_msgs
WHERE reply_to IS NOT NULL
GROUP BY reply_to
```

> **Cost Analysis**
>
> After the NULL filter, ~10.5M rows remain. The string concatenation for the compound distinct key adds overhead. An alternative is `COUNT(DISTINCT sender_id * 1000000 + channel_hash)` if both are numeric.

> **Interviewers Watch For**
>
> Whether the candidate correctly identifies that uniqueness is defined by the (sender, channel) pair, not just sender_id alone. Different channels for the same sender count as different reporters.

> **Common Pitfall**
>
> Using `COUNT(DISTINCT sender_id)` alone. This under-counts when the same sender reports from multiple channels, each of which should count as a separate reporter.

---

## Common follow-up questions

- What if the concatenation creates false collisions (e.g., sender '1' + channel '23' = sender '12' + channel '3')? _(Use a separator that cannot appear in either field, or use a tuple-based approach.)_
- How would you limit to content with more than 5 reporters? _(Add HAVING COUNT(DISTINCT ...) > 5 after the GROUP BY.)_
- What if you also needed the list of reporter IDs? _(Use GROUP_CONCAT(DISTINCT sender_id) alongside the count.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/unique_reporters_per_content)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.