# Mutual Channel Connections

> Two users. What channels do they share?

Canonical URL: <https://datadriven.io/problems/mutual_channel_connections>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Our messaging platform tracks which senders post in which channels. Find all sender IDs that appear in at least one channel where user 197 posted and also in at least one channel where user 585 posted, excluding those two users themselves.

## Worked solution and explanation

### Why this problem exists in real interviews

Set-membership questions are how interviewers test whether you can decompose 'shared with both' into two independent set-intersection conditions. The phrase 'mutual' tempts candidates into a self-join, but the actual structure is two parallel subquery filters: one for user 197's channels and one for user 585's. Recognizing that the two channels can be different is the key insight.

---

### Break down the requirements

#### Step 1: Decompose 'shares a channel with' into two nested IN subqueries

First, get the set of channels user 197 has posted in: `SELECT channel FROM chat_msgs WHERE sender_id = 197`. Then get the set of senders who posted in any of those channels. Repeat for user 585. The qualifying senders are in both sets at once.

#### Step 2: Intersect the two sender sets with AND

Apply both `sender_id IN (...)` filters with `AND`. Either one alone is too loose, and combining them in a single IN list would change the meaning to 'posted in a channel touched by 197 OR 585' instead of 'AND'.

#### Step 3: Exclude the seed users and dedupe

Add `WHERE sender_id NOT IN (197, 585)` so the two reference users do not appear in their own result. Wrap the projection in `SELECT DISTINCT sender_id` since each qualifying sender likely has many messages and you want one row per person.

---

### The solution

**Two parallel IN subqueries, intersected with AND**

```sql
SELECT DISTINCT sender_id
FROM chat_msgs
WHERE sender_id NOT IN (197, 585)
  AND sender_id IN (SELECT sender_id FROM chat_msgs WHERE channel IN (SELECT channel FROM chat_msgs WHERE sender_id = 197))
  AND sender_id IN (SELECT sender_id FROM chat_msgs WHERE channel IN (SELECT channel FROM chat_msgs WHERE sender_id = 585))
```

> **Cost Analysis**
>
> `chat_msgs` is 35M rows partitioned by `sent_at` over 365 days. Each of the four subqueries scans the partition (or full table without partition pruning) once. With `sender_id` cardinality 2M and `channel` cardinality 180K, the inner IN lists fit easily in a hash set, so each predicate is an O(1) hash probe per row. Wrapping the inner channel lookups in CTEs would let the planner materialize each set once instead of recomputing.

> **Interviewers Watch For**
>
> Interviewers want to see you NOT collapse this into a single self-join on channel. They also watch whether you remember to exclude 197 and 585 themselves (they trivially share every channel with each other), and whether you apply DISTINCT since `chat_msgs` is at the message grain not the user grain.

> **Common Pitfall**
>
> Forgetting `DISTINCT` returns one row per qualifying message, which can be tens of thousands of duplicate `sender_id` values per qualifying user. The query still 'works' but the output blows up and the grader rejects it for cardinality.

---

## Common follow-up questions

- How would you rewrite this using `EXISTS` instead of `IN`, and when would the planner prefer one over the other? _(Tests recognition of EXISTS as the semantically correct rewrite when only existence matters. The two subqueries become correlated EXISTS clauses against `chat_msgs` aliased to the seed user's channel set.)_
- How would the query change if the prompt required the sender to share the SAME channel with both 197 and 585, not two different channels? _(Tests stricter intersection logic. The candidate should mention adding a join or correlated subquery so that the same `channel` value appears in both 197 and 585's channel sets, not 'any channel for each.')_
- If you only cared about overlap in the last 30 days, where would you push the `sent_at` filter and why? _(Tests partition pruning awareness. `chat_msgs` is partitioned by `sent_at`, so adding a `sent_at >= ...` predicate scoped inside each subquery lets the planner skip irrelevant partitions.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/mutual_channel_connections)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.