# Symmetric Reply Network

> Who replies to whom? Both directions.

Canonical URL: <https://datadriven.io/problems/symmetric_reply_network>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The chat_messages table has sender and reply_to columns. Produce a symmetric view of the reply network: if user A replied to user B, the output should include both (A, B) and (B, A). Show each pair of user IDs.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests whether a candidate can demonstrate handling null values correctly. This pattern appears frequently in mid-level SQL rounds where interviewers want to see structured thinking.

---

### Break down the requirements

#### Step 1: Filter out null values

Exclude rows where `reply_to` or `reply_to` is NULL. This prevents nulls from polluting aggregations or creating phantom groups.

#### Step 2: Select the target columns

The SELECT clause picks exactly the columns the prompt asks for. Returning extra columns or missing a required alias would fail the grading check.

---

### The solution

**Union forward and reversed sender/reply_to pairs from chat_msgs**

```sql
SELECT sender_id AS user_a, reply_to AS user_b
FROM chat_msgs
WHERE reply_to IS NOT NULL UNION
SELECT reply_to AS user_a, sender_id AS user_b
FROM chat_msgs
WHERE reply_to IS NOT NULL
```

> **Cost Analysis**
>
> With ~35M rows, the query performs a single sequential scan. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for how you handle NULL values and whether you account for them in filters and aggregations.

> **Common Pitfall**
>
> Forgetting to filter NULLs creates phantom groups or inflated counts. Always check `null_fraction` in the schema before assuming columns are clean.

---

## Common follow-up questions

- Should you use UNION or UNION ALL here, and what changes if the prompt says 'include duplicates'? _(Tests understanding of implicit deduplication in UNION vs. UNION ALL.)_
- How do you handle rows where reply_to is NULL (original messages with no parent)? _(Tests edge-case filtering; NULL reply_to rows should not generate pairs.)_
- If the prompt asked for the count of symmetric pairs per user, how would you layer aggregation on top of the union? _(Tests CTE or subquery wrapping to aggregate over a derived set.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/symmetric_reply_network)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.