# Same First and Last Reply Target

> They started and ended the day messaging the same person.

Canonical URL: <https://datadriven.io/problems/same_first_and_last_reply_target>

Domain: SQL · Difficulty: medium · Seniority: L5

## Problem

Something odd turned up in messaging analytics. Find users whose first and last message in a channel on the same day were sent to the same recipient. Show the sender, recipient, and the date.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge asks you to apply a custom window frame specification to the `chat_msgs` table, simulating a real user-behavior workflow. Pay attention to the `reply_to` column, since it drives both the comparison and the output.

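> **Trick to Solving**
>
> Window problems that need to see beyond the current row require an explicit frame clause. The default frame is rarely what you want.
>
> 1. Identify the window extent from the prompt (e.g., '3-month rolling', or the whole partition for a first-versus-last comparison like this one)
> 2. Use `ROWS BETWEEN N PRECEDING AND CURRENT ROW` for a rolling window, or `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` to cover the whole partition
> 3. Partition by the grouping key, order by the time column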

---
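
To make the frame advice in the Trick to Solving note concrete, here is a minimal sketch comparing the default frame with an explicit whole-partition frame. The sample rows are hypothetical, and it assumes `sent_at` is stored as ISO-8601 text and `reply_to` holds a recipient name; the real `chat_msgs` schema may differ.

```sql
-- Hypothetical sample data: one sender, one channel, one day, three messages.
WITH chat_msgs (sender_id, channel, sent_at, reply_to) AS (
  VALUES
    (1, 'general', '2024-03-01 09:00:00', 'ana'),
    (1, 'general', '2024-03-01 12:00:00', 'bo'),
    (1, 'general', '2024-03-01 18:00:00', 'ana')
)
SELECT
  sent_at,
  reply_to,
  -- Default frame ends at the current row, so this echoes the row's own reply_to.
  LAST_VALUE(reply_to) OVER (
    PARTITION BY sender_id, channel
    ORDER BY sent_at
  ) AS last_default_frame,
  -- Explicit frame spans the whole partition, so every row sees the true last value.
  LAST_VALUE(reply_to) OVER (
    PARTITION BY sender_id, channel
    ORDER BY sent_at
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS last_full_frame
FROM chat_msgs
ORDER BY sent_at;
```

On these rows, `last_default_frame` repeats each row's own recipient (ana, bo, ana), while `last_full_frame` is ana on all three rows.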

### Break down the requirements

#### Step 1: Filter out null values

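Exclude rows where `reply_to` is NULL, since a message with no recipient cannot be a first or last reply target. If such a row lands at either end of the day, `first_reply = last_reply` evaluates to NULL and the whole group drops out of the result.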
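
A minimal sketch of the effect, on hypothetical rows where the day's first message has no recipient. It assumes `sent_at` is ISO-8601 text and `reply_to` holds a name; both are illustrative choices rather than the confirmed schema.

```sql
-- Hypothetical sample data: the earliest message of the day has a NULL reply_to.
WITH chat_msgs (sender_id, channel, sent_at, reply_to) AS (
  VALUES
    (1, 'general', '2024-03-01 08:00:00', NULL),
    (1, 'general', '2024-03-01 09:00:00', 'ana'),
    (1, 'general', '2024-03-01 18:00:00', 'ana')
),
unfiltered AS (
  SELECT DISTINCT
    FIRST_VALUE(reply_to) OVER w AS first_reply,
    LAST_VALUE(reply_to) OVER w AS last_reply
  FROM chat_msgs
  WINDOW w AS (
    PARTITION BY sender_id, channel, substr(sent_at, 1, 10)
    ORDER BY sent_at
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
),
filtered AS (
  SELECT DISTINCT
    FIRST_VALUE(reply_to) OVER w AS first_reply,
    LAST_VALUE(reply_to) OVER w AS last_reply
  FROM chat_msgs
  WHERE reply_to IS NOT NULL  -- the Step 1 filter
  WINDOW w AS (
    PARTITION BY sender_id, channel, substr(sent_at, 1, 10)
    ORDER BY sent_at
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
)
-- NULL = 'ana' is NULL, not true, so the unfiltered variant matches nothing.
SELECT 'without filter' AS variant, COUNT(*) AS matching_days
FROM unfiltered
WHERE first_reply = last_reply
UNION ALL
SELECT 'with filter', COUNT(*)
FROM filtered
WHERE first_reply = last_reply;
```

The `without filter` variant reports 0 matching days and the `with filter` variant reports 1, because the filter removes the NULL row before the window functions pick the first and last values.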

#### Step 2: Sort the final output

The `ORDER BY` clause makes the result order deterministic; the solution sorts by `sender_id`, then `msg_date`. When a prompt specifies an order, interviewers check that the sort keys and direction match it.

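#### Step 3: Use a subquery to compute the first and last values

Window functions are evaluated after `WHERE`, so `first_reply` and `last_reply` cannot be compared at the same query level that computes them. The subquery in `FROM` computes both values for every row, and the outer query keeps only the rows where they match. This avoids a self-join.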
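
In this problem the subquery is a derived table in `FROM`. The sketch below shows the pattern on hypothetical rows, assuming ISO-8601 text in `sent_at`; the alias `per_row` is an illustrative name.

```sql
-- Hypothetical sample data: sender 1 starts and ends with ana, sender 2 does not.
WITH chat_msgs (sender_id, channel, sent_at, reply_to) AS (
  VALUES
    (1, 'general', '2024-03-01 09:00:00', 'ana'),
    (1, 'general', '2024-03-01 18:00:00', 'ana'),
    (2, 'general', '2024-03-01 10:00:00', 'ana'),
    (2, 'general', '2024-03-01 11:00:00', 'bo')
)
-- A window function cannot appear directly in WHERE, so the inner query
-- materializes first_reply and last_reply as ordinary columns first.
SELECT DISTINCT sender_id, first_reply
FROM (
  SELECT
    sender_id,
    FIRST_VALUE(reply_to) OVER w AS first_reply,
    LAST_VALUE(reply_to) OVER w AS last_reply
  FROM chat_msgs
  WINDOW w AS (
    PARTITION BY sender_id, channel, substr(sent_at, 1, 10)
    ORDER BY sent_at
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
) AS per_row
WHERE first_reply = last_reply;
```

Only sender 1 is returned, with ana as the shared recipient.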

---

### The solution

**Window functions for same first and last reply**

```sql
SELECT DISTINCT sender_id, first_reply AS reply_to, msg_date
FROM (
  SELECT sender_id, substr(sent_at, 1, 10) AS msg_date,
    -- Recipient of the first message per sender, channel, and day.
    FIRST_VALUE(reply_to) OVER (
      PARTITION BY sender_id, channel, substr(sent_at, 1, 10)
      ORDER BY sent_at ASC
    ) AS first_reply,
    -- Recipient of the last message; the explicit frame spans the whole partition.
    LAST_VALUE(reply_to) OVER (
      PARTITION BY sender_id, channel, substr(sent_at, 1, 10)
      ORDER BY sent_at ASC
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_reply
  FROM chat_msgs
  WHERE reply_to IS NOT NULL
) AS day_bounds
WHERE first_reply = last_reply
ORDER BY sender_id, msg_date
```
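
As an alternative sketch, not the solution above, the last recipient can also be found with `FIRST_VALUE` over a descending sort, which sidesteps the frame clause because the first row of the default frame is always the start of the partition. The sample rows are hypothetical, `sent_at` is assumed to be ISO-8601 text, and the sketch projects `first_reply` so that only the shared recipient is reported.

```sql
-- Hypothetical sample data: sender 1 messages ana, bo, ana; sender 2 messages ana, bo.
WITH chat_msgs (sender_id, channel, sent_at, reply_to) AS (
  VALUES
    (1, 'general', '2024-03-01 09:00:00', 'ana'),
    (1, 'general', '2024-03-01 12:00:00', 'bo'),
    (1, 'general', '2024-03-01 18:00:00', 'ana'),
    (2, 'general', '2024-03-01 10:00:00', 'ana'),
    (2, 'general', '2024-03-01 11:00:00', 'bo')
)
SELECT DISTINCT sender_id, first_reply AS reply_to, msg_date
FROM (
  SELECT
    sender_id,
    substr(sent_at, 1, 10) AS msg_date,
    FIRST_VALUE(reply_to) OVER (
      PARTITION BY sender_id, channel, substr(sent_at, 1, 10)
      ORDER BY sent_at ASC
    ) AS first_reply,
    -- Reversing the sort makes the day's last message the first row of the frame.
    FIRST_VALUE(reply_to) OVER (
      PARTITION BY sender_id, channel, substr(sent_at, 1, 10)
      ORDER BY sent_at DESC
    ) AS last_reply
  FROM chat_msgs
  WHERE reply_to IS NOT NULL
) AS day_bounds
WHERE first_reply = last_reply
ORDER BY sender_id, msg_date;
```

On these rows the query returns a single row: sender 1, ana, 2024-03-01. In either form, two messages with an identical `sent_at` leave the choice of first or last row up to the engine.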

> **Cost Analysis**
>
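> With ~40M rows, the window functions run only on the rows that survive the `reply_to IS NOT NULL` filter, and the dominant cost is sorting those rows by sender, channel, day, and `sent_at`. An index whose leading columns match the partition and order keys may let the planner avoid that sort.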

> **Interviewers Watch For**
>
> - How you handle NULL values, and whether you account for them in filters and aggregations
> - Whether you explicitly define the window frame or rely on defaults that may not match the requirement
> - Whether you use a subquery or a self-join, and can explain the tradeoffs

> **Common Pitfall**
>
> Forgetting to filter NULLs lets a NULL `reply_to` become the first or last value of the day, and the equality check then silently drops that group. Always check `null_fraction` in the schema before assuming columns are clean.

---

## Common follow-up questions

- If `reply_to` in `chat_msgs` is NULL for some rows, how would your aggregation or join logic be affected? _(Probes understanding of NULL propagation through joins and aggregate functions on `chat_msgs.reply_to`.)_
- `chat_msgs.channel` has roughly 200,000 distinct values. What index strategy would you use to avoid a full scan on `chat_msgs`? _(Tests indexing knowledge specific to the high-cardinality `channel` column in `chat_msgs`.)_
- If this query ran as a scheduled job, how would you add monitoring to detect when the result set is suspiciously empty? _(Tests operational awareness around scheduled query jobs.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/same_first_and_last_reply_target)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.