# Days with More Edited Than Unedited Messages

> Some days, more messages get edited than sent.

Canonical URL: <https://datadriven.io/problems/days_with_more_edited_than_unedited_messages>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Pull all records from days when the number of edited messages exceeded the number of unedited messages. Return all available fields for qualifying rows.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests conditional aggregation with a date-level filter. The requirement to return all original rows for qualifying days probes whether you can use a subquery or CTE to identify dates and then join back.

---

### Break down the requirements

#### Step 1: Identify qualifying days

Group by `sent_at::DATE`, count edited vs unedited. Filter where edited count exceeds unedited.

#### Step 2: Return all rows for those days

Join back to the original table on date to get all fields.

---

### The solution

**Qualifying date filter with full row retrieval**

```sql
WITH daily_counts AS (
    SELECT sent_at::DATE AS day
    FROM chat_msgs
    GROUP BY sent_at::DATE
    HAVING SUM(CASE WHEN edited = 1 THEN 1 ELSE 0 END) > SUM(CASE WHEN edited = 0 THEN 1 ELSE 0 END)
)
SELECT m.*
FROM chat_msgs m
JOIN daily_counts d ON m.sent_at::DATE = d.day
```

> **Cost Analysis**
>
> Two scans of 50M rows: one for the aggregate (producing qualifying dates) and one for the join. An index on `sent_at` would help the date extraction. The join filters to only qualifying days' rows.

> **Interviewers Watch For**
>
> Whether the candidate returns all original rows (as specified by 'all records' and 'all available fields') vs just the aggregated counts. The prompt asks for the raw records, not summaries.

> **Common Pitfall**
>
> Returning only the aggregate counts instead of the raw rows would miss the requirement. The CTE identifies qualifying dates; the main query retrieves all rows for those dates.

---

## Common follow-up questions

- What if edited is a boolean? _(Change the CASE to `WHEN edited = true` or `WHEN edited THEN 1`.)_
- How would you optimize this for repeated queries? _(Materialize the qualifying dates or add an indexed computed column.)_
- What if the definition of 'edited' included messages edited multiple times? _(Tests whether edited is a boolean flag or a count.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/days_with_more_edited_than_unedited_messages)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.