# The Notification Lifecycle

> Sent, opened, ignored. What happened after the alert went out?

Canonical URL: <https://datadriven.io/problems/push_notification_status_pivot>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Our push notification system logs delivery outcomes per user. For registered users only, pivot their notification counts into separate columns for delivered, opened, and failed. Return user ID and the three counts.

## Worked solution and explanation

### Why this problem exists in real interviews

The core skill being tested is self-join, applied to the `push_notifs` and `users` tables in a notification context. Getting columns like `user_id`, `status`, and `opened` right is where most candidates slip. The problem layers in conditional aggregation via CASE as well.

---

### Break down the requirements

#### Step 1: Join to users for registered filter

`JOIN users ON push_notifs.user_id = users.user_id` restricts to registered users.

#### Step 2: Group by user_id

`GROUP BY user_id` produces one row per user.

#### Step 3: Pivot status counts

`SUM(CASE WHEN status = 'delivered' THEN 1 ELSE 0 END)` and similarly for 'opened' and 'failed' produce the three count columns.

---

### The solution

**Case pivot for push notification status pivot**

```sql
SELECT
    pn.user_id,
    SUM(CASE WHEN pn.status = 'delivered' THEN 1 ELSE 0 END) AS delivered_count,
    SUM(CASE WHEN pn.status = 'opened' THEN 1 ELSE 0 END) AS opened_count,
    SUM(CASE WHEN pn.status = 'failed' THEN 1 ELSE 0 END) AS failed_count
FROM push_notifs pn
JOIN users u ON pn.user_id = u.user_id
GROUP BY pn.user_id
```

> **Cost Analysis**
>
> At `push_notifs` (120,000,000 rows), `users` (15,000,000 rows), a full table scan is expensive. Partition pruning (if the table is partitioned on the filter column) is the first optimization. A covering index on the `GROUP BY` + filter columns eliminates random I/O. Consider a materialized view for repeated dashboard queries.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If `campaign` in `push_notifs` is NULL for some rows, how would your aggregation or join logic be affected? _(Probes understanding of NULL propagation through joins and aggregate functions on `push_notifs.campaign`.)_
- With 8,000,000 distinct values in `push_notifs.user_id`, how would a composite index on the GROUP BY columns change the execution plan? _(Probes understanding of how cardinality in `user_id` affects grouping and sort operations.)_
- If `push_notifs` contained late-arriving rows that were inserted after your query ran, how would you design an incremental update instead of re-aggregating? _(Tests understanding of incremental aggregation patterns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/push_notification_status_pivot)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.