# Push Notification Open Rate

> Push sent. How many opened?

Canonical URL: <https://datadriven.io/problems/push_notification_open_rate>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

For each date a push notification was sent, calculate the open rate: notifications with opened = true divided by total notifications sent that day. Only include send dates where at least one notification was eventually opened, regardless of when the open occurred.

## Worked solution and explanation

### Why this problem exists in real interviews

Built around the `push_notifs` table, this challenge probes your ability to apply HAVING for post-aggregation filtering in a notification setting. Correctly referencing the `sent_at` and `opened` columns is essential to a working solution.

---

### Break down the requirements

#### Step 1: Extract date from sent_at

`DATE(sent_at)` normalizes each notification to its send date.

#### Step 2: Group by date and compute open rate

`GROUP BY send_date` with `SUM(opened) * 1.0 / COUNT(*)` gives the daily open rate.

#### Step 3: Filter to dates with at least one open

`HAVING SUM(opened) > 0` excludes days with zero opens.

---

### The solution

**Having filter for push notification open rate**

```sql
SELECT
    DATE(sent_at) AS send_date,
    SUM(opened) * 1.0 / COUNT(*) AS open_rate
FROM push_notifs
GROUP BY DATE(sent_at)
HAVING SUM(opened) > 0
```

> **Cost Analysis**
>
> At `push_notifs` (100,000,000 rows), a full table scan is expensive. Partition pruning (if the table is partitioned on the filter column) is the first optimization. A covering index on the `GROUP BY` + filter columns eliminates random I/O. Consider a materialized view for repeated dashboard queries.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If `campaign` in `push_notifs` is NULL for some rows, how would your aggregation or join logic be affected? _(Probes understanding of NULL propagation through joins and aggregate functions on `push_notifs.campaign`.)_
- `push_notifs.user_id` has roughly 6,000,000 distinct values. What index strategy would you use to avoid a full scan on `push_notifs`? _(Tests indexing knowledge specific to the high-cardinality `user_id` column in `push_notifs`.)_
- If the HAVING threshold in your query changed from a fixed number to a percentile, how would you restructure the query? _(Tests ability to replace static HAVING filters with dynamic subquery-based thresholds.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/push_notification_open_rate)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.