# Active User Open Rate

> What share of push notifications were opened by active users

Canonical URL: <https://datadriven.io/problems/active_user_open_rate>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The engagement team is measuring notification effectiveness for the quarterly product review. Compute the percentage of all push notifications that were both sent to a user with an active account and opened by the recipient. Notifications should still be counted in the denominator even if the user's account record no longer exists.

## Worked solution and explanation

### Why this problem exists in real interviews

The core skill being tested is conditional branching with CASE combined with division-safe NULLIF guards over `push_notifs`, `users`. Candidates must decide how `title`, `platform`, `status` interact before choosing a join strategy or aggregation level.

> **Trick to Solving**
>
> Any rate or ratio problem requires **null-safe division**. If the denominator can be zero, the query crashes or returns NULL silently.
> 
> 1. Identify the numerator and denominator conditions
> 2. Use `SUM(CASE WHEN ... THEN 1 ELSE 0 END)` for the numerator
> 3. Wrap the denominator in `NULLIF(..., 0)` to prevent division by zero

---

### Break down the requirements

#### Step 1: Filter to qualifying rows

Apply the WHERE clause to isolate the correct subset before computing the ratio.

#### Step 2: Group by `platform`

`GROUP BY platform` produces one output row per distinct value.

#### Step 3: Compute the ratio with CASE and NULLIF

The numerator uses `SUM(CASE WHEN opened THEN 1 ELSE 0 END)`. Wrapping the denominator in `NULLIF(COUNT(*), 0)` prevents division by zero.

#### Step 4: Round and order

Use `ROUND(..., 4)` for clean decimal output and sort by rate descending.

---

### The solution

**Case-branch for active user open rate**

```sql
SELECT
    platform,
    ROUND(
        1.0 * SUM(CASE WHEN opened = 1 THEN 1 ELSE 0 END)
        / NULLIF(COUNT(*), 0),
        4
    ) AS rate
FROM push_notifs a
JOIN users b ON a.user_id = b.user_id
WHERE status = 'active'
GROUP BY platform
ORDER BY rate DESC
```

> **Cost Analysis**
>
> The main table has 50M rows (5 GB). The GROUP BY reduces the row count early, keeping downstream operations cheap. The smaller dimension table keeps the join selective. An index on the filter or join column would improve performance at scale.

> **Interviewers Watch For**
>
> Strong candidates state the correct `GROUP BY` grain before writing any SQL, showing they think about the output shape first. Division-by-zero handling is a silent correctness bug; interviewers watch for `NULLIF` or equivalent protection.

> **Common Pitfall**
>
> Selecting a non-aggregated column without including it in `GROUP BY` is the most common error. Some engines reject it; others silently return arbitrary values.

---

## Common follow-up questions

- The `email` column in `users` has roughly 1% NULLs. How does your query handle those rows, and would the result change if NULLs were replaced with zeros? _(Tests whether the candidate understands how NULLs propagate through aggregation functions and whether their WHERE/JOIN conditions implicitly filter them out.)_
- Your CASE expression branches on `user_id`. What happens if a new category value appears that none of your WHEN clauses match? _(Tests whether the candidate uses a meaningful ELSE branch or lets unmatched rows silently become NULL.)_
- `notif_id` in `push_notifs` has ~50M distinct values. What index strategy keeps your query from doing a full table scan? _(Tests whether the candidate can design indexes for high-cardinality columns and understands selectivity.)_
- If the business definition of `title` changed mid-quarter (e.g., a status value was renamed), how would you handle historical consistency? _(Tests awareness of slowly changing dimensions and backward-compatible query design.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/active_user_open_rate)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.