# Opened Notifications in Jan-Feb

> Two months of push notifications. How many were actually read?

Canonical URL: <https://datadriven.io/problems/opened_notifications_in_jan_feb>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The marketing team is evaluating their Q1 push campaign and needs the total number of notifications opened during January and February combined.

## Worked solution and explanation

### Why this problem exists in real interviews

This focuses on date extraction and counting within push_notifs, specifically around the title column. Interviewers present it in mid-level screens because the edge cases around NULL values and boundary conditions reveal depth of understanding.

---

### Break down the requirements

#### Step 1: Filter to January and February

Extract the month from `sent_at` and filter: `WHERE CAST(strftime('%m', sent_at) AS INTEGER) IN (1, 2)`.

#### Step 2: Filter to opened notifications

`AND opened = 1` keeps only opened notifications.

#### Step 3: Count the total

`SELECT COUNT(*)` produces the combined count.

---

### The solution

**Month extraction with boolean filter**

```sql
SELECT COUNT(*) AS opened_count
FROM push_notifs
WHERE CAST(strftime('%m', sent_at) AS INTEGER) IN (1, 2)
  AND opened = 1
```

> **Cost Analysis**
>
> At `push_notifs` (100,000,000 rows), a full table scan is expensive. Partition pruning (if the table is partitioned on the filter column) is the first optimization. A covering index on the `GROUP BY` + filter columns eliminates random I/O. Consider a materialized view for repeated dashboard queries.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What happens to your result if push_notifs.campaign contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on campaign.)_
- How would you verify that your aggregation on push_notifs.notif_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in push_notifs.notif_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like notif_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/opened_notifications_in_jan_feb)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.