# Promo Campaign Cost per Acquisition

> The campaign ran. What did each customer cost?

Canonical URL: <https://datadriven.io/problems/promo_campaign_cost_per_acquisition>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

For promo campaigns (campaign names containing 'promo'), compute the cost-per-acquisition by year: total notifications divided by total opens. Show the campaign, year, and the resulting ratio.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use this campaign performance scenario to test HAVING for post-aggregation filtering against the `push_notifs` table. The focus is on how you handle columns like `campaign`, `sent_at`, and `opened` when building the result. It also requires date extraction for time bucketing, pattern matching with LIKE.

---

### Break down the requirements

#### Step 1: Filter to promo campaigns

`WHERE campaign LIKE '%promo%'` selects promo campaigns.

#### Step 2: Extract year

`strftime('%Y', sent_at)` groups data by year.

#### Step 3: Group and compute ratio

`GROUP BY campaign, year` with `COUNT(*) * 1.0 / SUM(opened)` gives the cost-per-acquisition (notifications per open).

---

### The solution

**Having filter for promo campaign cost per acquisition**

```sql
SELECT
    campaign,
    strftime('%Y', sent_at) AS year,
    COUNT(*) * 1.0 / SUM(opened) AS cost_per_acquisition
FROM push_notifs
WHERE campaign LIKE '%promo%'
  AND opened IS NOT NULL
GROUP BY campaign, strftime('%Y', sent_at)
HAVING SUM(opened) > 0
```

> **Cost Analysis**
>
> With `push_notifs` (80,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What result would you get if every value in `push_notifs.campaign` were NULL? Would your query return an empty set or something unexpected? _(Tests extreme NULL scenarios and whether the candidate guards against edge cases in `campaign`.)_
- `push_notifs.sent_at` has roughly 31,536,000 distinct values. What index strategy would you use to avoid a full scan on `push_notifs`? _(Tests indexing knowledge specific to the high-cardinality `sent_at` column in `push_notifs`.)_
- If `push_notifs` contained late-arriving rows that were inserted after your query ran, how would you design an incremental update instead of re-aggregating? _(Tests understanding of incremental aggregation patterns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/promo_campaign_cost_per_acquisition)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.