# Clicked Holiday Impressions

> Holiday ads. Who actually clicked?

Canonical URL: <https://datadriven.io/problems/clicked_holiday_impressions>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The marketing team wants to measure cross-channel effectiveness. How many clicked ad impressions came from users who also received a push notification in a campaign containing 'holiday'?

## Worked solution and explanation

### Why this problem exists in real interviews

This tests a semi-join (EXISTS or IN) with string matching. The question combines cross-table filtering with a substring condition, probing whether you can express "users who appear in another table matching a pattern."

---

### Break down the requirements

#### Step 1: Identify holiday push notification users

Find distinct `user_id` values from `push_notifs` where `campaign LIKE '%holiday%'`.
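
As a minimal sketch of this step on its own, the query below returns the holiday user set. It assumes only the `push_notifs` columns named above; `LIKE` is case-sensitive in Postgres, so it matches lowercase `holiday` only.

```sql
-- Step 1 sketch: users with at least one push notification
-- from a campaign whose name contains 'holiday'.
SELECT DISTINCT pn.user_id
FROM push_notifs pn
WHERE pn.campaign LIKE '%holiday%';
```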

#### Step 2: Count clicked impressions for those users

From `ad_impressions`, count rows where `clicked = 1` and `user_id` is in the holiday notification set.
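
One way to compose the two steps literally is a CTE feeding an `IN` test. This is an illustrative alternative phrasing, not the reference solution; the CTE name `holiday_users` is made up for the example.

```sql
-- Step 1: materialize the holiday user set (hypothetical CTE name).
WITH holiday_users AS (
    SELECT DISTINCT user_id
    FROM push_notifs
    WHERE campaign LIKE '%holiday%'
)
-- Step 2: count clicked impressions whose user is in that set.
SELECT COUNT(*) AS clicked_holiday_impressions
FROM ad_impressions ai
WHERE ai.clicked = 1
  AND ai.user_id IN (SELECT user_id FROM holiday_users);
```

Because `IN` is a membership test, each impression is counted at most once regardless of how many notifications its user received.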

---

### The solution

**Semi-join with substring filter**

```sql
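-- Count clicked impressions whose user has at least one holiday push notification.
SELECT COUNT(*) AS clicked_holiday_impressions
FROM ad_impressions ai
WHERE ai.clicked = 1
  -- Semi-join: true as soon as one matching notification is found,
  -- so repeated notifications never multiply an impression.
  AND EXISTS (
      SELECT 1
      FROM push_notifs pn
      WHERE pn.user_id = ai.user_id
        AND pn.campaign LIKE '%holiday%'  -- substring match, case-sensitive
  );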
```

> **Cost Analysis**
>
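> The EXISTS subquery is correlated on `user_id`. With an index on `push_notifs(user_id, campaign)`, each probe is an index lookup by `user_id`; the leading wildcard in `'%holiday%'` rules out a B-tree range seek on `campaign`, so that column is checked as a filter over the user's index entries. The outer scan filters 350M rows to ~3.5M clicked ones (1% CTR), and each of those probes the subquery. Alternatively, the planner can build the holiday user set once and run a hash semi-join against it.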
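
To check which strategy the planner actually picks, a sketch along these lines could be run in Postgres. The index columns come from the cost analysis above; the index name `idx_push_notifs_user_campaign` is hypothetical.

```sql
-- Composite index from the cost analysis (the index name is an assumption).
CREATE INDEX IF NOT EXISTS idx_push_notifs_user_campaign
    ON push_notifs (user_id, campaign);

-- Refresh planner statistics before inspecting the plan.
ANALYZE push_notifs;

-- EXPLAIN prints the chosen plan without executing the query.
EXPLAIN
SELECT COUNT(*) AS clicked_holiday_impressions
FROM ad_impressions ai
WHERE ai.clicked = 1
  AND EXISTS (
      SELECT 1
      FROM push_notifs pn
      WHERE pn.user_id = ai.user_id
        AND pn.campaign LIKE '%holiday%'
  );
```

In the output, a `Nested Loop Semi Join` node would correspond to the per-row probe and a `Hash Semi Join` node to the materialized-set approach. Which one appears depends on table sizes and statistics, so treat the result as specific to the data at hand.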

> **Interviewers Watch For**
>
> EXISTS vs IN vs JOIN: EXISTS stops at the first matching notification for each impression, so duplicates in `push_notifs` never inflate the count. A JOIN would overcount if a user received multiple holiday notifications. Strong candidates explain this trade-off.

> **Common Pitfall**
>
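> Using `JOIN push_notifs` instead of EXISTS counts each clicked impression once per holiday notification its user received, which inflates the total. EXISTS or `IN (SELECT ...)` avoids this; `IN` is a membership test, so adding `DISTINCT` inside the subquery is optional.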
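
The overcount is easy to reproduce on toy data. The sketch below uses session-local temp tables with assumed column types (`INT` for `clicked`, `TEXT` for `campaign`); only the column names come from the solution, and the rows are invented for illustration.

```sql
-- Temp tables shadow any real tables of the same name for this session only.
CREATE TEMP TABLE ad_impressions (user_id INT, clicked INT);
CREATE TEMP TABLE push_notifs (user_id INT, campaign TEXT);

-- User 1: one clicked and one unclicked impression. User 2: one clicked impression.
INSERT INTO ad_impressions VALUES (1, 1), (1, 0), (2, 1);

-- User 1 received two holiday campaigns; user 2 received none.
INSERT INTO push_notifs VALUES
    (1, 'holiday_sale'),
    (1, 'winter_holiday_promo'),
    (2, 'spring_launch');

-- JOIN: user 1's single clicked impression matches two notification rows.
SELECT COUNT(*) AS join_count
FROM ad_impressions ai
JOIN push_notifs pn ON pn.user_id = ai.user_id
WHERE ai.clicked = 1
  AND pn.campaign LIKE '%holiday%';
-- join_count = 2 (overcounted)

-- EXISTS: the same impression is counted once.
SELECT COUNT(*) AS exists_count
FROM ad_impressions ai
WHERE ai.clicked = 1
  AND EXISTS (
      SELECT 1
      FROM push_notifs pn
      WHERE pn.user_id = ai.user_id
        AND pn.campaign LIKE '%holiday%'
  );
-- exists_count = 1 (correct)
```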

---

## Common follow-up questions

- What is the performance difference between EXISTS, IN, and JOIN here? _(Tests query plan knowledge: EXISTS can stop at the first match, IN is a membership test that many planners execute as the same semi-join, JOIN may fan out.)_
- How would you also break down the count by campaign? _(Requires JOIN and GROUP BY, with awareness of the fan-out risk.)_
- What if 'holiday' could appear in different casings? _(Tests ILIKE or LOWER() for case-insensitive matching.)_
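
A possible answer to the last two follow-ups is sketched below. It assumes Postgres, where `ILIKE` is available, and uses only the columns from the solution; the CTE name `holiday_pairs` is hypothetical. Deduplicating `(user_id, campaign)` pairs first keeps repeated sends of the same campaign from fanning out the join.

```sql
-- One row per (user, holiday campaign), matched case-insensitively.
WITH holiday_pairs AS (
    SELECT DISTINCT user_id, campaign
    FROM push_notifs
    WHERE campaign ILIKE '%holiday%'  -- Postgres-specific; LOWER(campaign) LIKE '%holiday%' is the portable form
)
-- Clicked impressions per campaign.
SELECT hp.campaign,
       COUNT(*) AS clicked_impressions
FROM ad_impressions ai
JOIN holiday_pairs hp ON hp.user_id = ai.user_id
WHERE ai.clicked = 1
GROUP BY hp.campaign
ORDER BY clicked_impressions DESC;
```

An impression is still counted once for every distinct holiday campaign its user received, so the per-campaign rows can sum to more than the overall total. That is inherent to the breakdown and worth stating out loud in an interview.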

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.