# Retargeting Campaign Impressions

> Retargeting impressions. All of them.

Canonical URL: <https://datadriven.io/problems/retargeting_campaign_impressions>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

There's a discrepancy in ad impression numbers. Pull all available fields for campaigns whose name contains 'retarget' so we can verify what's flowing into the retargeting segment.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge asks you to apply pattern matching with LIKE to the `ad_impressions` table, simulating a real campaign performance workflow. Pay attention to the `ad_campaign` column as they drive the aggregation and output.

---

### Break down the requirements

#### Step 1: Filter with pattern matching on `ad_campaign`

The `LIKE '%retarget%'` pattern narrows the result to matching rows. The `%` wildcard matches any sequence of characters.

#### Step 2: Select the target columns

The SELECT clause picks exactly the columns the prompt asks for. Returning extra columns or missing a required alias would fail the grading check.

---

### The solution

**Pattern-match for retargeting campaign impressions**

```sql
SELECT *
FROM ad_impressions
WHERE ad_campaign LIKE '%retarget%'
```

> **Cost Analysis**
>
> With ~200M rows, the query performs a single sequential scan. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether the query returns exactly the columns and ordering the prompt specifies; how quickly you identify the core operation and write clean, minimal code.

> **Common Pitfall**
>
> Forgetting the `%` wildcard or placing it on the wrong side changes the match semantics entirely. `LIKE 'x%'` matches prefixes; `LIKE '%x'` matches suffixes.

---

## Common follow-up questions

- What would happen to your result if `ad_impressions.impression_time` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `impression_time` and uses DISTINCT or deduplication where needed.)_
- With 200,000,000 distinct values in `ad_impressions.impression_id`, how would a composite index on the GROUP BY columns change the execution plan? _(Probes understanding of how cardinality in `impression_id` affects grouping and sort operations.)_
- Your LIKE pattern may prevent index usage on `ad_impressions`. How would you restructure the filter to be index-friendly? _(Tests understanding of leading-wildcard LIKE and its impact on index scans.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/retargeting_campaign_impressions)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.