# Peak Retargeting Revenue Month

> Retargeting revenue. The peak month.

Canonical URL: <https://datadriven.io/problems/peak_retargeting_revenue_month>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

For retargeting campaigns (names containing 'retarget') in 2025, which month had the highest total ad revenue? Show the month, total revenue, maximum single-impression revenue, and average revenue.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge asks you to apply date extraction for time bucketing to the `ad_impressions` table, simulating a real revenue analysis workflow. Pay attention to columns like `ad_campaign`, `impression_time`, and `revenue` as they drive the aggregation and output.

---

### Break down the requirements

#### Step 1: Filter to retargeting campaigns

`WHERE ad_campaign LIKE '%retarget%'` selects impressions from retargeting campaigns.

#### Step 2: Filter to the target year

Add a date filter on `impression_time` for the specified year.

#### Step 3: Group by month and compute metrics

`GROUP BY month` with `SUM(revenue)`, `MAX(revenue)`, and `AVG(revenue)`.

#### Step 4: Select the top month

`ORDER BY SUM(revenue) DESC LIMIT 1` returns the month with the highest total revenue.

---

### The solution

**Pattern-match for peak retargeting revenue month**

```sql
SELECT
    strftime('%m', impression_time) AS month,
    SUM(revenue) AS total_revenue,
    MAX(revenue) AS max_single_revenue,
    AVG(revenue) AS avg_revenue
FROM ad_impressions
WHERE ad_campaign LIKE '%retarget%'
  AND strftime('%Y', impression_time) = '2026'
GROUP BY month
ORDER BY total_revenue DESC
LIMIT 1
```

> **Cost Analysis**
>
> At `ad_impressions` (350,000,000 rows), a full table scan is expensive. Partition pruning (if the table is partitioned on the filter column) is the first optimization. A covering index on the `GROUP BY` + filter columns eliminates random I/O. Consider a materialized view for repeated dashboard queries.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What would happen to your result if `ad_impressions.ad_campaign` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `ad_campaign` and uses DISTINCT or deduplication where needed.)_
- `ad_impressions.impression_time` has roughly 31,536,000 distinct values. What index strategy would you use to avoid a full scan on `ad_impressions`? _(Tests indexing knowledge specific to the high-cardinality `impression_time` column in `ad_impressions`.)_
- Your LIKE pattern may prevent index usage on `ad_impressions`. How would you restructure the filter to be index-friendly? _(Tests understanding of leading-wildcard LIKE and its impact on index scans.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/peak_retargeting_revenue_month)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.