# Campaign Bookend Engagement

> First impression versus last. The gap.

Canonical URL: <https://datadriven.io/problems/campaign_bookend_engagement>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

For each ad campaign, what percentage of its total impressions happened on the campaign's very first day versus its very last day?

## Worked solution and explanation

### Why this problem exists in real interviews

Ad analytics teams probe whether you can compute multiple group-level fractions in a single scan instead of correlating subqueries against a 500M row fact. The bigger signal: do you flag single-day campaigns where first_day equals last_day and both percentages return 100? That edge case ships to production if you don't name it.

---

### Break down the requirements

#### Step 1: Per-campaign date bounds

First CTE `campaign_dates`: `MIN(DATE(impression_time))` and `MAX(DATE(impression_time))` grouped by `ad_campaign`. These are the bookends every row gets compared against.

#### Step 2: Per-campaign denominators

Second CTE `campaign_counts`: `COUNT(*)` grouped by `ad_campaign`. Compute it once. Repeating the count inline per branch doubles the scan.

#### Step 3: Conditional aggregation

In the outer query, `SUM(CASE WHEN DATE(impression_time) = cd.first_day THEN 1 ELSE 0 END)` collapses the first-day count into the same pass as the last-day count.

#### Step 4: Cast before dividing

Wrap the SUM in `CAST(... AS REAL)` and multiply by `100.0`. Integer division silently truncates to 0 for any campaign whose bookend share is under 100 percent.

---

### The solution

**BOOKEND PERCENTAGES IN ONE PASS**

```sql
WITH campaign_dates AS (
  SELECT ad_campaign,
         MIN(DATE(impression_time)) AS first_day,
         MAX(DATE(impression_time)) AS last_day
  FROM ad_impressions
  GROUP BY ad_campaign
),
campaign_counts AS (
  SELECT ad_campaign, COUNT(*) AS total_impressions
  FROM ad_impressions
  GROUP BY ad_campaign
)
SELECT a.ad_campaign,
       CAST(SUM(CASE WHEN DATE(a.impression_time) = cd.first_day THEN 1 ELSE 0 END) AS REAL)
         * 100.0 / cc.total_impressions AS first_day_pct,
       CAST(SUM(CASE WHEN DATE(a.impression_time) = cd.last_day THEN 1 ELSE 0 END) AS REAL)
         * 100.0 / cc.total_impressions AS last_day_pct
FROM ad_impressions a
JOIN campaign_dates cd ON a.ad_campaign = cd.ad_campaign
JOIN campaign_counts cc ON a.ad_campaign = cc.ad_campaign
GROUP BY a.ad_campaign
ORDER BY a.ad_campaign
```

> **Cost Analysis**
>
> 500M rows, partitioned by `impression_time`. Both CTEs are full scans plus a hash agg on low-cardinality `ad_campaign`, so optimizers reuse one scan. The outer join to two tiny aggregate sides is a broadcast hash join. If the engine has stats, expect one scan, not three.

> **Interviewers Watch For**
>
> Ask out loud: do single-day campaigns count? When `first_day = last_day`, both percentages return 100 and sum to 200. Confirm whether the interviewer wants those filtered, deduped to one column, or reported as is. Pick a behavior and call it out.

> **Common Pitfall**
>
> Writing `SUM(CASE ...) / cc.total_impressions` without the REAL cast. In SQLite and Postgres integer math, every bookend share that isn't exactly 100 percent returns 0. The `* 100.0` alone is not enough if SUM stays integer; cast the numerator.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you also return the click-through rate on the first and last day? _(Tests whether you can add `SUM(CASE WHEN clicked = 1 AND DATE(impression_time) = cd.first_day ...)` without a second scan.)_
- Rewrite this with window functions instead of CTEs. _(Probes `MIN() OVER (PARTITION BY ad_campaign)` plus `COUNT(*) OVER (PARTITION BY ad_campaign)` to avoid the joins entirely.)_
- Only include campaigns that ran at least seven distinct days. _(Forces a `HAVING COUNT(DISTINCT DATE(impression_time)) >= 7` filter on `campaign_dates`, which also kills the single-day edge case.)_
- What changes if `impression_time` is stored as UTC but campaigns are reported in local time zones? _(Tests date-boundary thinking and whether you reach for `AT TIME ZONE` before grouping.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/campaign_bookend_engagement)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.