# Top Flagged Campaign Resolutions

> Flagged the most. Resolved how?

Canonical URL: <https://datadriven.io/problems/top_flagged_campaign_resolutions>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

Find the ad campaign(s) that received the most user-reported alerts. Among those top campaigns, how many alerts were resolved? Match alert_events to ad_impressions where alerts fired in the same hour as an impression. Ignore alerts without a valid alert_id. Return the campaign and resolved count.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests whether you spot a fake join key and still ship a correct answer. `alert_events` and `ad_impressions` share no FK, so the prompt forces you to bucket by hour with `SUBSTR(fired_at, 1, 13)`. The interviewer wants to hear you call out the smell, then handle the resulting many-to-many fan-out without inflating counts.

---

### Break down the requirements

#### Step 1: Define the join key out loud

There is no real link between alerts and campaigns. You are inferring it from same-hour co-occurrence via `SUBSTR(ae.fired_at, 1, 13) = SUBSTR(ai.impression_time, 1, 13)`. Say that before you write a line.

#### Step 2: Filter and dedupe alerts

Drop rows where `ae.alert_id IS NULL` in the WHERE. Use `COUNT(DISTINCT ae.alert_id)` because each alert hour matches every impression in that hour, so the join multiplies.

#### Step 3: Rank campaigns in a CTE

First CTE groups by `ai.ad_campaign`. Second CTE picks campaigns where `alert_count = MAX(alert_count)` so ties survive. ROW_NUMBER would silently drop a tied winner.

#### Step 4: Count resolved in the outer query

Re-join the top campaigns back to impressions and alerts, then `COUNT(CASE WHEN ae.status = 'resolved' THEN 1 END)` per campaign. LEFT JOIN so a top campaign with zero resolutions still returns 0.

---

### The solution

**TOP FLAGGED CAMPAIGN RESOLUTIONS**

```sql
WITH campaign_alerts AS (
  SELECT ai.ad_campaign,
         COUNT(DISTINCT ae.alert_id) AS alert_count
  FROM alert_events ae
  INNER JOIN ad_impressions ai
    ON SUBSTR(ae.fired_at, 1, 13) = SUBSTR(ai.impression_time, 1, 13)
  WHERE ae.alert_id IS NOT NULL
  GROUP BY ai.ad_campaign
),
top_campaigns AS (
  SELECT ad_campaign
  FROM campaign_alerts
  WHERE alert_count = (SELECT MAX(alert_count) FROM campaign_alerts)
)
SELECT tc.ad_campaign,
       COUNT(CASE WHEN ae.status = 'resolved' THEN 1 END) AS resolved_count
FROM top_campaigns tc
INNER JOIN ad_impressions ai
  ON tc.ad_campaign = ai.ad_campaign
LEFT JOIN alert_events ae
  ON SUBSTR(ae.fired_at, 1, 13) = SUBSTR(ai.impression_time, 1, 13)
 AND ae.status = 'resolved'
GROUP BY tc.ad_campaign
```

> **Cost Analysis**
>
> 300M impressions joined to 20M alerts on `SUBSTR(..., 1, 13)` kills both partition pruning and any index. Optimizer falls back to hash join on a synthetic string. In production, materialize a `hour_bucket` column with `DATE_TRUNC('hour', ...)`, partition by it, and the join drops from a full scan pair to a co-partitioned merge.

> **Interviewers Watch For**
>
> Before writing, ask: how do alerts relate to campaigns? There is no FK, you are bucketing by hour. Then ask: do ties count? That decides DENSE_RANK or `= MAX(...)` vs `ROW_NUMBER`. Skip both questions and you've quietly treated an ambiguous schema as a clean one.

> **Common Pitfall**
>
> Writing `COUNT(ae.alert_id)` instead of `COUNT(DISTINCT ae.alert_id)`. The same-hour join is many-to-many: one alert hour matches every impression in that hour, so each alert gets counted once per impression. Your top campaign is then whichever one had the most impressions, not the most alerts.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- What if two campaigns tie for the most alerts? _(Probes whether you reach for `= MAX(...)` or DENSE_RANK instead of LIMIT 1 or ROW_NUMBER.)_
- How would you redesign the schema so this query is not needed? _(Tests whether you can propose adding `ad_campaign_id` to `alert_events` at ingest, killing the hour-bucket heuristic.)_
- Change the window from same hour to within 15 minutes. _(Forces an inequality range join on timestamps and a discussion of whether to bucket finer or use `BETWEEN`.)_
- How do you handle alerts that fire at 11:59 and impressions at 12:00? _(Surfaces that `SUBSTR(..., 1, 13)` truncates and misses cross-boundary co-occurrence.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_flagged_campaign_resolutions)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.