# Campaign Cost Effectiveness

> Money in, conversions out. What is the ratio?

Canonical URL: <https://datadriven.io/problems/campaign_cost_effectiveness>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The marketing team is evaluating return on ad spend and needs a cost-effectiveness metric for each campaign between 2025 and 2026 inclusive. Show each campaign alongside the ratio of total revenue to total clicks.

## Worked solution and explanation

### Why this problem exists in real interviews

By forcing conditional branching with CASE combined with division-safe NULLIF guards on `ad_impressions`, this question separates candidates who understand how `ad_campaign`, `impression_time`, `clicked` behave under aggregation from those who guess at the GROUP BY clause, a pattern seen at the medium level (variant 1).

> **Trick to Solving**
>
> Any rate or ratio problem requires **null-safe division**. If the denominator can be zero, the query crashes or returns NULL silently.
> 
> 1. Identify the numerator and denominator conditions
> 2. Use `SUM(CASE WHEN ... THEN 1 ELSE 0 END)` for the numerator
> 3. Wrap the denominator in `NULLIF(..., 0)` to prevent division by zero

---

### Break down the requirements

#### Step 1: Group by `ad_campaign`

`GROUP BY ad_campaign` produces one output row per distinct value of `ad_campaign`.

#### Step 2: Compute the ratio with CASE and NULLIF

The numerator uses `SUM(CASE WHEN clicked THEN 1 ELSE 0 END)`. Wrapping the denominator in `NULLIF(COUNT(*), 0)` prevents division by zero.

#### Step 3: Round and order

Use `ROUND(..., 4)` for clean decimal output and sort by rate descending.

---

### The solution

**Case-branch for campaign cost effectiveness**

```sql
SELECT
    ad_campaign,
    ROUND(
        1.0 * SUM(CASE WHEN clicked = 1 THEN 1 ELSE 0 END)
        / NULLIF(COUNT(*), 0),
        4
    ) AS rate
FROM ad_impressions
GROUP BY ad_campaign
ORDER BY rate DESC
```

> **Cost Analysis**
>
> The main table has 400M rows (77 GB). Partitioned on `impression_time`, so queries filtering on that column skip most partitions. The GROUP BY reduces the row count early, keeping downstream operations cheap.

> **Interviewers Watch For**
>
> Strong candidates state the correct `GROUP BY` grain before writing any SQL, showing they think about the output shape first. Division-by-zero handling is a silent correctness bug; interviewers watch for `NULLIF` or equivalent protection.

> **Common Pitfall**
>
> Selecting a non-aggregated column without including it in `GROUP BY` is the most common error. Some engines reject it; others silently return arbitrary values.

---

## Common follow-up questions

- What happens to your results if `ad_campaign` in `ad_impressions` contains trailing whitespace or mixed casing? _(Tests awareness of text normalization issues that silently fragment GROUP BY results.)_
- Your CASE expression branches on `user_id`. What happens if a new category value appears that none of your WHEN clauses match? _(Tests whether the candidate uses a meaningful ELSE branch or lets unmatched rows silently become NULL.)_
- `impression_id` in `ad_impressions` has ~400M distinct values. What index strategy keeps your query from doing a full table scan? _(Tests whether the candidate can design indexes for high-cardinality columns and understands selectivity.)_
- Could you express this same logic as a single query without CTEs or subqueries? What readability trade-off does that introduce? _(Tests whether the candidate can flatten nested logic and understands when decomposition aids maintainability.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/campaign_cost_effectiveness)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.