# Monthly Unique Users per Campaign

> Monthly reach, campaign by campaign.

Canonical URL: <https://datadriven.io/problems/monthly_unique_users_per_campaign>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

For each ad campaign and month, count the unique users who received an impression. Assume all data falls within the same year. Return the campaign, month, and unique user count.

## Worked solution and explanation

### Why this problem exists in real interviews

Extracting insights from ad_impressions.ad_campaign grouped by impression_time via grouping and date extraction is the central task. It is used as a fundamentals check to test whether you pick the right aggregation function and partition boundary on the first attempt.

---

### Break down the requirements

#### Step 1: Aggregate with COUNT DISTINCT

Group by the output grain and apply `COUNT DISTINCT()` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.

#### Step 2: Order the final output

Apply `ORDER BY` as specified to produce the expected row sequence. When tied values exist, add a secondary sort column for determinism.

---

### The solution

**Campaign-month COUNT DISTINCT**

```sql
SELECT ad_campaign,
    CAST(STRFTIME('%m', impression_time) AS INTEGER) AS month,
    COUNT(DISTINCT user_id) AS unique_users
FROM ad_impressions
GROUP BY ad_campaign, CAST(STRFTIME('%m', impression_time) AS INTEGER)
ORDER BY ad_campaign, month
```

> **Cost Analysis**
>
> The query scans 300M rows from `ad_impressions`. The aggregation reduces the row count before any downstream processing, which is the key performance lever. CTEs in most engines are optimization fences. For production workloads, consider inlining or materializing the intermediate results.

> **Interviewers Watch For**
>
> Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax. Explaining why `ROW_NUMBER` is preferred over `DISTINCT` for deduplication shows you understand the difference between collapsing and selecting.

> **Common Pitfall**
>
> Comparing dates stored as TEXT without casting can produce lexicographic instead of chronological ordering. Always confirm the column type.

---

## Common follow-up questions

- If ad_impressions.impression_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in impression_id.)_
- How would you verify that your aggregation on ad_impressions.impression_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in ad_impressions.impression_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like impression_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/monthly_unique_users_per_campaign)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.