# Ad Revenue by Age Bucket

> Ad dollars, sliced by country.

Canonical URL: <https://datadriven.io/problems/ad_revenue_by_age_bucket>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The ad monetization team is evaluating which user demographics drive the most revenue. Show the total ad revenue for each age bucket, with the highest-earning buckets first. Exclude users who have no age bucket on file.

## Worked solution and explanation

### What this is really asking

`users.age_bucket` is nullable on 15M rows, and the prompt says exclude users with no bucket on file. That one filter, plus a 400M-row sum, is the whole job.

---

### Break down the requirements

#### Step 1: Join impressions to users

INNER JOIN on user_id. An anti-orphan join also drops impressions whose user row was hard-deleted, which matches the demographic intent.

#### Step 2: Filter null buckets, then group

WHERE u.age_bucket IS NOT NULL before GROUP BY. Putting it in HAVING works but forces the engine to aggregate a NULL group it will throw away.

#### Step 3: Sort highest revenue first

ORDER BY SUM(ai.revenue) DESC, referenced as the alias total_revenue. The prompt explicitly asks for highest-earning buckets first.

---

### The solution

**AD REVENUE BY AGE BUCKET**

```sql
SELECT u.age_bucket,
       SUM(ai.revenue) AS total_revenue
FROM ad_impressions ai
INNER JOIN users u
  ON ai.user_id = u.user_id
WHERE u.age_bucket IS NOT NULL
GROUP BY u.age_bucket
ORDER BY total_revenue DESC
```

> **Cost Analysis**
>
> 400M impressions hash-joined against 15M users dominates. Push the age_bucket IS NOT NULL filter down so the build side shrinks before the probe. Partition pruning on impression_time does nothing here because there is no time predicate.

> **Interviewers Watch For**
>
> Whether you use INNER vs LEFT JOIN, where the NULL filter lives, and whether you ORDER BY the alias or repeat SUM(revenue). All three are correct, but the choices show you thought about each one.

> **Common Pitfall**
>
> LEFT JOIN plus WHERE age_bucket IS NOT NULL silently degrades into an INNER JOIN and wastes the outer scan. If you want LEFT JOIN semantics, the NULL check belongs in the ON clause.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you split this into clicked vs unclicked revenue per bucket? _(Add SUM(CASE WHEN clicked THEN revenue END) alongside the total to expose conversion lift per cohort.)_
- What if age_bucket lived on a slowly changing dimension instead of users? _(You would join on user_id plus a valid-from range against impression_time, which turns this into an inequality join.)_
- How would you cap the result to the top three buckets only? _(Wrap the query and apply LIMIT 3, or use a RANK window if ties on revenue should all be returned.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/ad_revenue_by_age_bucket)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.