# Ad Clickers

> Who clicked? What did they spend?

Canonical URL: <https://datadriven.io/problems/ad_clickers>

Domain: SQL · Difficulty: easy · Seniority: L4

## Problem

The monetization team is reviewing ad performance for the quarter. They want every user who clicked on at least one ad, along with the total revenue those clicks generated. Show each user's username and total click revenue, listed from the highest revenue contributor down.

## Worked solution and explanation

### Why this problem exists in real interviews

Join a dimension to a filtered fact and sum a metric: this is the most common shape of revenue analytics in the wild. Total purchase value by customer, total click revenue by ad, total session minutes by user. Interviewers use this prompt to check whether you place the filter in WHERE (correct) or in HAVING (wrong scope), whether you join on the right key, and whether you sort by the aggregate with a deterministic tiebreak.

---

### Break down the requirements

#### Step 1: Join users to ad_impressions on user_id

The dimension (username) lives on users; the fact (revenue, clicked flag) lives on ad_impressions. INNER JOIN on user_id keeps only users who had at least one impression. Users with no impressions should not appear, so INNER is correct.

#### Step 2: Filter to clicked = 1 in WHERE

WHERE a.clicked = 1 drops unclicked impressions before grouping. This belongs in WHERE because it filters rows, not groups. A user with zero clicks is eliminated at the row level and therefore absent from the grouped output, satisfying 'at least one click.'

#### Step 3: Group by username, sum revenue

GROUP BY u.username produces one row per user. SUM(a.revenue) totals click revenue. If username is not unique (rare, but possible), GROUP BY user_id is safer; the spec asks for username in the output, so group by username if it is unique by contract.

#### Step 4: Sort by total_revenue DESC, username ASC

ORDER BY total_revenue DESC, username ASC. The tiebreak on username ASC makes output deterministic for users with identical totals (common when revenue is always the same small set of values).

---

### The solution

**Join, filter in WHERE, group, sort with tiebreak**

```sql
SELECT
    u.username,
    SUM(a.revenue) AS total_revenue
FROM users u
JOIN ad_impressions a ON u.user_id = a.user_id
WHERE a.clicked = 1
GROUP BY u.username
ORDER BY total_revenue DESC, u.username ASC
```

> **Cost Analysis**
>
> Cost is driven by ad_impressions (the fact). With an index on (user_id, clicked), the filter and join collapse into a single index scan, O(clicks). GROUP BY on a hash of username is O(users_with_clicks). The ORDER BY sort is O(g log g) where g is the grouped row count, typically much smaller than the raw fact table.

> **Interviewers Watch For**
>
> Whether you put 'clicked = 1' in WHERE (correct, pre-aggregation) or HAVING (wrong, HAVING is for post-aggregation filters), whether you use INNER JOIN and justify it ('at least one click' implies only users who clicked), and whether you include the tiebreak. Strong candidates note that clicked = 1 could also be clicked = true or clicked IS TRUE depending on column type, and ask about the type.

> **Common Pitfall**
>
> Putting 'clicked = 1' in HAVING. It would work on some engines (because HAVING can reference non-aggregate columns grouped on) but is semantically wrong: you want to drop unclicked rows before summing, not filter groups after. On the wrong engine it silently passes; on the right engine it throws. Another miss: LEFT JOIN instead of INNER, which drags in users with no impressions and gives them total_revenue = NULL.

---

## Common follow-up questions

- How would you also return the number of clicks alongside the revenue? _(add COUNT(*) AS click_count to the SELECT. Explain why COUNT(*) works after the WHERE filter.)_
- What changes if you need top 5 clickers only? _(add LIMIT 5 after the ORDER BY. Discuss how the ORDER BY tiebreak interacts with LIMIT (the tiebreak is why the output is deterministic across runs).)_
- How would you handle a user who has clicks but every click has revenue = NULL? _(SUM returns NULL for all-NULL groups. The user appears with total_revenue NULL, which may violate 'at least one click' in spirit; discuss COALESCE(SUM(revenue), 0) or HAVING SUM(revenue) IS NOT NULL.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/ad_clickers)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.