# New User Purchases

> What's this year's signup cohort worth so far?

Canonical URL: <https://datadriven.io/problems/new_user_purchases>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Growth wants to quantify how much revenue the 2026 signup cohort has generated. Sum all transaction spending for users who signed up during calendar year 2026 and return a single total.

## Worked solution and explanation

### Why this problem exists in real interviews

Connecting users to transactions requires getting the join cardinality right before applying date extraction. Interviewers use this in mid-level screens to test whether you verify row counts after each join step.

---

### Break down the requirements

#### Step 1: Define new users

Identify users whose signup or first-activity date falls within the target window.

#### Step 2: Join to purchase data

Join the new-user set to the transactions table on `user_id` and apply any additional time constraints.

#### Step 3: Aggregate purchase metrics

Compute the requested output (count, sum, or individual records) for the qualifying users.

---

### The solution

**Cohort join with time filter**

```sql
SELECT t.*
FROM transactions t
JOIN users u ON t.user_id = u.user_id
WHERE u.signup_date >= DATE('now', '-30 days')
```

> **Cost Analysis**
>
> At `users` (15,000,000 rows), `transactions` (100,000,000 rows), a full table scan is expensive. Partition pruning (if the table is partitioned on the filter column) is the first optimization. A covering index on the `GROUP BY` + filter columns eliminates random I/O. Consider a materialized view for repeated dashboard queries.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If users.user_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in user_id.)_
- If the join between users and transactions produces a fan-out, how does that affect your aggregate? _(Tests awareness of join cardinality and its impact on SUM, COUNT, and AVG results.)_
- With millions of distinct values in users.user_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like user_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/new_user_purchases)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.