# Same-Day Signup Rate

> Percentage of transactions made on the customer's signup date.

Canonical URL: <https://datadriven.io/problems/same_day_signup_rate>

Domain: SQL · Difficulty: medium · Seniority: L5

## Problem

What percentage of all transactions were completed on the same day the customer first registered? Return a single number rounded to 2 decimal places.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge asks you to join the `transactions` and `users` tables and compute a ratio over the joined rows, simulating a real analytics workflow. Pay attention to `user_id`, which drives the join, and to `signup_date` and `transaction_date`, which drive the same-day comparison.

> **Trick to Solving**
>
> When the prompt asks for a percentage of rows that meet a condition (here, same-day vs. later transactions), conditional aggregation computes the numerator and denominator without multiple passes.
>
> 1. Spot the split: each transaction is either same-day or not
> 2. Use `SUM(CASE WHEN condition THEN 1 ELSE 0 END)` for the numerator and `COUNT(*)` for the denominator
> 3. Skip `GROUP BY`: the prompt asks for a single number across all transactions

---

### Break down the requirements

#### Step 1: Join `transactions` to `users`

The join connects the two tables on their shared key, `user_id`. Each transaction row then carries its customer's `signup_date` next to its own `transaction_date`, so the comparison and the aggregation can run over a single row set.
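
A minimal sketch of what the joined row set looks like. The sample rows, the `transaction_id` column, and the date types are assumptions for illustration only; the real schema may differ.

```sql
-- Hypothetical sample data defined inline so the query is self-contained.
WITH users (user_id, signup_date) AS (
    VALUES (1, DATE '2024-01-01'),
           (2, DATE '2024-01-05')
),
transactions (transaction_id, user_id, transaction_date) AS (
    VALUES (101, 1, DATE '2024-01-01'),  -- same day as user 1's signup
           (102, 1, DATE '2024-01-03'),
           (103, 2, DATE '2024-01-05'),  -- same day as user 2's signup
           (104, 2, DATE '2024-02-10')
)
SELECT t.transaction_id,
       t.transaction_date,
       u.signup_date
FROM transactions t
JOIN users u ON t.user_id = u.user_id
ORDER BY t.transaction_id;
```

Because this is an inner join, a transaction whose `user_id` has no match in `users` would drop out of both the numerator and the denominator.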

#### Step 2: Use conditional aggregation with CASE

A `CASE` expression inside the aggregate function splits rows into buckets without multiple passes over the data. Here there is one bucket: rows where the transaction date equals the signup date contribute 1 to the sum, and all other rows contribute 0.
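
The counting step can be sketched in isolation. The `joined` CTE below is a hypothetical stand-in for the output of the join, and the `FILTER` clause is shown only as an equivalent Postgres spelling, not as part of the original solution.

```sql
-- Hypothetical joined rows: two of the four are same-day transactions.
WITH joined (transaction_date, signup_date) AS (
    VALUES (DATE '2024-01-01', DATE '2024-01-01'),
           (DATE '2024-01-03', DATE '2024-01-01'),
           (DATE '2024-01-05', DATE '2024-01-05'),
           (DATE '2024-02-10', DATE '2024-01-05')
)
SELECT SUM(CASE WHEN transaction_date = signup_date THEN 1 ELSE 0 END) AS same_day_txns,   -- 2
       COUNT(*) FILTER (WHERE transaction_date = signup_date)          AS same_day_filter, -- 2
       COUNT(*)                                                        AS total_txns       -- 4
FROM joined;
```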

---

### The solution

**Join `transactions` to `users` to find same-day signup rate**

```sql
SELECT ROUND(
         100.0 * SUM(CASE WHEN DATE(t.transaction_date) = DATE(u.signup_date) THEN 1 ELSE 0 END)
               / COUNT(*), 2) AS same_day_pct  -- 100.0 forces numeric division
FROM transactions t
JOIN users u ON t.user_id = u.user_id;
```
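
The `100.0` literal in the query above matters because Postgres truncates integer division. The following sketch uses literal numbers only, and the `NULLIF` guard for an empty input is an optional addition rather than part of the original solution.

```sql
SELECT 1 / 3 * 100                        AS truncated,    -- 0: integer division drops the fraction
       ROUND(100.0 * 1 / 3, 2)            AS pct,          -- 33.33: numeric division, then rounding
       ROUND(100.0 * 0 / NULLIF(0, 0), 2) AS empty_input;  -- NULL instead of a division-by-zero error
```

Wrapping the denominator as `NULLIF(COUNT(*), 0)` would apply the same guard to the full query if an empty `transactions` table is a possibility.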

> **Cost Analysis**
>
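> With ~65M rows and no `WHERE` clause, every row in `transactions` has to be read once, so the cost is dominated by that scan plus the join. The join cost depends on the smaller table's cardinality, since a hash join typically builds its hash table on the smaller side. An index on the join column `user_id` mainly helps if the planner chooses a nested-loop or merge join; it cannot turn the full scan into a seek.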

> **Interviewers Watch For**
>
> Interviewers watch for whether you compute the numerator and the denominator in a single pass with conditional aggregation instead of running separate queries.

> **Common Pitfall**
>
> Placing the CASE expression outside the aggregate (e.g., `CASE WHEN ... THEN SUM(x)`) changes the semantics entirely. The CASE must go inside the aggregate.
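
A small sketch of the pitfall above, using hypothetical joined rows. The misplaced form is left commented out: in Postgres it is expected to be rejected because the row-level columns appear outside any aggregate without a `GROUP BY`, while more permissive engines may evaluate it against an arbitrary row.

```sql
-- Hypothetical joined rows standing in for transactions JOIN users.
WITH joined (transaction_date, signup_date) AS (
    VALUES (DATE '2024-01-01', DATE '2024-01-01'),
           (DATE '2024-01-03', DATE '2024-01-01')
)
-- Pitfall (commented out): the CASE wraps the aggregate, so the comparison
-- is no longer evaluated once per row.
-- SELECT CASE WHEN transaction_date = signup_date THEN SUM(1) END FROM joined;

-- Intended form: the CASE runs per row, then SUM adds up the 1s and 0s.
SELECT SUM(CASE WHEN transaction_date = signup_date THEN 1 ELSE 0 END) AS same_day_txns  -- 1
FROM joined;
```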

---

## Common follow-up questions

- What would happen to your result if `transactions.transaction_date` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `transaction_date` and uses DISTINCT or deduplication where needed.)_
- If `transactions` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `user_id`? _(Tests ability to identify performance hotspots related to `transactions.user_id` at scale.)_
- Your conditional CASE logic assumes the categories are exhaustive. What happens if a joined row falls into none of the branches and the `ELSE` is omitted? _(Tests awareness of the implicit ELSE NULL in CASE expressions.)_
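
For the last question, a minimal sketch of how an omitted `ELSE` changes the aggregate. The rows are hypothetical and deliberately contain no same-day transactions.

```sql
-- Hypothetical joined rows where no transaction falls on the signup date.
WITH joined (transaction_date, signup_date) AS (
    VALUES (DATE '2024-01-03', DATE '2024-01-01'),
           (DATE '2024-02-10', DATE '2024-01-05')
)
SELECT SUM(CASE WHEN transaction_date = signup_date THEN 1 END)        AS sum_no_else,    -- NULL: every input is NULL
       SUM(CASE WHEN transaction_date = signup_date THEN 1 ELSE 0 END) AS sum_else_zero,  -- 0
       COUNT(CASE WHEN transaction_date = signup_date THEN 1 END)      AS count_no_else   -- 0: COUNT skips NULLs
FROM joined;
```

With the explicit `ELSE 0` used in the solution, a zero-match input yields 0 rather than NULL, so the reported percentage would be 0.00.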

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/same_day_signup_rate)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.