# Most Popular Signup Day

> One day of the week wins on signups.

Canonical URL: <https://datadriven.io/problems/most_popular_signup_day>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The growth team is scheduling promotional pushes and wants to amplify the days that already get the most organic signups. Show each day of the week alongside its signup count, sorted from most to fewest.

## Worked solution and explanation

### Why this problem exists in real interviews

Working against users, this problem tests grouping and top-N selection on the username and email columns. Interviewers use it in mid-level screens because a subtle mis-grouping or filter placement changes the output without raising an error.

> **Trick to Solving**
>
> Read the prompt carefully for implicit constraints. The phrase structure hints at the grain of the output: what each row represents.
> 
> 1. Identify the output grain from the prompt (one row per what?)
> 2. Work backward from the desired output columns
> 3. Build the query inside-out: innermost subquery first, then layer on filters and aggregates

---

### Break down the requirements

#### Step 1: Aggregate with COUNT

Group by the output grain and apply `COUNT()` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.

#### Step 2: Order and limit the output

Sort by the target metric and apply `LIMIT` to return the requested number of rows. Ensure the sort is deterministic to produce reproducible results.

---

### The solution

**Day-of-week extraction with mode detection**

```sql
SELECT CASE CAST(STRFTIME('%w', signup_date) AS INTEGER)
        WHEN 0 THEN 'Sunday'
        WHEN 1 THEN 'Monday'
        WHEN 2 THEN 'Tuesday'
        WHEN 3 THEN 'Wednesday'
        WHEN 4 THEN 'Thursday'
        WHEN 5 THEN 'Friday'
        WHEN 6 THEN 'Saturday'
    END AS signup_day,
    COUNT(*) AS signup_count
FROM users
GROUP BY STRFTIME('%w', signup_date)
ORDER BY signup_count DESC
LIMIT 1
```

> **Cost Analysis**
>
> The query scans 18M rows from `users`. The aggregation reduces the row count before any downstream processing, which is the key performance lever.

> **Interviewers Watch For**
>
> Interviewers expect you to articulate why you chose a specific join type and what happens to unmatched rows. Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax.

> **Common Pitfall**
>
> Forgetting that a JOIN can multiply rows when the relationship is one-to-many. Always check whether the join key is unique on at least one side.

---

## Common follow-up questions

- If users.user_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in user_id.)_
- How would you verify that your aggregation on users.user_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in users.user_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like user_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/most_popular_signup_day)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.