# First Time Learners Per Day

> Brand new users, day by day.

Canonical URL: <https://datadriven.io/problems/first_time_learners_per_day>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

For every date in the data, count the number of users who started their very first session on that day.

## Worked solution and explanation

### Why this problem exists in real interviews

Counting distinct first events per day is a core growth metric: DAU new-user acquisition, cohort sizing, retention windows. Interviewers use it to test whether you reach for MIN() inside a CTE versus a self-join or window function, and whether you understand why a simple GROUP BY session_start gets it wrong.

> **Trick to Solving**
>
> Two-step: (1) CTE finds each user's first session date with MIN(DATE(session_start)). (2) Outer query counts users per first_session_date. The key insight is that you must group by user first to de-duplicate, then group by date to count.

---

### Break down the requirements

#### Step 1: Find each user's earliest session date

`MIN(DATE(session_start)) AS first_session_date` grouped by user_id. DATE() strips the time component so sessions on the same day collapse.

#### Step 2: Count first-time users per date

Outer `GROUP BY first_session_date` with `COUNT(*)`. Each user appears exactly once in the CTE, so COUNT(*) is the new-user count for that day.

#### Step 3: Order chronologically

`ORDER BY first_session_date` makes the output a readable growth timeline.

---

### The solution

**MIN() inside a CTE then aggregate**

```sql
WITH first_sessions AS (
  SELECT user_id, MIN(DATE(session_start)) AS first_session_date
  FROM user_sessions
  GROUP BY user_id
)
SELECT first_session_date, COUNT(*) AS new_user_count
FROM first_sessions
GROUP BY first_session_date
ORDER BY first_session_date
```

> **Cost Analysis**
>
> user_sessions has 60M rows. The CTE does one full pass grouped by user_id (cardinality 4M). The outer aggregate is cheap - only 4M rows in the CTE. An index on (user_id, session_start) makes the inner GROUP BY a sort-merge rather than a hash aggregate.

> **Interviewers Watch For**
>
> Whether you de-duplicate at the user level before aggregating by date. A direct GROUP BY DATE(session_start) counts sessions, not new users - a very common wrong answer.

> **Common Pitfall**
>
> `SELECT DATE(session_start), COUNT(DISTINCT user_id)` without the inner MIN step counts unique users active on that date, not users whose FIRST session was that date.

---

## Common follow-up questions

- How would you compute 7-day and 30-day rolling new-user counts? _(Tests window functions over the date-level result.)_
- What if a user can have sessions across multiple products - how do you find first-time learners per product per day? _(Adds a PARTITION BY product dimension to the MIN() step.)_
- Rewrite without a CTE using a window function. _(Tests ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY session_start) = 1 as an alternative.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/first_time_learners_per_day)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.