# The First Half

> New arrivals during one specific window.

Canonical URL: <https://datadriven.io/problems/the_first_half>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The growth team is measuring first-half cohort size for 2026. How many users signed up between January 1 and July 31 inclusive?

## Worked solution and explanation

### Why this problem exists in real interviews

Everyone reaches for `BETWEEN '2024-01-01' AND '2024-07-31'` without checking whether `signup_date` is DATE or TIMESTAMP. The half-open form in the expected query survives a schema change from DATE to TIMESTAMP without quietly dropping a day of signups.

---

### Break down the requirements

#### Step 1: Cohort definition

First half here means January through July inclusive, a seven-month window. Confirm that wording before writing; people read past it.

#### Step 2: Pick a boundary style

Use `>= Jan 1` and `< Aug 1`. Exclusive on the right so the predicate behaves identically for DATE and TIMESTAMP columns.

#### Step 3: Count, do not distinct

One row per user already, so `COUNT(*)` is fine and faster than `COUNT(DISTINCT user_id)`. Alias the result `signup_count`.

---

### The solution

**FIRST HALF COHORT COUNT**

```sql
SELECT COUNT(*) AS signup_count
FROM users
WHERE signup_date >= '2024-01-01'
  AND signup_date <  '2024-08-01'
```

> **Cost Analysis**
>
> Twenty million rows. A btree on `signup_date` turns this into a range scan plus index-only count. The half-open predicate is sargable; wrapping `signup_date` in `YEAR()` or `MONTH()` would defeat the index.

> **Interviewers Watch For**
>
> Do you treat July 31 as part of the window, and does your predicate still capture every signup on July 31 if the column becomes a timestamp. The half-open form answers both at once.

> **Common Pitfall**
>
> `BETWEEN '2024-01-01' AND '2024-07-31'` is correct for a DATE column. The moment someone migrates `signup_date` to TIMESTAMP, every signup after midnight on July 31 disappears from the cohort and nobody notices for a quarter.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- Break the count out by month so we can see the curve across the first half. _(Tests `DATE_TRUNC` plus `GROUP BY` while preserving the sargable filter on `signup_date`.)_
- Restrict to users whose `account_status` is still active. _(Probes adding the filter without breaking the index range scan.)_
- Compare this year's first-half count to last year's, side by side. _(Checks comfort with conditional aggregation or a year-keyed `GROUP BY`.)_

> **Half-Open Habit**
>
> Default to `>= start AND < next_period_start` for every date filter. Works for DATE, TIMESTAMP, and TIMESTAMPTZ unchanged, and composes cleanly when you bucket by week or month later.

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_first_half)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.