# Unique Visitors

> Which months actually had an audience?

Canonical URL: <https://datadriven.io/problems/unique_visitors>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The product team is building a monthly active users report for the board. For each calendar month in the session data, tally the number of individual users who visited at least once. Only include months that had more than two unique visitors. List the months from most recent to oldest.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests date truncation, `COUNT(DISTINCT ...)`, and `HAVING` in combination. Interviewers check whether you can derive calendar months from timestamps, aggregate unique visitors, and filter groups by count.

---

### Break down the requirements

#### Step 1: Extract calendar month

`STRFTIME('%Y-%m', session_start)` truncates each session timestamp to its year-month for grouping.

#### Step 2: Count unique visitors per month

`COUNT(DISTINCT user_id)` counts individual users within each month.

#### Step 3: Filter months with more than 2 unique visitors

`HAVING COUNT(DISTINCT user_id) > 2` excludes low-traffic months.

#### Step 4: Order most recent first

`ORDER BY month DESC` lists months from newest to oldest.

---

### The solution

**Monthly unique visitors with minimum threshold**

```sql
SELECT
    STRFTIME('%Y-%m', session_start) AS month,
    COUNT(DISTINCT user_id) AS unique_visitors
FROM user_sessions
GROUP BY STRFTIME('%Y-%m', session_start)
HAVING COUNT(DISTINCT user_id) > 2
ORDER BY month DESC
```

> **Cost Analysis**
>
> Full scan of 2B rows. The date function executes per row. With 10M distinct users, the DISTINCT hash set per month is large. Partitioning by month would make each group's scan independent.

> **Interviewers Watch For**
>
> Whether you repeat the STRFTIME expression in both SELECT and GROUP BY. Using an alias in GROUP BY is not standard in all SQL dialects.

> **Common Pitfall**
>
> Using `COUNT(user_id)` instead of `COUNT(DISTINCT user_id)`. Without DISTINCT, you count total sessions, not unique visitors.

---

## Common follow-up questions

- What if session_start included time zones? _(Tests whether to normalize to UTC before truncating to month.)_
- How would you show months with zero visitors (gap-filling)? _(Requires a calendar table or generate_series to produce all months and LEFT JOIN.)_
- At 2 billion rows, how would you optimize this query? _(Pre-aggregate daily unique visitors in a summary table, then roll up to monthly.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/unique_visitors)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.