# Missing Email for Non-Active Users

> No email on file. No recent activity. Something smells off.

Canonical URL: <https://datadriven.io/problems/missing_email_for_non_active_users>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

A downstream notification pipeline is failing for a subset of users. You traced the issue to users from 2026 who have no email on file but whose account is not active. Find those user IDs.

## Worked solution and explanation

### Why this problem exists in real interviews

Against the users table, date extraction on username values is the key operation. Interviewers favor this as a fundamentals check because it exposes whether candidates handle ties, NULLs, and ordering edge cases correctly.

---

### Break down the requirements

#### Step 1: Read from `users`

The query targets `users` with 6 columns. Identify which columns are needed for the output.

#### Step 2: Filter to the target rows

Filter for NULL or non-NULL values in the `WHERE` clause. This must happen before aggregation to avoid corrupted results.

#### Step 3: Return the result set

Select the required columns with any necessary aliasing or formatting.

---

### The solution

**Compound NULL and status filter**

```sql
SELECT user_id
FROM users
WHERE (email IS NULL OR email = '')
    AND account_status <> 'active'
    AND STRFTIME('%Y', signup_date) = '2026'
```

> **Cost Analysis**
>
> The query scans 18M rows from `users`. The aggregation reduces the row count before any downstream processing, which is the key performance lever.

> **Interviewers Watch For**
>
> Explicitly mentioning NULL handling before being asked signals production awareness. Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax.

> **Common Pitfall**
>
> NULL values are silently excluded from `COUNT(column)` but included in `COUNT(*)`. Mixing these up produces incorrect totals.

---

## Common follow-up questions

- What happens to your result if users.email contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on email.)_
- How would you verify that your aggregation on users.user_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in users.user_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like user_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/missing_email_for_non_active_users)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.