# Email Census

> The reachability split.

Canonical URL: <https://datadriven.io/problems/email_census>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The CRM team is sizing their outreach lists before the next campaign. Segment every user account into one of two groups: those with an email address on file and those without. For each segment, show how many users fall into each bucket and what share of all users that segment represents, expressed as a percentage rounded to one decimal place. List the larger segment first.

## Worked solution and explanation

### Why this problem exists in real interviews

Segmenting users by email domain is a common data engineering task. This tests string manipulation with `SUBSTR` and `INSTR`, combined with conditional logic to classify users into groups.

---

### Break down the requirements

#### Step 1: Extract the domain from email

Use `SUBSTR(email, INSTR(email, '@') + 1)` to pull everything after the `@` sign.

#### Step 2: Classify into two groups

The prompt asks for two groups. Use `CASE WHEN` on the extracted domain to segment users.

#### Step 3: Count per group

`GROUP BY` the segment with `COUNT(*)`.

---

### The solution

**Domain extraction with CASE segmentation**

```sql
SELECT
    CASE
        WHEN SUBSTR(email, INSTR(email, '@') + 1) LIKE '%company.com' THEN 'internal'
        ELSE 'external'
    END AS segment,
    COUNT(*) AS user_count
FROM users
GROUP BY segment
ORDER BY user_count DESC
```

> **Cost Analysis**
>
> Full table scan with per-row string computation. For repeated queries, a generated column storing the domain avoids runtime cost.

> **Interviewers Watch For**
>
> The interviewer checks whether you know the right string functions for the dialect: `SUBSTR`/`INSTR` for SQLite, `SPLIT_PART` for Postgres.

> **Common Pitfall**
>
> Off-by-one errors in `SUBSTR`: `INSTR(email, '@')` returns the `@` position, so you need `+ 1` to skip past it.

---

## Common follow-up questions

- How would you handle emails with no @ sign? _(Tests defensive coding with CASE WHEN or WHERE filters.)_
- What if you needed the top-level domain (.com, .org)? _(Tests nested string extraction after the last dot.)_
- How would you normalize domains to lowercase? _(Tests LOWER() before grouping.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/email_census)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.