# No Gaps

> Zero blanks. A clean contact list.

Canonical URL: <https://datadriven.io/problems/no_gaps>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The email marketing team is preparing a campaign blast and needs a clean contact list with no blank fields. For each user, show their username, email address (substituting 'unknown' if the email is missing), and age bucket (substituting 'unspecified' if the age bucket is missing). Only include users whose account is currently active.

## Worked solution and explanation

### Why this problem exists in real interviews

The interviewer wants to see you apply gap-and-island and null handling to users.username while accounting for the distribution of email. This surfaces as a fundamentals check because small logic errors produce results that look correct at a glance.

---

### Break down the requirements

#### Step 1: Filter to active accounts

`WHERE account_status = 'active'` restricts the result to current users.

#### Step 2: Substitute defaults for NULLs

`COALESCE(email, 'unknown')` and `COALESCE(age_bucket, 'unspecified')` replace missing values with the specified defaults.

#### Step 3: Select the three output columns

Return `username`, the coalesced email, and the coalesced age bucket.

---

### The solution

**COALESCE for NULL substitution with status filter**

```sql
SELECT
    username,
    COALESCE(email, 'unknown') AS email,
    COALESCE(age_bucket, 'unspecified') AS age_bucket
FROM users
WHERE account_status = 'active'
```

> **Cost Analysis**
>
> The query scans `users` (1,000,000 rows). A covering index on the filter and group columns would reduce I/O. At this scale, the full scan is acceptable but becomes costly if the table grows 10x.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What happens to your result if users.email contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on email.)_
- If users has duplicate timestamps for the same entity, how does the gap detection break? _(Tests awareness of the row-number gap technique's sensitivity to duplicates.)_
- With millions of distinct values in users.user_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like user_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/no_gaps)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.