# Features With Missing Values

> Missing data in the features.

Canonical URL: <https://datadriven.io/problems/features_with_missing_values>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The ML team is debugging a model training failure caused by null feature values. Surface every record in the feature store where the average value is missing.

## Worked solution and explanation

### Why this problem exists in real interviews

Filtering for NULL values is one of the most basic SQL operations, yet it trips up many candidates. This tests whether you use `IS NULL` rather than `= NULL`, a fundamental distinction in SQL semantics.

---

### Break down the requirements

#### Step 1: Filter for NULL avg_val

`WHERE avg_val IS NULL` isolates records with missing average values. `NULL = NULL` evaluates to NULL (falsy), not TRUE.

#### Step 2: Return all columns

Select all columns so the ML team can inspect the full record for debugging.

---

### The solution

**IS NULL filter for missing values**

```sql
SELECT *
FROM ml_features
WHERE avg_val IS NULL
ORDER BY feat_name
```

> **Cost Analysis**
>
> A partial index for NULLs (`CREATE INDEX ... WHERE avg_val IS NULL`) makes this extremely fast. Otherwise a full scan with a simple predicate.

> **Interviewers Watch For**
>
> The interviewer is specifically checking that you use `IS NULL`, not `= NULL`. This is a classic gotcha.

> **Common Pitfall**
>
> `WHERE avg_val = NULL` always returns zero rows because any comparison with NULL yields NULL, not TRUE. Always use `IS NULL`.

---

## Common follow-up questions

- What is the difference between IS NULL and = NULL? _(Tests three-valued logic (TRUE, FALSE, NULL).)_
- How would you also find features where null_pct exceeds 50%? _(Tests combining IS NULL with a numeric filter using OR.)_
- How would you replace NULL avg_val with a default in output? _(Tests COALESCE usage.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/features_with_missing_values)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.