NULL Values: Advanced

At Snowflake, data engineering teams run analytical pipelines that join dozens of tables across petabytes of data, and a NULL that enters a window function partition or propagates through a UNION can silently corrupt downstream metrics that drive business decisions for thousands of enterprise customers. A misplaced NULLS FIRST clause in a ranking query, or a NULL that slips through three-valued logic in a CASE expression, can make revenue figures appear to zero out or cause cohort rankings to shuffle in ways that are nearly impossible to trace. These are the bugs that make it into postmortems. This lesson builds the mental model for exactly how NULL flows through advanced SQL operations so you can reason about it before it reaches production.

NULLs in window functions

Daily Life

Interviews

Handle gaps in ordered calculations

Window functions have specific behaviors with NULL values that affect sorting, partitioning, and calculations. Understanding these behaviors is critical for accurate analytics.

Sorting and Ordering

Window functions frequently use ORDER BY and PARTITION BY clauses where NULL values create specific behaviors you need to understand.

NULLS FIRST/LAST

When using ORDER BY within window functions, NULL values have a default sort position that varies by database. You can control placement explicitly with NULLS FIRST or NULLS LAST:

	SELECT
	employee_id,
	salary,
	ROW_NUMBER() OVER (
	ORDER BY salary DESC NULLS LAST
	) AS salary_rank
	FROM employee_metrics

Result

employee_id	salary	salary_rank
E-103	95000	1
E-101	75000	2
E-104	65000	3
E-102	NULL	4
E-105	NULL	5

With NULLS LAST, employees with unknown salaries appear at the end of the ranking. Without this clause, their position depends on database defaults.

NULL in PARTITION BY

When partitioning, all NULL values in the partition column are grouped together into a single partition:

	SELECT
	employee_id,
	department,
	salary,
	RANK() OVER (
	PARTITION BY department
	ORDER BY salary DESC
	) AS dept_rank
	FROM employee_metrics

Result

employee_id	department	salary	dept_rank
E-101	Engineering	95000	1
E-102	Engineering	85000	2
E-103	Sales	75000	1
E-104	NULL	70000	1
E-105	NULL	65000	2

E-104 and E-105 (both with NULL department) are ranked together in their own partition. This might be useful for ranking unassigned employees, but it's often unexpected.

Calculations and Navigation

Window functions that perform calculations or navigate between rows handle NULLs consistently with their non-window counterparts, but the results can still surprise you.

Window Aggregates + NULL

Window aggregate functions (SUM, AVG, COUNT) ignore NULL values in calculations, just like regular aggregates.

	SELECT
	month,
	revenue,
	AVG(revenue) OVER (
	ORDER BY month
	ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
	) AS moving_avg
	FROM monthly_sales

Result

month	revenue	moving_avg
2024-01	100	100.00
2024-02	NULL	100.00
2024-03	150	125.00
2024-04	200	175.00

The moving average for 2024-03 is (100 + 150) / 2 = 125, not (100 + 0 + 150) / 3 = 83.33. The NULL in February is excluded from both the sum and the count.

LAG/LEAD with NULL

LAG and LEAD can return NULL in two situations: when reaching the edge of the partition, or when the referenced value is actually NULL:

	SELECT
	month,
	revenue,
	LAG(revenue, 1, 0) OVER (
	ORDER BY month
	) AS prev_revenue,
	revenue - LAG(revenue, 1, 0) OVER (
	ORDER BY month
	) AS change
	FROM monthly_sales

Result

month	revenue	prev_revenue	change
2024-01	100	0	100
2024-02	NULL	100	NULL
2024-03	150	NULL	NULL

TIP

Use COALESCE around LAG to handle both edge cases and actual NULL values: COALESCE(LAG(revenue, 1) OVER (...), 0)

Compare what happens when LAG hits a NULL value versus wrapping it with COALESCE:

Without COALESCE, LAG returns NULL both at the partition edge (row 1) and when the referenced value is NULL (row 3).

	SELECT
	month,
	revenue,
	LAG(revenue, 1) OVER (
	ORDER BY month
	) AS prev_revenue
	FROM monthly_sales

> Complete the ranking query that puts NULL salaries at the end.

SELECT
  metric_id,
  RANK() OVER (
    ORDER BY ___ ___ ___
  ) AS rank
FROM employee_metrics

NULLS LAST

DESC

metric_value

NULLS FIRST

ASC

Specifying NULLS FIRST or NULLS LAST explicitly is one of those small habits that prevents an entire class of subtle ranking bugs that only surface when production data contains incomplete records.

In window function partitioning, the fact that all NULLs land in one shared partition is usually unintended behavior, so checking whether your partition column contains NULLs before deploying a windowed query is a worthwhile precaution.

Wrapping LAG and LEAD with COALESCE is a defensive practice that handles both the partition-edge case and the case where the referenced value is genuinely missing, preventing NULL from propagating into downstream calculations.

NULLs in UNION operations

Daily Life

Interviews

Merge result sets with missing values

UNION and UNION ALL treat NULL values differently when deduplicating rows. Understanding this behavior is crucial for combining datasets correctly.

UNION Behavior

UNION operations use a special comparison for deduplication that differs from standard equality comparisons.

UNION Deduplication

UNION (without ALL) removes duplicate rows. For deduplication purposes, NULL values in the same column position are considered equal:

	SELECT
	name,
	department
	FROM orders

	UNION

	SELECT
	name,
	department
	FROM cloud_costs

Result

name	department
Alice	Engineering
Bob	Sales
Charlie	NULL

If both tables have a row ('Charlie', NULL), UNION returns only one copy. The deduplication treats these NULLs as matching, even though NULL = NULL is UNKNOWN in comparisons.

TIP

This is different from JOIN behavior! In JOINs, NULL never equals NULL. In UNION deduplication, NULLs in the same position are treated as equal.

UNION ALL Duplicates

UNION ALL preserves all rows including duplicates, which includes rows with NULL values.

	SELECT
	name,
	department
	FROM orders

	UNION ALL

	SELECT
	name,
	department
	FROM cloud_costs

Result

name	department
Alice	Engineering
Charlie	NULL
Bob	Sales
Charlie	NULL

UNION ALL keeps both copies of ('Charlie', NULL). Use it when you want to preserve all rows including duplicates with NULL values.

Which version keeps duplicate NULL rows? Build the query:

UNION deduplicates rows, treating NULLs as equal. Only one NULL row appears.

	SELECT
	region
	FROM orders

	UNION

	SELECT
	region
	FROM cloud_costs

> Complete the query to combine both tables while keeping all rows, including duplicate NULLs.

SELECT
  region
FROM orders

___ ___

SELECT
  region
FROM cloud_costs

UNION

ALL

DISTINCT

EXCEPT

The UNION deduplication behavior for NULLs is one of the few places in SQL where NULLs are treated as equal to each other, which is the opposite of how they behave in WHERE and JOIN conditions.

Choosing between UNION and UNION ALL is a correctness decision, not just a performance one: if you use UNION when you actually need UNION ALL, you will silently lose rows that happen to be identical including their NULL values.

In most data pipeline contexts, UNION ALL is the safer default because it preserves all rows without the overhead of deduplication, and you can always add a subsequent DISTINCT or GROUP BY if deduplication is truly needed.

DISTINCT and NULL

Daily Life

Interviews

Deduplicate when NULLs are present

SELECT DISTINCT treats all NULL values as equal, returning only one NULL in the result. This differs from WHERE and JOIN behavior.

DISTINCT Grouping Behavior

Unlike comparisons in WHERE clauses, DISTINCT considers all NULL values to be the same.

NULL Collapsing

DISTINCT and GROUP BY treat all NULLs as equal, collapsing them into a single group.

	SELECT DISTINCT
	department
	FROM employee_metrics

Result

department
Engineering
Sales
Marketing
NULL

Even if 10 employees have NULL department, DISTINCT returns only one NULL row.

Context Comparison

Different SQL contexts treat NULL equality differently. This table summarizes the key differences.

NULL Behavior by Context

SQL has inconsistent rules for NULL equality depending on the operation being performed.

NULL Grouping Behavior by Context

WHERE: NULL = NULL is UNKNOWN, so NULLs don't match
JOIN condition: NULL = NULL is UNKNOWN, NULLs don't match
DISTINCT: NULLs treated as equal, one NULL returned
GROUP BY: NULLs treated as equal, grouped together
UNION dedup: NULLs treated as equal, deduplicated

Does COUNT(DISTINCT col) include NULL as a distinct value? Compare the approaches:

COUNT(DISTINCT) excludes NULL from the count. Three non-NULL regions are counted.

	SELECT
	COUNT(DISTINCT region) AS region_count
	FROM orders

> Complete the query to return unique regions with NULLs collapsed to one.

SELECT ___
  ___
FROM orders

DISTINCT

region

UNIQUE

ALL

Understanding that DISTINCT collapses all NULLs into one row is essential for writing accurate distinct-count queries, because COUNT(DISTINCT col) will exclude NULLs entirely even though SELECT DISTINCT would include one NULL row.

The inconsistency between WHERE behavior (NULL never equals NULL) and DISTINCT/GROUP BY behavior (NULLs are equal) is one of the most confusing aspects of SQL, but it reflects a deliberate design decision to make grouping operations predictable.

When writing queries that combine DISTINCT with nullable columns, always verify which context you are in and whether the NULL handling matches the business requirement you are trying to satisfy.

NULL propagation

Daily Life

Interviews

Trace how one NULL spreads through expressions

NULL is "contagious" in SQL: most operations involving NULL produce NULL. Understanding propagation rules helps you predict query results and avoid subtle bugs.

Basic Propagation Rules

Arithmetic and string operations follow consistent rules: if any operand is NULL, the result is NULL.

Arithmetic Propagation

Any arithmetic operation with NULL yields NULL, regardless of the other operand:

	SELECT
	product_id,
	price,
	discount,
	price * (
	1 - discount
	) AS final_price
	FROM products

Result

product_id	price	discount	final_price
P-001	100	0.10	90.00
P-002	200	NULL	NULL
P-003	NULL	0.15	NULL

Even though P-002 has a valid price, the NULL discount causes the entire calculation to return NULL.

String Concatenation

String concatenation with NULL also produces NULL in standard SQL:

	SELECT
	first_name,
	middle_name,
	last_name,
	first_name \|\| ' ' \|\| middle_name \|\| ' ' \|\| last_name AS full_name
	FROM customers

Result

first_name	middle_name	last_name	full_name
John	Michael	Smith	John Michael Smith
Jane	NULL	Doe	NULL

Jane's full name becomes NULL because her middle name is NULL. Use COALESCE to provide a default value (like an empty string) for any NULL components, preventing the entire concatenation from collapsing.

How can you prevent NULL from breaking an arithmetic expression? Toggle to compare:

Without COALESCE, any NULL operand causes the entire expression to return NULL.

	SELECT
	product_id,
	price * (
	1 - discount
	) AS final_price
	FROM products

Control Flow Propagation

CASE expressions and functions have more complex NULL handling that depends on which parts of the expression contain NULLs.

CASE Propagation

CASE statements have nuanced NULL behavior. The WHEN condition uses standard comparison (where NULL = anything is UNKNOWN), but THEN/ELSE can return NULL:

	SELECT
	status,
	CASE
	WHEN status = 'active' THEN 'A'
	WHEN status = 'pending' THEN 'P'
	WHEN status IS NULL THEN 'N'
	ELSE 'X'
	END AS code
	FROM accounts

Result

status	code
active	A
pending	P
NULL	N

We use the searched CASE form with IS NULL to detect NULL. The simple form (CASE status WHEN NULL) would never match because status = NULL evaluates to UNKNOWN.

Function Propagation

Most scalar functions propagate NULL, but aggregate functions ignore it:

Function Propagation Rules

UPPER(NULL) returns NULL. Scalar functions propagate NULL
LENGTH(NULL) returns NULL. Any NULL input returns NULL
ROUND(NULL, 2) returns NULL. Even with a valid second argument
SUM(col): Ignores NULLs, returns sum of non-NULL values
COUNT(*): Counts all rows regardless of NULLs
COUNT(col): Ignores NULLs, counts only non-NULL values

> Complete this query to safely calculate a total even when discount may be NULL.

SELECT
  order_id,
  price - ___(discount, ___) AS final_price
FROM orders

COALESCE

NULLIF

NULL

Any arithmetic operation with NULL produces NULL. This propagation is the most common source of unexpected NULLs in query results.

Scalar functions like UPPER, LENGTH, and ROUND also propagate NULL. Only aggregate functions like SUM and COUNT handle NULLs by ignoring them.

When debugging unexpected NULLs, check each column in your expression for potential NULL values and add COALESCE where needed.

Three-valued logic (NOT IN)

Daily Life

Interviews

Avoid the NOT IN trap with NULL values

SQL uses three-valued logic (TRUE, FALSE, UNKNOWN) instead of classical boolean logic. This fundamentally changes how compound conditions behave and breaks some intuitive rules.

Three-Valued Logic

Boolean operations in SQL don't just have TRUE and FALSE outcomes; UNKNOWN is a third possibility that propagates through logical operators.

Truth Tables with UNKNOWN

When NULL is involved in comparisons, the result is UNKNOWN. Here's how AND and OR behave:

Three-Valued Logic Rules

TRUE AND UNKNOWN → UNKNOWN (could be TRUE or FALSE)
FALSE AND UNKNOWN → FALSE (FALSE AND anything is FALSE)
TRUE OR UNKNOWN → TRUE (TRUE OR anything is TRUE)
FALSE OR UNKNOWN → UNKNOWN (could be TRUE or FALSE)
UNKNOWN AND UNKNOWN → UNKNOWN
UNKNOWN OR UNKNOWN → UNKNOWN

Key insight: TRUE AND UNKNOWN = UNKNOWN (not TRUE), but FALSE AND UNKNOWN = FALSE. Similarly, TRUE OR UNKNOWN = TRUE, but FALSE OR UNKNOWN = UNKNOWN.

Common Pitfalls

Three-valued logic creates counterintuitive behaviors that cause subtle bugs even for experienced developers.

The NOT IN Trap

One of the most dangerous NULL pitfalls is using NOT IN with a subquery that might contain NULL:

	SELECT
	name
	FROM employee_metrics
	WHERE department_id NOT IN (
	SELECT
	department_id
	FROM cost_allocs
	)

Result

name
(no rows)

If cost_allocs contains even ONE NULL department_id, this query returns NO ROWS. NOT IN expands to: dept_id != 1 AND dept_id != 2 AND dept_id != NULL.

Since anything != NULL is UNKNOWN, the entire AND chain becomes UNKNOWN, and no rows pass the filter.

TIP

Use NOT EXISTS with a correlated subquery instead of NOT IN when NULLs might be present.

De Morgan's Laws Break Down

In classical logic, NOT (A AND B) = (NOT A) OR (NOT B). But with three-valued logic, these queries are NOT equivalent when columns can be NULL:

Query 1: NOT applied to the whole AND expression. When status is NULL, the AND yields UNKNOWN, NOT(UNKNOWN) is also UNKNOWN, but the row still appears because NULL rows are not explicitly excluded.

	SELECT
	*
	FROM orders
	WHERE NOT (
	status = 'active'
	AND region = 'US'
	)

If status is NULL, Query 1 might include a row while Query 2 excludes it. The != comparison with NULL yields UNKNOWN, changing how rows are filtered.

The Excluded Middle

In classical logic, A OR NOT A is always TRUE. In SQL, this fails when A involves NULL:

	SELECT
	employee_id,
	status,
	CASE
	WHEN status = 'active'
	OR status != 'active' THEN 'One must be true'
	ELSE 'Neither is true'
	END AS logic_check
	FROM employee_metrics

Result

employee_id	status	logic_check
E-001	active	One must be true
E-002	inactive	One must be true
E-003	NULL	Neither is true

For E-003: status = 'active' is UNKNOWN, and status != 'active' is also UNKNOWN. UNKNOWN OR UNKNOWN = UNKNOWN, so the ELSE branch executes.

✓Do

Always specify NULLS FIRST or NULLS LAST in window ORDER BY
Use COALESCE with LAG/LEAD for both edge cases and NULLs
Use NOT EXISTS instead of NOT IN when subqueries might have NULLs
Explicitly handle NULL in CASE with IS NULL checks

✗Don't

Don't use NOT IN with subqueries that might return NULL
Don't write CASE column WHEN NULL (it never matches)
Don't assume De Morgan's laws work with nullable columns
Don't forget that window aggregates ignore NULLs

The NULL DilemmaStep 1

The claims pipeline feeds three downstream systems: a financial ledger that sums settlement amounts, a regulatory report that counts claims by diagnosis, and a workload dashboard that assigns claims to adjusters. Each system breaks differently when NULLs propagate through.

claims

claim_id	diagnosis_code	adjuster_id	settlement_amt	status
CLM-001	J18.9	ADJ-12	4500.00	settled
CLM-002	NULL	ADJ-12	1200.00	pending
CLM-003	K21.0	NULL	NULL	open

diagnosis_codes

code	description	category
J18.9	Pneumonia	respiratory
K21.0	GERD	digestive

Jul 2026

NULL Handling Strategy

The financial ledger uses SUM(settlement_amt) to calculate total payouts. With NULLs in the column, SUM silently skips those rows. The reported total is $2.1M lower than the actual liability. The CFO wants accurate numbers. How do you handle NULLs in the settlement pipeline?

SQL

SELECT SUM(settlement_amt) AS total
FROM claims
WHERE status = 'settled'
-- Returns: $18.4M
-- Actual liability: $20.5M
-- Missing: $2.1M in NULL rows

The NOT IN trap is one of the most dangerous pitfalls with NULL because it silently returns zero rows rather than throwing an error.

Three-valued logic means you can never substitute != NULL for IS NOT NULL: the comparison always returns UNKNOWN, excluding the row from any WHERE filter.

Developing a habit of reviewing whether subqueries in NOT IN can return NULL before using them in production is one of the most valuable defensive practices.

> Complete the filter to check for non-NULL status before filtering on profit.

SELECT
  *
FROM orders
WHERE ___ ___
___ profit > 0

IS NOT NULL

AND

status

!= NULL

The NOT IN trap is one of the most dangerous NULL pitfalls in SQL because it silently returns zero rows rather than throwing an error, making it extremely difficult to debug without specifically knowing to look for NULLs in the subquery.

Three-valued logic is the reason why you can never substitute != NULL for IS NOT NULL: the comparison always returns UNKNOWN, which means the row will be excluded from any WHERE filter regardless of the actual value.

Developing a habit of reviewing whether subqueries in NOT IN or comparison conditions can return NULL before using them in production is one of the most valuable defensive practices for writing reliable SQL.

❯❯❯PUTTING IT ALL TOGETHER

> You are a data engineer at Experian building a customer-360 view by merging incomplete records from three source systems, each using different NULL conventions including NULLS FIRST ordering, DISTINCT deduplication, and NOT IN subquery filters.

NULLS FIRST and NULLS LAST control where missing identifiers land in window function rankings across sources.

UNION deduplicates NULL records so the same unknown customer does not appear multiple times in the merged view.

DISTINCT and NULL grouping means all unknown customer IDs collapse into one representative row per source.

Three-valued logic explains why NOT IN subqueries silently return zero rows when the exclusion list contains NULL.

KEY TAKEAWAYS

Use NULLS FIRST or NULLS LAST explicitly for consistent sort behavior across databases

In PARTITION BY, all NULLs group into a single partition (they're treated as equal)

DISTINCT treats all NULLs as equal; only one NULL appears in results

UNION eliminates duplicate NULLs; UNION ALL preserves them

Three-valued logic: NULL comparisons return UNKNOWN, not FALSE

NOT IN with NULLs can return zero rows: WHERE x NOT IN (1, NULL) filters out everything

NULL propagates: any arithmetic or string operation with NULL returns NULL

CASE WHEN col = NULL never matches; use CASE WHEN col IS NULL instead

NULL Values: Advanced

NULLs in window functions

Sorting and Ordering

NULLS FIRST/LAST

NULL in PARTITION BY

Calculations and Navigation

Window Aggregates + NULL

LAG/LEAD with NULL

NULLs in UNION operations

UNION Behavior

UNION Deduplication

UNION ALL Duplicates

DISTINCT and NULL

DISTINCT Grouping Behavior

NULL Collapsing

Context Comparison

NULL Behavior by Context

NULL propagation

Basic Propagation Rules

Arithmetic Propagation

String Concatenation

Control Flow Propagation

CASE Propagation

Function Propagation

Three-valued logic (NOT IN)

Three-Valued Logic

Truth Tables with UNKNOWN

Common Pitfalls

The NOT IN Trap

De Morgan's Laws Break Down

The Excluded Middle

Lesson Sections