NULL Values: Advanced
At Snowflake, data engineering teams run analytical pipelines that join dozens of tables across petabytes of data, and a NULL that enters a window function partition or propagates through a UNION can silently corrupt downstream metrics that drive business decisions for thousands of enterprise customers. A misplaced NULLS FIRST clause in a ranking query, or a NULL that slips through three-valued logic in a CASE expression, can make revenue figures appear to zero out or cause cohort rankings to shuffle in ways that are nearly impossible to trace. These are the bugs that make it into postmortems. This lesson builds the mental model for exactly how NULL flows through advanced SQL operations so you can reason about it before it reaches production.
NULLs in window functions
Handle gaps in ordered calculations
Window functions have specific behaviors with NULL values that affect sorting, partitioning, and calculations. Understanding these behaviors is critical for accurate analytics.
Sorting and Ordering
Window functions frequently use ORDER BY and PARTITION BY clauses where NULL values create specific behaviors you need to understand.
NULLS FIRST/LAST
When using ORDER BY within window functions, NULL values have a default sort position that varies by database. You can control placement explicitly with NULLS FIRST or NULLS LAST:
With NULLS LAST, employees with unknown salaries appear at the end of the ranking. Without this clause, their position depends on database defaults.
NULL in PARTITION BY
When partitioning, all NULL values in the partition column are grouped together into a single partition:
E-104 and E-105 (both with NULL department) are ranked together in their own partition. This might be useful for ranking unassigned employees, but it's often unexpected.
Calculations and Navigation
Window functions that perform calculations or navigate between rows handle NULLs consistently with their non-window counterparts, but the results can still surprise you.
Window Aggregates + NULL
Window aggregate functions (SUM, AVG, COUNT) ignore NULL values in calculations, just like regular aggregates.
The moving average for 2024-03 is (100 + 150) / 2 = 125, not (100 + 0 + 150) / 3 = 83.33. The NULL in February is excluded from both the sum and the count.
LAG/LEAD with NULL
LAG and LEAD can return NULL in two situations: when reaching the edge of the partition, or when the referenced value is actually NULL:
COALESCE around LAG to handle both edge cases and actual NULL values: COALESCE(LAG(revenue, 1) OVER (...), 0)Compare what happens when LAG hits a NULL value versus wrapping it with COALESCE:
> Complete the ranking query that puts NULL salaries at the end.
SELECT metric_id, RANK() OVER ( ORDER BY ) AS rank FROM employee_metrics
Specifying NULLS FIRST or NULLS LAST explicitly is one of those small habits that prevents an entire class of subtle ranking bugs that only surface when production data contains incomplete records.
In window function partitioning, the fact that all NULLs land in one shared partition is usually unintended behavior, so checking whether your partition column contains NULLs before deploying a windowed query is a worthwhile precaution.
Wrapping LAG and LEAD with COALESCE is a defensive practice that handles both the partition-edge case and the case where the referenced value is genuinely missing, preventing NULL from propagating into downstream calculations.
NULLs in UNION operations
Merge result sets with missing values
UNION and UNION ALL treat NULL values differently when deduplicating rows. Understanding this behavior is crucial for combining datasets correctly.
UNION Behavior
UNION operations use a special comparison for deduplication that differs from standard equality comparisons.
UNION Deduplication
UNION (without ALL) removes duplicate rows. For deduplication purposes, NULL values in the same column position are considered equal:
If both tables have a row ('Charlie', NULL), UNION returns only one copy. The deduplication treats these NULLs as matching, even though NULL = NULL is UNKNOWN in comparisons.
JOIN behavior! In JOINs, NULL never equals NULL. In UNION deduplication, NULLs in the same position are treated as equal.UNION ALL Duplicates
UNION ALL preserves all rows including duplicates, which includes rows with NULL values.
UNION ALL keeps both copies of ('Charlie', NULL). Use it when you want to preserve all rows including duplicates with NULL values.
Which version keeps duplicate NULL rows? Build the query:
> Complete the query to combine both tables while keeping all rows, including duplicate NULLs.
SELECT region FROM orders SELECT region FROM cloud_costs
The UNION deduplication behavior for NULLs is one of the few places in SQL where NULLs are treated as equal to each other, which is the opposite of how they behave in WHERE and JOIN conditions.
Choosing between UNION and UNION ALL is a correctness decision, not just a performance one: if you use UNION when you actually need UNION ALL, you will silently lose rows that happen to be identical including their NULL values.
In most data pipeline contexts, UNION ALL is the safer default because it preserves all rows without the overhead of deduplication, and you can always add a subsequent DISTINCT or GROUP BY if deduplication is truly needed.
DISTINCT and NULL
Deduplicate when NULLs are present
SELECT DISTINCT treats all NULL values as equal, returning only one NULL in the result. This differs from WHERE and JOIN behavior.
DISTINCT Grouping Behavior
Unlike comparisons in WHERE clauses, DISTINCT considers all NULL values to be the same.
NULL Collapsing
DISTINCT and GROUP BY treat all NULLs as equal, collapsing them into a single group.
Even if 10 employees have NULL department, DISTINCT returns only one NULL row.
Context Comparison
Different SQL contexts treat NULL equality differently. This table summarizes the key differences.
NULL Behavior by Context
SQL has inconsistent rules for NULL equality depending on the operation being performed.
WHERE: NULL = NULL is UNKNOWN, so NULLs don't matchJOINcondition: NULL = NULL is UNKNOWN, NULLs don't matchDISTINCT: NULLs treated as equal, one NULL returnedGROUP BY: NULLs treated as equal, grouped togetherUNIONdedup: NULLs treated as equal, deduplicated
Does COUNT(DISTINCT col) include NULL as a distinct value? Compare the approaches:
> Complete the query to return unique regions with NULLs collapsed to one.
SELECT FROM orders
Understanding that DISTINCT collapses all NULLs into one row is essential for writing accurate distinct-count queries, because COUNT(DISTINCT col) will exclude NULLs entirely even though SELECT DISTINCT would include one NULL row.
The inconsistency between WHERE behavior (NULL never equals NULL) and DISTINCT/GROUP BY behavior (NULLs are equal) is one of the most confusing aspects of SQL, but it reflects a deliberate design decision to make grouping operations predictable.
When writing queries that combine DISTINCT with nullable columns, always verify which context you are in and whether the NULL handling matches the business requirement you are trying to satisfy.
NULL propagation
Trace how one NULL spreads through expressions
NULL is "contagious" in SQL: most operations involving NULL produce NULL. Understanding propagation rules helps you predict query results and avoid subtle bugs.
Basic Propagation Rules
Arithmetic and string operations follow consistent rules: if any operand is NULL, the result is NULL.
Arithmetic Propagation
Any arithmetic operation with NULL yields NULL, regardless of the other operand:
Even though P-002 has a valid price, the NULL discount causes the entire calculation to return NULL.
String Concatenation
String concatenation with NULL also produces NULL in standard SQL:
Jane's full name becomes NULL because her middle name is NULL. Use COALESCE to provide a default value (like an empty string) for any NULL components, preventing the entire concatenation from collapsing.
How can you prevent NULL from breaking an arithmetic expression? Toggle to compare:
Control Flow Propagation
CASE expressions and functions have more complex NULL handling that depends on which parts of the expression contain NULLs.
CASE Propagation
CASE statements have nuanced NULL behavior. The WHEN condition uses standard comparison (where NULL = anything is UNKNOWN), but THEN/ELSE can return NULL:
We use the searched CASE form with IS NULL to detect NULL. The simple form (CASE status WHEN NULL) would never match because status = NULL evaluates to UNKNOWN.
Function Propagation
Most scalar functions propagate NULL, but aggregate functions ignore it:
- UPPER(NULL) returns NULL. Scalar functions propagate NULL
- LENGTH(NULL) returns NULL. Any NULL input returns NULL
- ROUND(NULL, 2) returns NULL. Even with a valid second argument
- SUM(col): Ignores NULLs, returns sum of non-NULL values
- COUNT(*): Counts all rows regardless of NULLs
- COUNT(col): Ignores NULLs, counts only non-NULL values
> Complete this query to safely calculate a total even when discount may be NULL.
SELECT order_id, price - (discount, ) AS final_price FROM orders
Any arithmetic operation with NULL produces NULL. This propagation is the most common source of unexpected NULLs in query results.
Scalar functions like UPPER, LENGTH, and ROUND also propagate NULL. Only aggregate functions like SUM and COUNT handle NULLs by ignoring them.
When debugging unexpected NULLs, check each column in your expression for potential NULL values and add COALESCE where needed.
Three-valued logic (NOT IN)
Avoid the NOT IN trap with NULL values
Three-Valued Logic
Truth Tables with UNKNOWN
When NULL is involved in comparisons, the result is UNKNOWN. Here's how AND and OR behave:
- TRUE AND UNKNOWN → UNKNOWN (could be TRUE or FALSE)
- FALSE AND UNKNOWN → FALSE (FALSE AND anything is FALSE)
- TRUE OR UNKNOWN → TRUE (TRUE OR anything is TRUE)
- FALSE OR UNKNOWN → UNKNOWN (could be TRUE or FALSE)
- UNKNOWN AND UNKNOWN → UNKNOWN
- UNKNOWN OR UNKNOWN → UNKNOWN
Key insight: TRUE AND UNKNOWN = UNKNOWN (not TRUE), but FALSE AND UNKNOWN = FALSE. Similarly, TRUE OR UNKNOWN = TRUE, but FALSE OR UNKNOWN = UNKNOWN.
Common Pitfalls
The NOT IN Trap
One of the most dangerous NULL pitfalls is using NOT IN with a subquery that might contain NULL:
If cost_allocs contains even ONE NULL department_id, this query returns NO ROWS. NOT IN expands to: dept_id != 1 AND dept_id != 2 AND dept_id != NULL.
Since anything != NULL is UNKNOWN, the entire AND chain becomes UNKNOWN, and no rows pass the filter.
NOT EXISTS with a correlated subquery instead of NOT IN when NULLs might be present.De Morgan's Laws Break Down
In classical logic, NOT (A AND B) = (NOT A) OR (NOT B). But with three-valued logic, these queries are NOT equivalent when columns can be NULL:
If status is NULL, Query 1 might include a row while Query 2 excludes it. The != comparison with NULL yields UNKNOWN, changing how rows are filtered.
The Excluded Middle
In classical logic, A OR NOT A is always TRUE. In SQL, this fails when A involves NULL:
For E-003: status = 'active' is UNKNOWN, and status != 'active' is also UNKNOWN. UNKNOWN OR UNKNOWN = UNKNOWN, so the ELSE branch executes.
- Always specify NULLS FIRST or NULLS LAST in window ORDER BY
- Use COALESCE with LAG/LEAD for both edge cases and NULLs
- Use NOT EXISTS instead of NOT IN when subqueries might have NULLs
- Explicitly handle NULL in CASE with IS NULL checks
- Don't use NOT IN with subqueries that might return NULL
- Don't write CASE column WHEN NULL (it never matches)
- Don't assume De Morgan's laws work with nullable columns
- Don't forget that window aggregates ignore NULLs
The claims pipeline feeds three downstream systems: a financial ledger that sums settlement amounts, a regulatory report that counts claims by diagnosis, and a workload dashboard that assigns claims to adjusters. Each system breaks differently when NULLs propagate through.
| claim_id | diagnosis_code | adjuster_id | settlement_amt | status |
|---|---|---|---|---|
| CLM-001 | J18.9 | ADJ-12 | 4500.00 | settled |
| CLM-002 | NULL | ADJ-12 | 1200.00 | pending |
| CLM-003 | K21.0 | NULL | NULL | open |
| code | description | category |
|---|---|---|
| J18.9 | Pneumonia | respiratory |
| K21.0 | GERD | digestive |
The financial ledger uses SUM(settlement_amt) to calculate total payouts. With NULLs in the column, SUM silently skips those rows. The reported total is $2.1M lower than the actual liability. The CFO wants accurate numbers. How do you handle NULLs in the settlement pipeline?
SELECT SUM(settlement_amt) AS total FROM claims WHERE status = 'settled' -- Returns: $18.4M -- Actual liability: $20.5M -- Missing: $2.1M in NULL rows
The NOT IN trap is one of the most dangerous pitfalls with NULL because it silently returns zero rows rather than throwing an error.
Three-valued logic means you can never substitute != NULL for IS NOT NULL: the comparison always returns UNKNOWN, excluding the row from any WHERE filter.
Developing a habit of reviewing whether subqueries in NOT IN can return NULL before using them in production is one of the most valuable defensive practices.
> Complete the filter to check for non-NULL status before filtering on profit.
SELECT * FROM orders WHERE profit > 0
The NOT IN trap is one of the most dangerous NULL pitfalls in SQL because it silently returns zero rows rather than throwing an error, making it extremely difficult to debug without specifically knowing to look for NULLs in the subquery.
Three-valued logic is the reason why you can never substitute != NULL for IS NOT NULL: the comparison always returns UNKNOWN, which means the row will be excluded from any WHERE filter regardless of the actual value.
Developing a habit of reviewing whether subqueries in NOT IN or comparison conditions can return NULL before using them in production is one of the most valuable defensive practices for writing reliable SQL.
> You are a data engineer at Experian building a customer-360 view by merging incomplete records from three source systems, each using different NULL conventions including NULLS FIRST ordering, DISTINCT deduplication, and NOT IN subquery filters.
NULLS FIRST and NULLS LAST control where missing identifiers land in window function rankings across sources.UNION deduplicates NULL records so the same unknown customer does not appear multiple times in the merged view.DISTINCT and NULL grouping means all unknown customer IDs collapse into one representative row per source.NOT IN subqueries silently return zero rows when the exclusion list contains NULL.NULLS FIRST or NULLS LAST explicitly for consistent sort behavior across databasesPARTITION BY, all NULLs group into a single partition (they're treated as equal)DISTINCT treats all NULLs as equal; only one NULL appears in resultsUNION eliminates duplicate NULLs; UNION ALL preserves themNULL comparisons return UNKNOWN, not FALSENOT IN with NULLs can return zero rows: WHERE x NOT IN (1, NULL) filters out everythingNULL propagates: any arithmetic or string operation with NULL returns NULLCASE WHEN col = NULL never matches; use CASE WHEN col IS NULL insteadTRUE, FALSE, and NULL walk into a bar
- Category
- SQL
- Difficulty
- advanced
- Duration
- 26 minutes
- Challenges
- 0 hands-on challenges
Topics covered: NULLs in window functions, NULLs in UNION operations, DISTINCT and NULL, NULL propagation, Three-valued logic (NOT IN)
Lesson Sections
- NULLs in window functions
Sorting and Ordering NULLS FIRST/LAST NULL in PARTITION BY Calculations and Navigation Window Aggregates + NULL LAG/LEAD with NULL
- NULLs in UNION operations
UNION Behavior UNION Deduplication UNION ALL Duplicates
- DISTINCT and NULL (concepts: sqlDistinct)
DISTINCT Grouping Behavior NULL Collapsing Context Comparison NULL Behavior by Context
- NULL propagation
Basic Propagation Rules Arithmetic Propagation String Concatenation Control Flow Propagation CASE Propagation Function Propagation
- Three-valued logic (NOT IN)
SQL uses three-valued logic (TRUE, FALSE, UNKNOWN) instead of classical boolean logic. This fundamentally changes how compound conditions behave and breaks some intuitive rules. Three-Valued Logic Boolean operations in SQL don't just have TRUE and FALSE outcomes; UNKNOWN is a third possibility that propagates through logical operators. Truth Tables with UNKNOWN Common Pitfalls Three-valued logic creates counterintuitive behaviors that cause subtle bugs even for experienced developers. The NOT IN