NULL Values: Advanced

At Snowflake, data engineering teams run analytical pipelines that join dozens of tables across petabytes of data, and a NULL that enters a window function partition or propagates through a UNION can silently corrupt downstream metrics that drive business decisions for thousands of enterprise customers. A misplaced NULLS FIRST clause in a ranking query, or a NULL that slips through three-valued logic in a CASE expression, can make revenue figures appear to zero out or cause cohort rankings to shuffle in ways that are nearly impossible to trace. These are the bugs that make it into postmortems. This lesson builds the mental model for exactly how NULL flows through advanced SQL operations so you can reason about it before it reaches production.

NULLs in window functions

Daily Life
Interviews

Handle gaps in ordered calculations

Window functions have specific behaviors with NULL values that affect sorting, partitioning, and calculations. Understanding these behaviors is critical for accurate analytics.

Sorting and Ordering

Window functions frequently use ORDER BY and PARTITION BY clauses where NULL values create specific behaviors you need to understand.

//

NULLS FIRST/LAST

When using ORDER BY within window functions, NULL values have a default sort position that varies by database. You can control placement explicitly with NULLS FIRST or NULLS LAST:

1SELECT
2 employee_id,
3 salary,
4 ROW_NUMBER() OVER (
5 ORDER BY salary DESC NULLS LAST
6 ) AS salary_rank
7FROM employee_metrics
Result
employee_idsalarysalary_rank
E-103950001
E-101750002
E-104650003
E-102NULL4
E-105NULL5

With NULLS LAST, employees with unknown salaries appear at the end of the ranking. Without this clause, their position depends on database defaults.

//

NULL in PARTITION BY

When partitioning, all NULL values in the partition column are grouped together into a single partition:

1SELECT
2 employee_id,
3 department,
4 salary,
5 RANK() OVER (
6 PARTITION BY department
7 ORDER BY salary DESC
8 ) AS dept_rank
9FROM employee_metrics
Result
employee_iddepartmentsalarydept_rank
E-101Engineering950001
E-102Engineering850002
E-103Sales750001
E-104NULL700001
E-105NULL650002

E-104 and E-105 (both with NULL department) are ranked together in their own partition. This might be useful for ranking unassigned employees, but it's often unexpected.

Calculations and Navigation

Window functions that perform calculations or navigate between rows handle NULLs consistently with their non-window counterparts, but the results can still surprise you.

//

Window Aggregates + NULL

Window aggregate functions (SUM, AVG, COUNT) ignore NULL values in calculations, just like regular aggregates.

1SELECT
2 month,
3 revenue,
4 AVG(revenue) OVER (
5 ORDER BY month
6 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
7 ) AS moving_avg
8FROM monthly_sales
Result
monthrevenuemoving_avg
2024-01100100.00
2024-02NULL100.00
2024-03150125.00
2024-04200175.00

The moving average for 2024-03 is (100 + 150) / 2 = 125, not (100 + 0 + 150) / 3 = 83.33. The NULL in February is excluded from both the sum and the count.

//

LAG/LEAD with NULL

LAG and LEAD can return NULL in two situations: when reaching the edge of the partition, or when the referenced value is actually NULL:

1SELECT
2 month,
3 revenue,
4 LAG(revenue, 1, 0) OVER (
5 ORDER BY month
6 ) AS prev_revenue,
7 revenue - LAG(revenue, 1, 0) OVER (
8 ORDER BY month
9 ) AS change
10FROM monthly_sales
Result
monthrevenueprev_revenuechange
2024-011000100
2024-02NULL100NULL
2024-03150NULLNULL
TIP
Use COALESCE around LAG to handle both edge cases and actual NULL values: COALESCE(LAG(revenue, 1) OVER (...), 0)

Compare what happens when LAG hits a NULL value versus wrapping it with COALESCE:

Without COALESCE, LAG returns NULL both at the partition edge (row 1) and when the referenced value is NULL (row 3).
1SELECT
2 month,
3 revenue,
4 LAG(revenue, 1) OVER (
5 ORDER BY month
6 ) AS prev_revenue
7FROM monthly_sales

> Complete the ranking query that puts NULL salaries at the end.

SELECT
  metric_id,
  RANK() OVER (
    ORDER BY ___ ___ ___
  ) AS rank
FROM employee_metrics
NULLS FIRST
NULLS LAST
DESC
metric_value
ASC

Specifying NULLS FIRST or NULLS LAST explicitly is one of those small habits that prevents an entire class of subtle ranking bugs that only surface when production data contains incomplete records.

In window function partitioning, the fact that all NULLs land in one shared partition is usually unintended behavior, so checking whether your partition column contains NULLs before deploying a windowed query is a worthwhile precaution.

Wrapping LAG and LEAD with COALESCE is a defensive practice that handles both the partition-edge case and the case where the referenced value is genuinely missing, preventing NULL from propagating into downstream calculations.

NULLs in UNION operations

Daily Life
Interviews

Merge result sets with missing values

UNION and UNION ALL treat NULL values differently when deduplicating rows. Understanding this behavior is crucial for combining datasets correctly.

UNION Behavior

UNION operations use a special comparison for deduplication that differs from standard equality comparisons.

//

UNION Deduplication

UNION (without ALL) removes duplicate rows. For deduplication purposes, NULL values in the same column position are considered equal:

1SELECT
2 name,
3 department
4FROM orders
5
6UNION
7
8SELECT
9 name,
10 department
11FROM cloud_costs
Result
namedepartment
AliceEngineering
BobSales
CharlieNULL

If both tables have a row ('Charlie', NULL), UNION returns only one copy. The deduplication treats these NULLs as matching, even though NULL = NULL is UNKNOWN in comparisons.

TIP
This is different from JOIN behavior! In JOINs, NULL never equals NULL. In UNION deduplication, NULLs in the same position are treated as equal.
//

UNION ALL Duplicates

UNION ALL preserves all rows including duplicates, which includes rows with NULL values.

1SELECT
2 name,
3 department
4FROM orders
5
6UNION ALL
7
8SELECT
9 name,
10 department
11FROM cloud_costs
Result
namedepartment
AliceEngineering
CharlieNULL
BobSales
CharlieNULL

UNION ALL keeps both copies of ('Charlie', NULL). Use it when you want to preserve all rows including duplicates with NULL values.

Which version keeps duplicate NULL rows? Build the query:

UNION deduplicates rows, treating NULLs as equal. Only one NULL row appears.
1SELECT
2 region
3FROM orders
4
5UNION
6
7SELECT
8 region
9FROM cloud_costs

> Complete the query to combine both tables while keeping all rows, including duplicate NULLs.

SELECT
  region
FROM orders

___ ___

SELECT
  region
FROM cloud_costs
ALL
DISTINCT
EXCEPT
UNION

The UNION deduplication behavior for NULLs is one of the few places in SQL where NULLs are treated as equal to each other, which is the opposite of how they behave in WHERE and JOIN conditions.

Choosing between UNION and UNION ALL is a correctness decision, not just a performance one: if you use UNION when you actually need UNION ALL, you will silently lose rows that happen to be identical including their NULL values.

In most data pipeline contexts, UNION ALL is the safer default because it preserves all rows without the overhead of deduplication, and you can always add a subsequent DISTINCT or GROUP BY if deduplication is truly needed.

DISTINCT and NULL

Daily Life
Interviews

Deduplicate when NULLs are present

SELECT DISTINCT treats all NULL values as equal, returning only one NULL in the result. This differs from WHERE and JOIN behavior.

DISTINCT Grouping Behavior

Unlike comparisons in WHERE clauses, DISTINCT considers all NULL values to be the same.

//

NULL Collapsing

DISTINCT and GROUP BY treat all NULLs as equal, collapsing them into a single group.

1SELECT DISTINCT
2 department
3FROM employee_metrics
Result
department
Engineering
Sales
Marketing
NULL

Even if 10 employees have NULL department, DISTINCT returns only one NULL row.

Context Comparison

Different SQL contexts treat NULL equality differently. This table summarizes the key differences.

//

NULL Behavior by Context

SQL has inconsistent rules for NULL equality depending on the operation being performed.

NULL Grouping Behavior by Context
  • WHERE: NULL = NULL is UNKNOWN, so NULLs don't match
  • JOIN condition: NULL = NULL is UNKNOWN, NULLs don't match
  • DISTINCT: NULLs treated as equal, one NULL returned
  • GROUP BY: NULLs treated as equal, grouped together
  • UNION dedup: NULLs treated as equal, deduplicated

Does COUNT(DISTINCT col) include NULL as a distinct value? Compare the approaches:

COUNT(DISTINCT) excludes NULL from the count. Three non-NULL regions are counted.
1SELECT
2 COUNT(DISTINCT region) AS region_count
3FROM orders

> Complete the query to return unique regions with NULLs collapsed to one.

SELECT ___
  ___
FROM orders
DISTINCT
region
UNIQUE
ALL

Understanding that DISTINCT collapses all NULLs into one row is essential for writing accurate distinct-count queries, because COUNT(DISTINCT col) will exclude NULLs entirely even though SELECT DISTINCT would include one NULL row.

The inconsistency between WHERE behavior (NULL never equals NULL) and DISTINCT/GROUP BY behavior (NULLs are equal) is one of the most confusing aspects of SQL, but it reflects a deliberate design decision to make grouping operations predictable.

When writing queries that combine DISTINCT with nullable columns, always verify which context you are in and whether the NULL handling matches the business requirement you are trying to satisfy.

NULL propagation

Daily Life
Interviews

Trace how one NULL spreads through expressions

NULL is "contagious" in SQL: most operations involving NULL produce NULL. Understanding propagation rules helps you predict query results and avoid subtle bugs.

Basic Propagation Rules

Arithmetic and string operations follow consistent rules: if any operand is NULL, the result is NULL.

//

Arithmetic Propagation

Any arithmetic operation with NULL yields NULL, regardless of the other operand:

1SELECT
2 product_id,
3 price,
4 discount,
5 price * (
6 1 - discount
7 ) AS final_price
8FROM products
Result
product_idpricediscountfinal_price
P-0011000.1090.00
P-002200NULLNULL
P-003NULL0.15NULL

Even though P-002 has a valid price, the NULL discount causes the entire calculation to return NULL.

//

String Concatenation

String concatenation with NULL also produces NULL in standard SQL:

1SELECT
2 first_name,
3 middle_name,
4 last_name,
5 first_name || ' ' || middle_name || ' ' || last_name AS full_name
6FROM customers
Result
first_namemiddle_namelast_namefull_name
JohnMichaelSmithJohn Michael Smith
JaneNULLDoeNULL

Jane's full name becomes NULL because her middle name is NULL. Use COALESCE to provide a default value (like an empty string) for any NULL components, preventing the entire concatenation from collapsing.

How can you prevent NULL from breaking an arithmetic expression? Toggle to compare:

Without COALESCE, any NULL operand causes the entire expression to return NULL.
1SELECT
2 product_id,
3 price * (
4 1 - discount
5 ) AS final_price
6FROM products

Control Flow Propagation

CASE expressions and functions have more complex NULL handling that depends on which parts of the expression contain NULLs.

//

CASE Propagation

CASE statements have nuanced NULL behavior. The WHEN condition uses standard comparison (where NULL = anything is UNKNOWN), but THEN/ELSE can return NULL:

1SELECT
2 status,
3 CASE
4 WHEN status = 'active' THEN 'A'
5 WHEN status = 'pending' THEN 'P'
6 WHEN status IS NULL THEN 'N'
7 ELSE 'X'
8 END AS code
9FROM accounts
Result
statuscode
activeA
pendingP
NULLN

We use the searched CASE form with IS NULL to detect NULL. The simple form (CASE status WHEN NULL) would never match because status = NULL evaluates to UNKNOWN.

//

Function Propagation

Most scalar functions propagate NULL, but aggregate functions ignore it:

Function Propagation Rules
  • UPPER(NULL) returns NULL. Scalar functions propagate NULL
  • LENGTH(NULL) returns NULL. Any NULL input returns NULL
  • ROUND(NULL, 2) returns NULL. Even with a valid second argument
  • SUM(col): Ignores NULLs, returns sum of non-NULL values
  • COUNT(*): Counts all rows regardless of NULLs
  • COUNT(col): Ignores NULLs, counts only non-NULL values

> Complete this query to safely calculate a total even when discount may be NULL.

SELECT
  order_id,
  price - ___(discount, ___) AS final_price
FROM orders
0
NULLIF
COALESCE
NULL

Any arithmetic operation with NULL produces NULL. This propagation is the most common source of unexpected NULLs in query results.

Scalar functions like UPPER, LENGTH, and ROUND also propagate NULL. Only aggregate functions like SUM and COUNT handle NULLs by ignoring them.

When debugging unexpected NULLs, check each column in your expression for potential NULL values and add COALESCE where needed.

Three-valued logic (NOT IN)

Daily Life
Interviews

Avoid the NOT IN trap with NULL values

SQL uses three-valued logic (TRUE, FALSE, UNKNOWN) instead of classical boolean logic. This fundamentally changes how compound conditions behave and breaks some intuitive rules.

Three-Valued Logic

Boolean operations in SQL don't just have TRUE and FALSE outcomes; UNKNOWN is a third possibility that propagates through logical operators.
//

Truth Tables with UNKNOWN

When NULL is involved in comparisons, the result is UNKNOWN. Here's how AND and OR behave:

Three-Valued Logic Rules
  • TRUE AND UNKNOWN → UNKNOWN (could be TRUE or FALSE)
  • FALSE AND UNKNOWN → FALSE (FALSE AND anything is FALSE)
  • TRUE OR UNKNOWN → TRUE (TRUE OR anything is TRUE)
  • FALSE OR UNKNOWN → UNKNOWN (could be TRUE or FALSE)
  • UNKNOWN AND UNKNOWN → UNKNOWN
  • UNKNOWN OR UNKNOWN → UNKNOWN

Key insight: TRUE AND UNKNOWN = UNKNOWN (not TRUE), but FALSE AND UNKNOWN = FALSE. Similarly, TRUE OR UNKNOWN = TRUE, but FALSE OR UNKNOWN = UNKNOWN.

Common Pitfalls

Three-valued logic creates counterintuitive behaviors that cause subtle bugs even for experienced developers.
//

The NOT IN Trap

One of the most dangerous NULL pitfalls is using NOT IN with a subquery that might contain NULL:

1SELECT
2 name
3FROM employee_metrics
4WHERE department_id NOT IN (
5 SELECT
6 department_id
7 FROM cost_allocs
8)
Result
name
(no rows)

If cost_allocs contains even ONE NULL department_id, this query returns NO ROWS. NOT IN expands to: dept_id != 1 AND dept_id != 2 AND dept_id != NULL.

Since anything != NULL is UNKNOWN, the entire AND chain becomes UNKNOWN, and no rows pass the filter.

TIP
Use NOT EXISTS with a correlated subquery instead of NOT IN when NULLs might be present.
//

De Morgan's Laws Break Down

In classical logic, NOT (A AND B) = (NOT A) OR (NOT B). But with three-valued logic, these queries are NOT equivalent when columns can be NULL:

Query 1: NOT applied to the whole AND expression. When status is NULL, the AND yields UNKNOWN, NOT(UNKNOWN) is also UNKNOWN, but the row still appears because NULL rows are not explicitly excluded.
1SELECT
2 *
3FROM orders
4WHERE NOT (
5 status = 'active'
6 AND region = 'US'
7)

If status is NULL, Query 1 might include a row while Query 2 excludes it. The != comparison with NULL yields UNKNOWN, changing how rows are filtered.

//

The Excluded Middle

In classical logic, A OR NOT A is always TRUE. In SQL, this fails when A involves NULL:

1SELECT
2 employee_id,
3 status,
4 CASE
5 WHEN status = 'active'
6 OR status != 'active' THEN 'One must be true'
7 ELSE 'Neither is true'
8 END AS logic_check
9FROM employee_metrics
Result
employee_idstatuslogic_check
E-001activeOne must be true
E-002inactiveOne must be true
E-003NULLNeither is true

For E-003: status = 'active' is UNKNOWN, and status != 'active' is also UNKNOWN. UNKNOWN OR UNKNOWN = UNKNOWN, so the ELSE branch executes.

Do
  • Always specify NULLS FIRST or NULLS LAST in window ORDER BY
  • Use COALESCE with LAG/LEAD for both edge cases and NULLs
  • Use NOT EXISTS instead of NOT IN when subqueries might have NULLs
  • Explicitly handle NULL in CASE with IS NULL checks
Don't
  • Don't use NOT IN with subqueries that might return NULL
  • Don't write CASE column WHEN NULL (it never matches)
  • Don't assume De Morgan's laws work with nullable columns
  • Don't forget that window aggregates ignore NULLs
The NULL DilemmaStep 1
>

The claims pipeline feeds three downstream systems: a financial ledger that sums settlement amounts, a regulatory report that counts claims by diagnosis, and a workload dashboard that assigns claims to adjusters. Each system breaks differently when NULLs propagate through.

claims
claim_iddiagnosis_codeadjuster_idsettlement_amtstatus
CLM-001J18.9ADJ-124500.00settled
CLM-002NULLADJ-121200.00pending
CLM-003K21.0NULLNULLopen
diagnosis_codes
codedescriptioncategory
J18.9Pneumoniarespiratory
K21.0GERDdigestive
May 2026
NULL Handling Strategy

The financial ledger uses SUM(settlement_amt) to calculate total payouts. With NULLs in the column, SUM silently skips those rows. The reported total is $2.1M lower than the actual liability. The CFO wants accurate numbers. How do you handle NULLs in the settlement pipeline?

SQL
SELECT SUM(settlement_amt) AS total
FROM claims
WHERE status = 'settled'
-- Returns: $18.4M
-- Actual liability: $20.5M
-- Missing: $2.1M in NULL rows

The NOT IN trap is one of the most dangerous pitfalls with NULL because it silently returns zero rows rather than throwing an error.

Three-valued logic means you can never substitute != NULL for IS NOT NULL: the comparison always returns UNKNOWN, excluding the row from any WHERE filter.

Developing a habit of reviewing whether subqueries in NOT IN can return NULL before using them in production is one of the most valuable defensive practices.

> Complete the filter to check for non-NULL status before filtering on profit.

SELECT
  *
FROM orders
WHERE ___ ___
___ profit > 0
IS NOT NULL
!= NULL
OR
AND
status

The NOT IN trap is one of the most dangerous NULL pitfalls in SQL because it silently returns zero rows rather than throwing an error, making it extremely difficult to debug without specifically knowing to look for NULLs in the subquery.

Three-valued logic is the reason why you can never substitute != NULL for IS NOT NULL: the comparison always returns UNKNOWN, which means the row will be excluded from any WHERE filter regardless of the actual value.

Developing a habit of reviewing whether subqueries in NOT IN or comparison conditions can return NULL before using them in production is one of the most valuable defensive practices for writing reliable SQL.

PUTTING IT ALL TOGETHER

> You are a data engineer at Experian building a customer-360 view by merging incomplete records from three source systems, each using different NULL conventions including NULLS FIRST ordering, DISTINCT deduplication, and NOT IN subquery filters.

NULLS FIRST and NULLS LAST control where missing identifiers land in window function rankings across sources.
UNION deduplicates NULL records so the same unknown customer does not appear multiple times in the merged view.
DISTINCT and NULL grouping means all unknown customer IDs collapse into one representative row per source.
Three-valued logic explains why NOT IN subqueries silently return zero rows when the exclusion list contains NULL.
KEY TAKEAWAYS
Use NULLS FIRST or NULLS LAST explicitly for consistent sort behavior across databases
In PARTITION BY, all NULLs group into a single partition (they're treated as equal)
DISTINCT treats all NULLs as equal; only one NULL appears in results
UNION eliminates duplicate NULLs; UNION ALL preserves them
Three-valued logic: NULL comparisons return UNKNOWN, not FALSE
NOT IN with NULLs can return zero rows: WHERE x NOT IN (1, NULL) filters out everything
NULL propagates: any arithmetic or string operation with NULL returns NULL
CASE WHEN col = NULL never matches; use CASE WHEN col IS NULL instead

TRUE, FALSE, and NULL walk into a bar

Category
SQL
Difficulty
advanced
Duration
26 minutes
Challenges
0 hands-on challenges

Topics covered: NULLs in window functions, NULLs in UNION operations, DISTINCT and NULL, NULL propagation, Three-valued logic (NOT IN)

Lesson Sections

  1. NULLs in window functions

    Sorting and Ordering NULLS FIRST/LAST NULL in PARTITION BY Calculations and Navigation Window Aggregates + NULL LAG/LEAD with NULL

  2. NULLs in UNION operations

    UNION Behavior UNION Deduplication UNION ALL Duplicates

  3. DISTINCT and NULL (concepts: sqlDistinct)

    DISTINCT Grouping Behavior NULL Collapsing Context Comparison NULL Behavior by Context

  4. NULL propagation

    Basic Propagation Rules Arithmetic Propagation String Concatenation Control Flow Propagation CASE Propagation Function Propagation

  5. Three-valued logic (NOT IN)

    SQL uses three-valued logic (TRUE, FALSE, UNKNOWN) instead of classical boolean logic. This fundamentally changes how compound conditions behave and breaks some intuitive rules. Three-Valued Logic Boolean operations in SQL don't just have TRUE and FALSE outcomes; UNKNOWN is a third possibility that propagates through logical operators. Truth Tables with UNKNOWN Common Pitfalls Three-valued logic creates counterintuitive behaviors that cause subtle bugs even for experienced developers. The NOT IN