NULL Values: Intermediate

Stripe processes hundreds of billions of dollars in payments each year, and every transaction can carry dozens of optional metadata fields that merchants may or may not populate. When Stripe engineers query payment data for reporting, they rely heavily on COALESCE to substitute safe defaults for missing fields so that revenue dashboards never crash or display NULL where a number is expected. Without COALESCE, a single merchant who skips an optional field could silently corrupt an aggregate that summarizes millions of transactions. This lesson covers COALESCE, NULLIF, and the NULL behavior in JOINs that every data engineer at a payments company must master.

Sentinel values and fake NULLs

Daily Life

Interviews

Spot disguised missing data in real tables

In an ideal world, NULL would always represent missing or unknown data. In reality, you'll encounter "fake NULLs": empty strings, zeros, and special sentinel values that systems use to mean "unknown" because they couldn't store a real NULL.

Sentinel Values

Many software systems are forced to produce values even when the data isn't known. This happens for several reasons:

SYSTEM LIMITATIONS

Legacy databases that don't support NULL in certain columns
APIs that require all fields to have values (no optional fields)
Form validation that rejects empty submissions
ETL pipelines that crash on NULL values
Programming languages that don't distinguish NULL from empty

When a system must produce a value but doesn't know the real answer, developers choose "placeholder" values. These sentinel values become embedded in the data and persist for years.

Common Sentinel Values

Legacy systems often use placeholder values instead of proper NULLs to represent missing data.

Common Sentinel Values

'' (empty string): VARCHAR placeholder for unknown / not provided
0: INTEGER placeholder for unknown / not applicable
-1: INTEGER placeholder for not set / invalid
-999: NUMERIC placeholder for missing measurement
9999-12-31: DATE placeholder for no end date / forever
1900-01-01: DATE placeholder for unknown date
N/A: VARCHAR placeholder for not applicable
UNKNOWN: VARCHAR placeholder for data not captured
TBD: VARCHAR placeholder for to be determined

These placeholders may seem harmless, but they complicate every query that touches the column.

Practical Implications

Consider why these patterns emerge in actual systems:

CRM System

Form requires a value, so 0 is entered for unknown company size, skewing averages.

E-commerce Platform

Import script sets empty string instead of NULL, causing blank product descriptions.

HR Database

Active employees get termination_date 9999-12-31, breaking tenure calculations.

Empty String vs NULL

Empty string ('') and NULL are fundamentally different, but many systems conflate them:

Choosing Empty String vs NULL

User left field blank → NULL
User explicitly typed nothing → '' (empty)
Field not asked on form → NULL
User cleared previous value → '' or NULL
System couldn't capture value → NULL
Value is intentionally blank → ''

In practice, most systems don't track this distinction. Form submissions often send '' for unanswered questions, and databases store it as-is. The meaning of "unknown" gets lost.

Detection and Handling

When analyzing data, you must understand what placeholder values exist and handle them appropriately:

	SELECT
	COUNT(*)
	FROM users
	WHERE phone IS NULL
	OR phone = ''

Result

count
8,934

The difference can be dramatic. In this example, checking only IS NULL captures 1,247 missing phone numbers, but the real count of "unknown" phones is 8,934 when including empty strings.

TIP

Before writing analytical queries, always explore your data: SELECT DISTINCT column_name FROM table to see what placeholder values exist. Document them for your team.

Normalize Early

The best approach is to convert sentinel values to proper NULLs at query time using NULLIF (covered later in this lesson). This normalizes your data before analysis:

	SELECT
	user_id,
	NULLIF(phone, '') AS phone,
	NULLIF(employee_count, 0) AS employee_count,
	NULLIF(
	termination_date,
	'9999-12-31'
	) AS termination_date
	FROM raw_data

Result

user_id	phone	employee_count	termination_date
U001	555-1234	50	NULL
U002	NULL	NULL	2024-06-30
U003	555-5678	25	NULL

This pattern converts sentinel values to NULL so that aggregate functions, JOINs, and filters all behave correctly. The rest of this lesson teaches you the tools to do this effectively.

Which approach gives you an accurate count of users who actually provided a phone number?

This query counts all non-NULL phones, but empty strings pass the IS NOT NULL check. Users who submitted a blank form field still get counted as "has phone," inflating the total.

	SELECT
	COUNT(phone) AS has_phone
	FROM users
	WHERE phone IS NOT NULL

Practice using NULLIF to convert empty strings into proper NULL values.

> Complete this query to replace empty email strings with a value the database treats as missing.

SELECT
  user_id,
  ___(___, ___) AS clean_email
FROM users

NULLIF

COALESCE

NULL

Sentinel values are a form of technical debt that accumulates when systems are designed without proper NULL support, and cleaning them at query time using NULLIF is the standard workaround until the data can be fixed at the source.

Documenting the sentinel values that exist in each column is one of the most valuable contributions a data analyst can make to a team, because other engineers will repeatedly rediscover and misinterpret those values if no record exists.

The habit of running SELECT DISTINCT on every optional column before writing analytical queries is one of the fastest ways to surface hidden sentinel values that would otherwise corrupt aggregations.

NULLs in INNER and LEFT JOINs

Daily Life

Interviews

Keep rows safe when joins hit missing keys

NULL values create specific behaviors in JOIN operations that can lead to unexpected results if you don't understand the rules. The key insight: NULL never equals anything, not even another NULL, so NULLs never match in JOIN conditions.

Understanding NULL in JOINs

Consider these two tables for our JOIN examples:

orders

order_id	customer_id
ORD-1	C-101
ORD-2	C-102
ORD-3	NULL
ORD-4	C-103

customers

customer_id	name
C-101	Alice
C-102	Bob
NULL	Guest
C-104	Diana

JOIN Type Behaviors

With INNER JOIN, rows with NULL join keys never match anything, so they're excluded from results on both sides:

	SELECT
	o.order_id,
	o.customer_id,
	c.name
	FROM orders AS o
	INNER JOIN customers AS c
	ON o.customer_id = c.customer_id

Result

order_id	customer_id	name
ORD-1	C-101	Alice
ORD-2	C-102	Bob

Notice: ORD-3 (with NULL customer_id) is missing. The "Guest" customer (with NULL customer_id) is also missing. C-103 and C-104 don't appear because they have no matching orders.

See how the join evaluates each key pair, including the rows where one or both sides contain NULL.

	SELECT
	o.order_id,
	o.customer_id,
	c.name
	FROM orders AS o
	INNER JOIN customers AS c
	ON o.customer_id = c.customer_id

Join Animation[object Object]

orders

order_id	customer_id
ORD-1	C-101
ORD-2	C-102
ORD-3	NULL
ORD-4	C-103

customers

customer_id	name
C-101	Alice
C-102	Bob
NULL	Guest
C-104	Diana

result

col_0	col_1	col_2
ORD-1	C-101	C-101	Alice

Row 1/4

TIP

The condition o.customer_id = c.customer_id evaluates to unknown when either side is NULL. INNER JOIN only includes rows where the condition is definitively true.

LEFT JOIN and NULL

LEFT JOIN preserves all rows from the left table, but NULL join keys still don't match:

	SELECT
	o.order_id,
	o.customer_id,
	c.name
	FROM orders AS o
	LEFT JOIN customers AS c
	ON o.customer_id = c.customer_id

Result

order_id	customer_id	name
ORD-1	C-101	Alice
ORD-2	C-102	Bob
ORD-3	NULL	NULL
ORD-4	C-103	NULL

ORD-3 now appears, but with NULL for name because its NULL customer_id didn't match the NULL customer_id in customers. ORD-4 appears with NULL name because C-103 doesn't exist in customers.

Observe how the LEFT JOIN preserves all left-side rows, filling unmatched right-side columns with NULL.

	SELECT
	e.emp_id,
	e.dept_id,
	d.dept_name
	FROM employees AS e
	LEFT JOIN departments AS d
	ON e.dept_id = d.dept_id

Join Animation[object Object]

employees

emp_id	dept_id
E-101	D-10
E-102	D-20
E-103	NULL
E-104	D-10
E-105	D-99

departments

dept_id	dept_name
D-10	Sales
D-20	Engineering
NULL	Unassigned
D-30	Marketing

result

col_0	col_1	col_2
E-101	D-10	D-10	Sales

Row 1/5

Even though E-103 has a NULL dept_id and D-99 does not exist, every employee row survives in the output. This preservation is what makes LEFT JOIN essential for detecting incomplete data.

Advanced JOIN Patterns

If you need NULLs to match NULLs, you must handle it explicitly:

	SELECT
	o.order_id,
	o.customer_id,
	c.name
	FROM orders AS o
	LEFT JOIN customers AS c
	ON o.customer_id = c.customer_id
	OR (
	o.customer_id IS NULL
	AND c.customer_id IS NULL
	)

Result

order_id	customer_id	name
ORD-1	C-101	Alice
ORD-2	C-102	Bob
ORD-3	NULL	Guest
ORD-4	C-103	NULL

Now ORD-3 matches the "Guest" customer because we explicitly check for both sides being NULL. This pattern is useful when NULL represents a meaningful category like "guest" or "unknown."

How does the JOIN type affect rows with NULL keys?

INNER JOIN drops every row where the join key is NULL on either side. ORD-3 (NULL customer_id) vanishes completely, and you silently lose data without any warning.

	SELECT
	o.order_id,
	c.name
	FROM orders AS o
	INNER JOIN customers AS c
	ON o.customer_id = c.customer_id

Try writing a LEFT JOIN that also matches rows where both join keys are NULL.

> Complete this query to include rows where both join keys might be missing.

SELECT
  *
FROM orders AS o
LEFT JOIN customers AS c
  ON o.cust_id = c.cust_id
  ___ (
  o.cust_id ___
  ___ c.cust_id ___
)

IS NULL

AND

= NULL

IS NOT NULL

The rule that NULL join keys never match is one of the most consequential behaviors in SQL because it causes silent row loss without any error or warning, making it one of the hardest classes of bug to detect in production.

When you discover that an INNER JOIN is returning fewer rows than expected, checking whether join key columns contain NULLs is the first diagnostic step and often the cause of the discrepancy.

The explicit OR (a IS NULL AND b IS NULL) pattern for matching NULL keys is a deliberate design choice, not a workaround, because it forces you to consciously decide whether two absent values represent the same entity in your domain.

COALESCE for defaults

Daily Life

Interviews

Replace missing values with useful defaults

COALESCE returns the first non-NULL value from a list of arguments. It's essential for providing default values when data might be missing and for making query results more readable.

Understanding COALESCE

COALESCE returns the first non-NULL value from a list of arguments, providing a clean way to handle missing data.

Basic Syntax

COALESCE evaluates arguments left to right and returns the first non-NULL value it finds.

	SELECT
	COALESCE(NULL, NULL, 'default') AS result

Result

result
default

Simple Example

A common use case is selecting the best available name for display purposes.

	SELECT
	customer_id,
	COALESCE(
	nickname,
	first_name,
	'Unknown'
	) AS display_name
	FROM customers

Result

customer_id	display_name
C-101	Al
C-102	Bob
C-103	Unknown

For each customer, this returns their nickname if they have one, otherwise their first_name, otherwise 'Unknown'. It cascades through options until finding a non-NULL value.

Watch how COALESCE walks through columns left to right, returning the first non-NULL value it finds for each row.

	SELECT
	id,
	COALESCE(phone, email, 'Unknown') AS contact
	FROM contacts

Coalesce Animation

contacts

id	phone	email
C-1	555-1234	NULL
C-2	NULL	bob@mail.com
C-3	NULL	NULL
C-4	555-9999	dan@mail.com

result

col_0	col_1
C-1	555-1234

Row 1/4

When both phone and email are NULL (C-3), COALESCE falls through to the fallback string "Unknown". This cascading logic ensures every row gets a usable value.

Practical Applications

A common use case is providing default values for calculations:

Without COALESCE, the calculation price - discount would return NULL whenever discount is NULL. COALESCE(discount, 0) treats missing discounts as zero, ensuring the math works.

COALESCE in Aggregations

Remember that SUM of all NULLs returns NULL, not zero. COALESCE can fix this:

	SELECT
	category,
	COALESCE(SUM(quantity), 0) AS total_quantity
	FROM inventory
	GROUP BY category

Result

category	total_quantity
Electronics	150
Clothing	320
Books	0

If a category has no inventory records (or all NULL quantities), the result shows 0 instead of NULL, making downstream calculations and reports cleaner.

COALESCE with JOINs

COALESCE is particularly useful for handling NULL values that appear after LEFT JOINs:

	SELECT
	o.order_id,
	COALESCE(c.name, 'Guest Order') AS customer_name
	FROM orders AS o
	LEFT JOIN customers AS c
	ON o.customer_id = c.customer_id

Result

order_id	customer_name
ORD-1	Alice Chen
ORD-2	Bob Smith
ORD-3	Guest Order
ORD-4	Guest Order

Orders without matching customers now show "Guest Order" instead of NULL, making reports more readable.

See how LEFT JOIN and COALESCE work together: unmatched orders receive "Guest Order" instead of a raw NULL in the display name.

	SELECT
	o.order_id,
	COALESCE(c.name, 'Guest Order') AS customer_name
	FROM orders AS o
	LEFT JOIN customers AS c
	ON o.customer_id = c.customer_id

For matched customers, COALESCE picks the nickname when available and falls back to the full name. For unmatched or NULL-keyed orders, the final fallback "Guest Order" ensures every row has a readable label.

> Complete this query to calculate the final price, treating missing discounts as zero.

SELECT
  product_id,
  price - ___(___, ___) AS final_price
FROM products

COALESCE

discount

NULLIF

NULL

COALESCE is the single most frequently used NULL-handling function in production SQL because providing a sensible default is almost always preferable to propagating NULL into reports and dashboards.

The cascading nature of COALESCE makes it ideal for priority hierarchies like "use the customer-provided name, then the system name, then a generic label" without needing nested CASE expressions.

Wrapping aggregate functions with COALESCE to substitute zero or an empty string is a standard pattern for making reports resilient to groups that have no data, preventing NULL from appearing in summary tables.

NULLIF for sentinel values

Daily Life

Interviews

Convert fake values back to proper NULLs

NULLIF does the opposite of COALESCE: it returns NULL when two values are equal. This is useful for converting sentinel values (like 0, -1, or empty strings) into proper NULLs.

Understanding NULLIF

NULLIF returns NULL if two values are equal, otherwise it returns the first value. It is the inverse of COALESCE.

Basic Syntax

NULLIF compares two expressions and returns NULL when they match.

	SELECT
	NULLIF(5, 5) AS returns_null,
	NULLIF(5, 3) AS returns_five

Result

returns_null	returns_five
NULL	5

Sentinel Conversion

Legacy systems often use special values like 0, -1, or empty strings to mean "unknown." NULLIF converts these to proper NULLs:

	SELECT
	user_id,
	age,
	NULLIF(age, 0) AS clean_age
	FROM legacy_users

Result

user_id	age	clean_age
U-101	28	28
U-102	0	NULL
U-103	35	35
U-104	0	NULL

The legacy system used 0 to mean "age unknown." NULLIF(age, 0) converts these zeros to NULL so they're handled correctly by aggregate functions.

Advanced NULLIF Patterns

A powerful pattern: use NULLIF to prevent division-by-zero errors:

	SELECT
	product_id,
	total_revenue,
	total_orders,
	total_revenue / NULLIF(
	total_orders,
	0
	) AS avg_order_value
	FROM products

Result

product_id	total_revenue	total_orders	avg_order_value
P-101	5000	50	100.00
P-102	0	0	NULL
P-103	3200	40	80.00

NULLIF(total_orders, 0) returns NULL when total_orders is zero. Dividing by NULL returns NULL (instead of an error), gracefully handling products with no orders.

TIP

The NULLIF-for-division pattern is one of the most useful SQL tricks. Memorize it: value / NULLIF(divisor, 0) safely handles zero divisors.

Cleaning Empty Strings

Empty strings often represent missing data. NULLIF converts them to proper NULLs.

	SELECT
	user_id,
	NULLIF(phone, '') AS phone,
	NULLIF(email, '') AS email
	FROM customers

Result

user_id	phone	email
U-101	555-0101	alice@email.com
U-102	NULL	bob@email.com
U-103	555-0303	NULL

This converts empty strings to NULL, ensuring consistent handling of missing contact info.

> Complete this query to safely divide the total amount by quantity, avoiding division-by-zero errors.

SELECT
  product_id,
  total_amount / ___(___, ___) AS avg_value
FROM transactions

NULLIF

quantity

COALESCE

NULL

The NULLIF division-by-zero pattern is one of those techniques that becomes second nature once you learn it because it elegantly converts a potential runtime error into a predictable NULL result that downstream code can handle.

NULLIF is most powerful when used at the beginning of a data pipeline to normalize sentinel values into proper NULLs, because every downstream query benefits from not having to repeat the sentinel-detection logic.

Combining NULLIF to clean sentinel values with COALESCE to supply defaults is the complete pattern for robust NULL handling, covering both the case where a value is absent and the case where it has been stored as a placeholder.

COALESCE and NULLIF patterns

Daily Life

Interviews

Chain NULL handlers for clean output

COALESCE and NULLIF work together powerfully. A common pattern is to clean data with NULLIF, then provide defaults with COALESCE:

Combining Functions

Nesting NULLIF inside COALESCE creates a clean and default pattern that handles both empty strings and NULLs.

	SELECT
	product_id,
	COALESCE(
	NULLIF(display_name, ''),
	NULLIF(product_name, ''),
	'Unknown Product'
	) AS name
	FROM products

Result

product_id	name
P-101	Widget Pro
P-102	Gadget Plus
P-103	Unknown Product

This first converts empty strings to NULL (via NULLIF), then cascades through options (via COALESCE) to find a usable name.

✓Do

Use COALESCE to provide meaningful defaults for NULL values
Use NULLIF to convert sentinel values (0, -1, '') to proper NULLs
Combine NULLIF and COALESCE to clean and default in one expression
Remember that NULL join keys never match; handle explicitly if needed

✗Don't

Don't assume 0 or empty string means NULL; data sources vary
Don't forget that COALESCE returns NULL if ALL arguments are NULL
Don't use COALESCE to hide data quality issues; fix them at the source

> Complete this query to display a fallback name when the display_name is empty or NULL.

SELECT
  product_id,
  ___(
    ___(display_name, ''),
    'Unknown'
    ) AS name
FROM products

COALESCE

NULLIF

IFNULL

NVL

The COALESCE + NULLIF pattern is the standard way to handle both NULL and empty-string sentinel values in a single expression.

NULLIF is also useful for preventing division by zero: dividing by NULLIF(denominator, 0) returns NULL instead of crashing.

COALESCE can accept any number of arguments and returns the first non-NULL one, making it flexible for multi-level fallback chains.

❯❯❯PUTTING IT ALL TOGETHER

> You are a reporting analyst at Recurly building a monthly revenue dashboard where failed API syncs leave payment amounts as empty strings or sentinel -1 values rather than NULL. Finance needs an accurate MRR figure by plan tier before the board meeting.

NULLIF converts sentinel -1 amounts and empty strings into proper NULLs before any aggregation runs.

NULLs in LEFT JOINs mean plan tiers with no successful payments appear with NULL revenue, not zero.

COALESCE substitutes zero for NULL revenue so tier totals display correctly in the dashboard table.

Combining NULLIF and COALESCE in one expression cleans and defaults each payment column simultaneously.

KEY TAKEAWAYS

Fake NULLs (empty strings, 0, -1) exist because legacy systems couldn't store real NULLs

Comparing NULL to NULL doesn't yield true, so JOINs on NULL keys never match

LEFT JOIN produces NULLs for unmatched rows; use COALESCE to provide defaults

COALESCE(a, b, c) returns the first non-NULL value in the list

NULLIF(a, b) returns NULL if a = b; perfect for cleaning sentinel values

Use NULLIF to prevent division by zero: x / NULLIF(y, 0)

Pattern: COALESCE(NULLIF(column, ''), 'default') cleans empty strings

Always check for fake NULLs when working with data from external sources

NULL Values: Intermediate

Sentinel values and fake NULLs

Sentinel Values

Common Sentinel Values

Practical Implications

Empty String vs NULL

Detection and Handling

Normalize Early

NULLs in INNER and LEFT JOINs

Understanding NULL in JOINs

JOIN Type Behaviors

LEFT JOIN and NULL

Advanced JOIN Patterns

COALESCE for defaults

Understanding COALESCE

Basic Syntax

Simple Example

Practical Applications

COALESCE in Aggregations

COALESCE with JOINs

NULLIF for sentinel values

Understanding NULLIF

Basic Syntax

Sentinel Conversion

Advanced NULLIF Patterns

Cleaning Empty Strings

COALESCE and NULLIF patterns

Combining Functions

Lesson Sections