NULL Values: Intermediate

Stripe processes hundreds of billions of dollars in payments each year, and every transaction can carry dozens of optional metadata fields that merchants may or may not populate. When Stripe engineers query payment data for reporting, they rely heavily on COALESCE to substitute safe defaults for missing fields so that revenue dashboards never crash or display NULL where a number is expected. Without COALESCE, a single merchant who skips an optional field could silently corrupt an aggregate that summarizes millions of transactions. This lesson covers COALESCE, NULLIF, and the NULL behavior in JOINs that every data engineer at a payments company must master.

Sentinel values and fake NULLs

Daily Life
Interviews

Spot disguised missing data in real tables

In an ideal world, NULL would always represent missing or unknown data. In reality, you'll encounter "fake NULLs": empty strings, zeros, and special sentinel values that systems use to mean "unknown" because they couldn't store a real NULL.

Sentinel Values

Many software systems are forced to produce values even when the data isn't known. This happens for several reasons:
SYSTEM LIMITATIONS
  • Legacy databases that don't support NULL in certain columns
  • APIs that require all fields to have values (no optional fields)
  • Form validation that rejects empty submissions
  • ETL pipelines that crash on NULL values
  • Programming languages that don't distinguish NULL from empty
When a system must produce a value but doesn't know the real answer, developers choose "placeholder" values. These sentinel values become embedded in the data and persist for years.
//

Common Sentinel Values

Legacy systems often use placeholder values instead of proper NULLs to represent missing data.

Common Sentinel Values
  • '' (empty string): VARCHAR placeholder for unknown / not provided
  • 0: INTEGER placeholder for unknown / not applicable
  • -1: INTEGER placeholder for not set / invalid
  • -999: NUMERIC placeholder for missing measurement
  • 9999-12-31: DATE placeholder for no end date / forever
  • 1900-01-01: DATE placeholder for unknown date
  • N/A: VARCHAR placeholder for not applicable
  • UNKNOWN: VARCHAR placeholder for data not captured
  • TBD: VARCHAR placeholder for to be determined
These placeholders may seem harmless, but they complicate every query that touches the column.

Practical Implications

Consider why these patterns emerge in actual systems:
CRM System
CRM System
Form requires a value, so 0 is entered for unknown company size, skewing averages.
E-commerce Platform
E-commerce Platform
Import script sets empty string instead of NULL, causing blank product descriptions.
HR Database
HR Database
Active employees get termination_date 9999-12-31, breaking tenure calculations.
//

Empty String vs NULL

Empty string ('') and NULL are fundamentally different, but many systems conflate them:

Choosing Empty String vs NULL
  • User left field blank → NULL
  • User explicitly typed nothing → '' (empty)
  • Field not asked on form → NULL
  • User cleared previous value → '' or NULL
  • System couldn't capture value → NULL
  • Value is intentionally blank → ''
In practice, most systems don't track this distinction. Form submissions often send '' for unanswered questions, and databases store it as-is. The meaning of "unknown" gets lost.

Detection and Handling

When analyzing data, you must understand what placeholder values exist and handle them appropriately:
1SELECT
2 COUNT(*)
3FROM users
4WHERE phone IS NULL
5OR phone = ''
Result
count
8,934

The difference can be dramatic. In this example, checking only IS NULL captures 1,247 missing phone numbers, but the real count of "unknown" phones is 8,934 when including empty strings.

TIP
Before writing analytical queries, always explore your data: SELECT DISTINCT column_name FROM table to see what placeholder values exist. Document them for your team.
//

Normalize Early

The best approach is to convert sentinel values to proper NULLs at query time using NULLIF (covered later in this lesson). This normalizes your data before analysis:

1SELECT
2 user_id,
3 NULLIF(phone, '') AS phone,
4 NULLIF(employee_count, 0) AS employee_count,
5 NULLIF(
6 termination_date,
7 '9999-12-31'
8 ) AS termination_date
9FROM raw_data
Result
user_idphoneemployee_counttermination_date
U001555-123450NULL
U002NULLNULL2024-06-30
U003555-567825NULL

This pattern converts sentinel values to NULL so that aggregate functions, JOINs, and filters all behave correctly. The rest of this lesson teaches you the tools to do this effectively.

Which approach gives you an accurate count of users who actually provided a phone number?
This query counts all non-NULL phones, but empty strings pass the IS NOT NULL check. Users who submitted a blank form field still get counted as "has phone," inflating the total.
1SELECT
2 COUNT(phone) AS has_phone
3FROM users
4WHERE phone IS NOT NULL

Practice using NULLIF to convert empty strings into proper NULL values.

> Complete this query to replace empty email strings with a value the database treats as missing.

SELECT
  user_id,
  ___(___, ___) AS clean_email
FROM users
email
NULLIF
COALESCE
''
NULL

Sentinel values are a form of technical debt that accumulates when systems are designed without proper NULL support, and cleaning them at query time using NULLIF is the standard workaround until the data can be fixed at the source.

Documenting the sentinel values that exist in each column is one of the most valuable contributions a data analyst can make to a team, because other engineers will repeatedly rediscover and misinterpret those values if no record exists.

The habit of running SELECT DISTINCT on every optional column before writing analytical queries is one of the fastest ways to surface hidden sentinel values that would otherwise corrupt aggregations.

NULLs in INNER and LEFT JOINs

Daily Life
Interviews

Keep rows safe when joins hit missing keys

NULL values create specific behaviors in JOIN operations that can lead to unexpected results if you don't understand the rules. The key insight: NULL never equals anything, not even another NULL, so NULLs never match in JOIN conditions.

Understanding NULL in JOINs

Consider these two tables for our JOIN examples:

orders
order_idcustomer_id
ORD-1C-101
ORD-2C-102
ORD-3NULL
ORD-4C-103
customers
customer_idname
C-101Alice
C-102Bob
NULLGuest
C-104Diana

JOIN Type Behaviors

With INNER JOIN, rows with NULL join keys never match anything, so they're excluded from results on both sides:

1SELECT
2 o.order_id,
3 o.customer_id,
4 c.name
5FROM orders AS o
6INNER JOIN customers AS c
7 ON o.customer_id = c.customer_id
Result
order_idcustomer_idname
ORD-1C-101Alice
ORD-2C-102Bob

Notice: ORD-3 (with NULL customer_id) is missing. The "Guest" customer (with NULL customer_id) is also missing. C-103 and C-104 don't appear because they have no matching orders.

See how the join evaluates each key pair, including the rows where one or both sides contain NULL.

1SELECT
2 o.order_id,
3 o.customer_id,
4 c.name
5FROM orders AS o
6INNER JOIN customers AS c
7 ON o.customer_id = c.customer_id
Join Animation[object Object]
orders
order_idcustomer_id
ORD-1C-101
ORD-2C-102
ORD-3NULL
ORD-4C-103
customers
customer_idname
C-101Alice
C-102Bob
NULLGuest
C-104Diana
result
col_0col_1col_2
ORD-1C-101C-101Alice
Row 1/4
TIP
The condition o.customer_id = c.customer_id evaluates to unknown when either side is NULL. INNER JOIN only includes rows where the condition is definitively true.
//

LEFT JOIN and NULL

LEFT JOIN preserves all rows from the left table, but NULL join keys still don't match:

1SELECT
2 o.order_id,
3 o.customer_id,
4 c.name
5FROM orders AS o
6LEFT JOIN customers AS c
7 ON o.customer_id = c.customer_id
Result
order_idcustomer_idname
ORD-1C-101Alice
ORD-2C-102Bob
ORD-3NULLNULL
ORD-4C-103NULL

ORD-3 now appears, but with NULL for name because its NULL customer_id didn't match the NULL customer_id in customers. ORD-4 appears with NULL name because C-103 doesn't exist in customers.

Observe how the LEFT JOIN preserves all left-side rows, filling unmatched right-side columns with NULL.

1SELECT
2 e.emp_id,
3 e.dept_id,
4 d.dept_name
5FROM employees AS e
6LEFT JOIN departments AS d
7 ON e.dept_id = d.dept_id
Join Animation[object Object]
employees
emp_iddept_id
E-101D-10
E-102D-20
E-103NULL
E-104D-10
E-105D-99
departments
dept_iddept_name
D-10Sales
D-20Engineering
NULLUnassigned
D-30Marketing
result
col_0col_1col_2
E-101D-10D-10Sales
Row 1/5

Even though E-103 has a NULL dept_id and D-99 does not exist, every employee row survives in the output. This preservation is what makes LEFT JOIN essential for detecting incomplete data.

Advanced JOIN Patterns

If you need NULLs to match NULLs, you must handle it explicitly:

1SELECT
2 o.order_id,
3 o.customer_id,
4 c.name
5FROM orders AS o
6LEFT JOIN customers AS c
7 ON o.customer_id = c.customer_id
8 OR (
9 o.customer_id IS NULL
10 AND c.customer_id IS NULL
11)
Result
order_idcustomer_idname
ORD-1C-101Alice
ORD-2C-102Bob
ORD-3NULLGuest
ORD-4C-103NULL

Now ORD-3 matches the "Guest" customer because we explicitly check for both sides being NULL. This pattern is useful when NULL represents a meaningful category like "guest" or "unknown."

How does the JOIN type affect rows with NULL keys?

INNER JOIN drops every row where the join key is NULL on either side. ORD-3 (NULL customer_id) vanishes completely, and you silently lose data without any warning.
1SELECT
2 o.order_id,
3 c.name
4FROM orders AS o
5INNER JOIN customers AS c
6 ON o.customer_id = c.customer_id

Try writing a LEFT JOIN that also matches rows where both join keys are NULL.

> Complete this query to include rows where both join keys might be missing.

SELECT
  *
FROM orders AS o
LEFT JOIN customers AS c
  ON o.cust_id = c.cust_id
  ___ (
  o.cust_id ___
  ___ c.cust_id ___
)
= NULL
IS NULL
OR
IS NOT NULL
AND

The rule that NULL join keys never match is one of the most consequential behaviors in SQL because it causes silent row loss without any error or warning, making it one of the hardest classes of bug to detect in production.

When you discover that an INNER JOIN is returning fewer rows than expected, checking whether join key columns contain NULLs is the first diagnostic step and often the cause of the discrepancy.

The explicit OR (a IS NULL AND b IS NULL) pattern for matching NULL keys is a deliberate design choice, not a workaround, because it forces you to consciously decide whether two absent values represent the same entity in your domain.

COALESCE for defaults

Daily Life
Interviews

Replace missing values with useful defaults

COALESCE returns the first non-NULL value from a list of arguments. It's essential for providing default values when data might be missing and for making query results more readable.

Understanding COALESCE

COALESCE returns the first non-NULL value from a list of arguments, providing a clean way to handle missing data.

//

Basic Syntax

COALESCE evaluates arguments left to right and returns the first non-NULL value it finds.

1SELECT
2 COALESCE(NULL, NULL, 'default') AS result
Result
result
default
//

Simple Example

A common use case is selecting the best available name for display purposes.
1SELECT
2 customer_id,
3 COALESCE(
4 nickname,
5 first_name,
6 'Unknown'
7 ) AS display_name
8FROM customers
Result
customer_iddisplay_name
C-101Al
C-102Bob
C-103Unknown

For each customer, this returns their nickname if they have one, otherwise their first_name, otherwise 'Unknown'. It cascades through options until finding a non-NULL value.

Watch how COALESCE walks through columns left to right, returning the first non-NULL value it finds for each row.

1SELECT
2 id,
3 COALESCE(phone, email, 'Unknown') AS contact
4FROM contacts
Coalesce Animation
contacts
idphoneemail
C-1555-1234NULL
C-2NULLbob@mail.com
C-3NULLNULL
C-4555-9999dan@mail.com
result
col_0col_1
C-1555-1234
Row 1/4

When both phone and email are NULL (C-3), COALESCE falls through to the fallback string "Unknown". This cascading logic ensures every row gets a usable value.

Practical Applications

A common use case is providing default values for calculations:

Without COALESCE, the calculation price - discount would return NULL whenever discount is NULL. COALESCE(discount, 0) treats missing discounts as zero, ensuring the math works.

//

COALESCE in Aggregations

Remember that SUM of all NULLs returns NULL, not zero. COALESCE can fix this:

1SELECT
2 category,
3 COALESCE(SUM(quantity), 0) AS total_quantity
4FROM inventory
5GROUP BY category
Result
categorytotal_quantity
Electronics150
Clothing320
Books0

If a category has no inventory records (or all NULL quantities), the result shows 0 instead of NULL, making downstream calculations and reports cleaner.

COALESCE with JOINs

COALESCE is particularly useful for handling NULL values that appear after LEFT JOINs:

1SELECT
2 o.order_id,
3 COALESCE(c.name, 'Guest Order') AS customer_name
4FROM orders AS o
5LEFT JOIN customers AS c
6 ON o.customer_id = c.customer_id
Result
order_idcustomer_name
ORD-1Alice Chen
ORD-2Bob Smith
ORD-3Guest Order
ORD-4Guest Order

Orders without matching customers now show "Guest Order" instead of NULL, making reports more readable.

See how LEFT JOIN and COALESCE work together: unmatched orders receive "Guest Order" instead of a raw NULL in the display name.

1SELECT
2 o.order_id,
3 COALESCE(c.name, 'Guest Order') AS customer_name
4FROM orders AS o
5LEFT JOIN customers AS c
6 ON o.customer_id = c.customer_id

For matched customers, COALESCE picks the nickname when available and falls back to the full name. For unmatched or NULL-keyed orders, the final fallback "Guest Order" ensures every row has a readable label.

> Complete this query to calculate the final price, treating missing discounts as zero.

SELECT
  product_id,
  price - ___(___, ___) AS final_price
FROM products
discount
NULL
COALESCE
NULLIF
0

COALESCE is the single most frequently used NULL-handling function in production SQL because providing a sensible default is almost always preferable to propagating NULL into reports and dashboards.

The cascading nature of COALESCE makes it ideal for priority hierarchies like "use the customer-provided name, then the system name, then a generic label" without needing nested CASE expressions.

Wrapping aggregate functions with COALESCE to substitute zero or an empty string is a standard pattern for making reports resilient to groups that have no data, preventing NULL from appearing in summary tables.

NULLIF for sentinel values

Daily Life
Interviews

Convert fake values back to proper NULLs

NULLIF does the opposite of COALESCE: it returns NULL when two values are equal. This is useful for converting sentinel values (like 0, -1, or empty strings) into proper NULLs.

Understanding NULLIF

NULLIF returns NULL if two values are equal, otherwise it returns the first value. It is the inverse of COALESCE.

//

Basic Syntax

NULLIF compares two expressions and returns NULL when they match.

1SELECT
2 NULLIF(5, 5) AS returns_null,
3 NULLIF(5, 3) AS returns_five
Result
returns_nullreturns_five
NULL5
//

Sentinel Conversion

Legacy systems often use special values like 0, -1, or empty strings to mean "unknown." NULLIF converts these to proper NULLs:

1SELECT
2 user_id,
3 age,
4 NULLIF(age, 0) AS clean_age
5FROM legacy_users
Result
user_idageclean_age
U-1012828
U-1020NULL
U-1033535
U-1040NULL

The legacy system used 0 to mean "age unknown." NULLIF(age, 0) converts these zeros to NULL so they're handled correctly by aggregate functions.

Advanced NULLIF Patterns

A powerful pattern: use NULLIF to prevent division-by-zero errors:

1SELECT
2 product_id,
3 total_revenue,
4 total_orders,
5 total_revenue / NULLIF(
6 total_orders,
7 0
8 ) AS avg_order_value
9FROM products
Result
product_idtotal_revenuetotal_ordersavg_order_value
P-101500050100.00
P-10200NULL
P-10332004080.00

NULLIF(total_orders, 0) returns NULL when total_orders is zero. Dividing by NULL returns NULL (instead of an error), gracefully handling products with no orders.

TIP
The NULLIF-for-division pattern is one of the most useful SQL tricks. Memorize it: value / NULLIF(divisor, 0) safely handles zero divisors.
//

Cleaning Empty Strings

Empty strings often represent missing data. NULLIF converts them to proper NULLs.

1SELECT
2 user_id,
3 NULLIF(phone, '') AS phone,
4 NULLIF(email, '') AS email
5FROM customers
Result
user_idphoneemail
U-101555-0101alice@email.com
U-102NULLbob@email.com
U-103555-0303NULL

This converts empty strings to NULL, ensuring consistent handling of missing contact info.

> Complete this query to safely divide the total amount by quantity, avoiding division-by-zero errors.

SELECT
  product_id,
  total_amount / ___(___, ___) AS avg_value
FROM transactions
NULLIF
0
quantity
COALESCE
NULL

The NULLIF division-by-zero pattern is one of those techniques that becomes second nature once you learn it because it elegantly converts a potential runtime error into a predictable NULL result that downstream code can handle.

NULLIF is most powerful when used at the beginning of a data pipeline to normalize sentinel values into proper NULLs, because every downstream query benefits from not having to repeat the sentinel-detection logic.

Combining NULLIF to clean sentinel values with COALESCE to supply defaults is the complete pattern for robust NULL handling, covering both the case where a value is absent and the case where it has been stored as a placeholder.

COALESCE and NULLIF patterns

Daily Life
Interviews

Chain NULL handlers for clean output

COALESCE and NULLIF work together powerfully. A common pattern is to clean data with NULLIF, then provide defaults with COALESCE:

Combining Functions

Nesting NULLIF inside COALESCE creates a clean and default pattern that handles both empty strings and NULLs.

1SELECT
2 product_id,
3 COALESCE(
4 NULLIF(display_name, ''),
5 NULLIF(product_name, ''),
6 'Unknown Product'
7 ) AS name
8FROM products
Result
product_idname
P-101Widget Pro
P-102Gadget Plus
P-103Unknown Product

This first converts empty strings to NULL (via NULLIF), then cascades through options (via COALESCE) to find a usable name.

Do
  • Use COALESCE to provide meaningful defaults for NULL values
  • Use NULLIF to convert sentinel values (0, -1, '') to proper NULLs
  • Combine NULLIF and COALESCE to clean and default in one expression
  • Remember that NULL join keys never match; handle explicitly if needed
Don't
  • Don't assume 0 or empty string means NULL; data sources vary
  • Don't forget that COALESCE returns NULL if ALL arguments are NULL
  • Don't use COALESCE to hide data quality issues; fix them at the source

> Complete this query to display a fallback name when the display_name is empty or NULL.

SELECT
  product_id,
  ___(
    ___(display_name, ''),
    'Unknown'
    ) AS name
FROM products
IFNULL
COALESCE
NULLIF
NVL

The COALESCE + NULLIF pattern is the standard way to handle both NULL and empty-string sentinel values in a single expression.

NULLIF is also useful for preventing division by zero: dividing by NULLIF(denominator, 0) returns NULL instead of crashing.

COALESCE can accept any number of arguments and returns the first non-NULL one, making it flexible for multi-level fallback chains.

PUTTING IT ALL TOGETHER

> You are a reporting analyst at Recurly building a monthly revenue dashboard where failed API syncs leave payment amounts as empty strings or sentinel -1 values rather than NULL. Finance needs an accurate MRR figure by plan tier before the board meeting.

NULLIF converts sentinel -1 amounts and empty strings into proper NULLs before any aggregation runs.
NULLs in LEFT JOINs mean plan tiers with no successful payments appear with NULL revenue, not zero.
COALESCE substitutes zero for NULL revenue so tier totals display correctly in the dashboard table.
Combining NULLIF and COALESCE in one expression cleans and defaults each payment column simultaneously.
KEY TAKEAWAYS
Fake NULLs (empty strings, 0, -1) exist because legacy systems couldn't store real NULLs
Comparing NULL to NULL doesn't yield true, so JOINs on NULL keys never match
LEFT JOIN produces NULLs for unmatched rows; use COALESCE to provide defaults
COALESCE(a, b, c) returns the first non-NULL value in the list
NULLIF(a, b) returns NULL if a = b; perfect for cleaning sentinel values
Use NULLIF to prevent division by zero: x / NULLIF(y, 0)
Pattern: COALESCE(NULLIF(column, ''), 'default') cleans empty strings
Always check for fake NULLs when working with data from external sources

COALESCE: NULL won

COALESCE: NULL won't fix itself

Category
SQL
Difficulty
intermediate
Duration
31 minutes
Challenges
0 hands-on challenges

Topics covered: Sentinel values and fake NULLs, NULLs in INNER and LEFT JOINs, COALESCE for defaults, NULLIF for sentinel values, COALESCE and NULLIF patterns

Lesson Sections

  1. Sentinel values and fake NULLs (concepts: sqlNullif)

    Sentinel Values Many software systems are forced to produce values even when the data isn't known. This happens for several reasons: When a system must produce a value but doesn't know the real answer, developers choose "placeholder" values. These sentinel values become embedded in the data and persist for years. Common Sentinel Values These placeholders may seem harmless, but they complicate every query that touches the column. Practical Implications Consider why these patterns emerge in actual

  2. NULLs in INNER and LEFT JOINs (concepts: sqlNullInJoins)

    Understanding NULL in JOINs JOIN Type Behaviors LEFT JOIN and NULL Advanced JOIN Patterns

  3. COALESCE for defaults (concepts: sqlCoalesce)

    Understanding COALESCE Basic Syntax Simple Example A common use case is selecting the best available name for display purposes. Practical Applications A common use case is providing default values for calculations: COALESCE in Aggregations COALESCE with JOINs

  4. NULLIF for sentinel values

    Understanding NULLIF Basic Syntax Sentinel Conversion Advanced NULLIF Patterns Cleaning Empty Strings

  5. COALESCE and NULLIF patterns

    Combining Functions