NULL Values: Intermediate
Stripe processes hundreds of billions of dollars in payments each year, and every transaction can carry dozens of optional metadata fields that merchants may or may not populate. When Stripe engineers query payment data for reporting, they rely heavily on COALESCE to substitute safe defaults for missing fields so that revenue dashboards never crash or display NULL where a number is expected. Without COALESCE, a single merchant who skips an optional field could silently corrupt an aggregate that summarizes millions of transactions. This lesson covers COALESCE, NULLIF, and the NULL behavior in JOINs that every data engineer at a payments company must master.
Sentinel values and fake NULLs
Spot disguised missing data in real tables
In an ideal world, NULL would always represent missing or unknown data. In reality, you'll encounter "fake NULLs": empty strings, zeros, and special sentinel values that systems use to mean "unknown" because they couldn't store a real NULL.
Sentinel Values
- Legacy databases that don't support
NULLin certain columns - APIs that require all fields to have values (no optional fields)
- Form validation that rejects empty submissions
- ETL pipelines that crash on
NULLvalues - Programming languages that don't distinguish
NULLfrom empty
Common Sentinel Values
Legacy systems often use placeholder values instead of proper NULLs to represent missing data.
- '' (empty string): VARCHAR placeholder for unknown / not provided
- 0: INTEGER placeholder for unknown / not applicable
- -1: INTEGER placeholder for not set / invalid
- -999: NUMERIC placeholder for missing measurement
- 9999-12-31: DATE placeholder for no end date / forever
- 1900-01-01: DATE placeholder for unknown date
- N/A: VARCHAR placeholder for not applicable
- UNKNOWN: VARCHAR placeholder for data not captured
- TBD: VARCHAR placeholder for to be determined
Practical Implications
Empty String vs NULL
Empty string ('') and NULL are fundamentally different, but many systems conflate them:
- User left field blank → NULL
- User explicitly typed nothing → '' (empty)
- Field not asked on form → NULL
- User cleared previous value → '' or NULL
- System couldn't capture value → NULL
- Value is intentionally blank → ''
Detection and Handling
The difference can be dramatic. In this example, checking only IS NULL captures 1,247 missing phone numbers, but the real count of "unknown" phones is 8,934 when including empty strings.
SELECT DISTINCT column_name FROM table to see what placeholder values exist. Document them for your team.Normalize Early
The best approach is to convert sentinel values to proper NULLs at query time using NULLIF (covered later in this lesson). This normalizes your data before analysis:
This pattern converts sentinel values to NULL so that aggregate functions, JOINs, and filters all behave correctly. The rest of this lesson teaches you the tools to do this effectively.
Practice using NULLIF to convert empty strings into proper NULL values.
> Complete this query to replace empty email strings with a value the database treats as missing.
SELECT user_id, (, ) AS clean_email FROM users
Sentinel values are a form of technical debt that accumulates when systems are designed without proper NULL support, and cleaning them at query time using NULLIF is the standard workaround until the data can be fixed at the source.
The habit of running SELECT DISTINCT on every optional column before writing analytical queries is one of the fastest ways to surface hidden sentinel values that would otherwise corrupt aggregations.
NULLs in INNER and LEFT JOINs
Keep rows safe when joins hit missing keys
NULL values create specific behaviors in JOIN operations that can lead to unexpected results if you don't understand the rules. The key insight: NULL never equals anything, not even another NULL, so NULLs never match in JOIN conditions.
Understanding NULL in JOINs
Consider these two tables for our JOIN examples:
| order_id | customer_id |
|---|---|
| ORD-1 | C-101 |
| ORD-2 | C-102 |
| ORD-3 | NULL |
| ORD-4 | C-103 |
| customer_id | name |
|---|---|
| C-101 | Alice |
| C-102 | Bob |
| NULL | Guest |
| C-104 | Diana |
JOIN Type Behaviors
With INNER JOIN, rows with NULL join keys never match anything, so they're excluded from results on both sides:
Notice: ORD-3 (with NULL customer_id) is missing. The "Guest" customer (with NULL customer_id) is also missing. C-103 and C-104 don't appear because they have no matching orders.
See how the join evaluates each key pair, including the rows where one or both sides contain NULL.
| order_id | customer_id |
|---|---|
| ORD-1 | C-101 |
| ORD-2 | C-102 |
| ORD-3 | NULL |
| ORD-4 | C-103 |
| customer_id | name |
|---|---|
| C-101 | Alice |
| C-102 | Bob |
| NULL | Guest |
| C-104 | Diana |
| col_0 | col_1 | col_2 | |
|---|---|---|---|
| ORD-1 | C-101 | C-101 | Alice |
o.customer_id = c.customer_id evaluates to unknown when either side is NULL. INNER JOIN only includes rows where the condition is definitively true.LEFT JOIN and NULL
LEFT JOIN preserves all rows from the left table, but NULL join keys still don't match:
ORD-3 now appears, but with NULL for name because its NULL customer_id didn't match the NULL customer_id in customers. ORD-4 appears with NULL name because C-103 doesn't exist in customers.
Observe how the LEFT JOIN preserves all left-side rows, filling unmatched right-side columns with NULL.
| emp_id | dept_id |
|---|---|
| E-101 | D-10 |
| E-102 | D-20 |
| E-103 | NULL |
| E-104 | D-10 |
| E-105 | D-99 |
| dept_id | dept_name |
|---|---|
| D-10 | Sales |
| D-20 | Engineering |
| NULL | Unassigned |
| D-30 | Marketing |
| col_0 | col_1 | col_2 | |
|---|---|---|---|
| E-101 | D-10 | D-10 | Sales |
Even though E-103 has a NULL dept_id and D-99 does not exist, every employee row survives in the output. This preservation is what makes LEFT JOIN essential for detecting incomplete data.
Advanced JOIN Patterns
If you need NULLs to match NULLs, you must handle it explicitly:
Now ORD-3 matches the "Guest" customer because we explicitly check for both sides being NULL. This pattern is useful when NULL represents a meaningful category like "guest" or "unknown."
How does the JOIN type affect rows with NULL keys?
Try writing a LEFT JOIN that also matches rows where both join keys are NULL.
> Complete this query to include rows where both join keys might be missing.
SELECT * FROM orders AS o LEFT JOIN customers AS c ON o.cust_id = c.cust_id ( o.cust_id c.cust_id )
The rule that NULL join keys never match is one of the most consequential behaviors in SQL because it causes silent row loss without any error or warning, making it one of the hardest classes of bug to detect in production.
When you discover that an INNER JOIN is returning fewer rows than expected, checking whether join key columns contain NULLs is the first diagnostic step and often the cause of the discrepancy.
The explicit OR (a IS NULL AND b IS NULL) pattern for matching NULL keys is a deliberate design choice, not a workaround, because it forces you to consciously decide whether two absent values represent the same entity in your domain.
COALESCE for defaults
Replace missing values with useful defaults
COALESCE returns the first non-NULL value from a list of arguments. It's essential for providing default values when data might be missing and for making query results more readable.
Understanding COALESCE
COALESCE returns the first non-NULL value from a list of arguments, providing a clean way to handle missing data.
Basic Syntax
COALESCE evaluates arguments left to right and returns the first non-NULL value it finds.
Simple Example
For each customer, this returns their nickname if they have one, otherwise their first_name, otherwise 'Unknown'. It cascades through options until finding a non-NULL value.
Watch how COALESCE walks through columns left to right, returning the first non-NULL value it finds for each row.
| id | phone | |
|---|---|---|
| C-1 | 555-1234 | NULL |
| C-2 | NULL | bob@mail.com |
| C-3 | NULL | NULL |
| C-4 | 555-9999 | dan@mail.com |
| col_0 | col_1 |
|---|---|
| C-1 | 555-1234 |
When both phone and email are NULL (C-3), COALESCE falls through to the fallback string "Unknown". This cascading logic ensures every row gets a usable value.
Practical Applications
Without COALESCE, the calculation price - discount would return NULL whenever discount is NULL. COALESCE(discount, 0) treats missing discounts as zero, ensuring the math works.
COALESCE in Aggregations
Remember that SUM of all NULLs returns NULL, not zero. COALESCE can fix this:
If a category has no inventory records (or all NULL quantities), the result shows 0 instead of NULL, making downstream calculations and reports cleaner.
COALESCE with JOINs
COALESCE is particularly useful for handling NULL values that appear after LEFT JOINs:
Orders without matching customers now show "Guest Order" instead of NULL, making reports more readable.
See how LEFT JOIN and COALESCE work together: unmatched orders receive "Guest Order" instead of a raw NULL in the display name.
For matched customers, COALESCE picks the nickname when available and falls back to the full name. For unmatched or NULL-keyed orders, the final fallback "Guest Order" ensures every row has a readable label.
> Complete this query to calculate the final price, treating missing discounts as zero.
SELECT product_id, price - (, ) AS final_price FROM products
COALESCE is the single most frequently used NULL-handling function in production SQL because providing a sensible default is almost always preferable to propagating NULL into reports and dashboards.
The cascading nature of COALESCE makes it ideal for priority hierarchies like "use the customer-provided name, then the system name, then a generic label" without needing nested CASE expressions.
Wrapping aggregate functions with COALESCE to substitute zero or an empty string is a standard pattern for making reports resilient to groups that have no data, preventing NULL from appearing in summary tables.
NULLIF for sentinel values
Convert fake values back to proper NULLs
NULLIF does the opposite of COALESCE: it returns NULL when two values are equal. This is useful for converting sentinel values (like 0, -1, or empty strings) into proper NULLs.
Understanding NULLIF
NULLIF returns NULL if two values are equal, otherwise it returns the first value. It is the inverse of COALESCE.
Basic Syntax
NULLIF compares two expressions and returns NULL when they match.
Sentinel Conversion
Legacy systems often use special values like 0, -1, or empty strings to mean "unknown." NULLIF converts these to proper NULLs:
The legacy system used 0 to mean "age unknown." NULLIF(age, 0) converts these zeros to NULL so they're handled correctly by aggregate functions.
Advanced NULLIF Patterns
A powerful pattern: use NULLIF to prevent division-by-zero errors:
NULLIF(total_orders, 0) returns NULL when total_orders is zero. Dividing by NULL returns NULL (instead of an error), gracefully handling products with no orders.
NULLIF-for-division pattern is one of the most useful SQL tricks. Memorize it: value / NULLIF(divisor, 0) safely handles zero divisors.Cleaning Empty Strings
Empty strings often represent missing data. NULLIF converts them to proper NULLs.
This converts empty strings to NULL, ensuring consistent handling of missing contact info.
> Complete this query to safely divide the total amount by quantity, avoiding division-by-zero errors.
SELECT product_id, total_amount / (, ) AS avg_value FROM transactions
The NULLIF division-by-zero pattern is one of those techniques that becomes second nature once you learn it because it elegantly converts a potential runtime error into a predictable NULL result that downstream code can handle.
NULLIF is most powerful when used at the beginning of a data pipeline to normalize sentinel values into proper NULLs, because every downstream query benefits from not having to repeat the sentinel-detection logic.
Combining NULLIF to clean sentinel values with COALESCE to supply defaults is the complete pattern for robust NULL handling, covering both the case where a value is absent and the case where it has been stored as a placeholder.
COALESCE and NULLIF patterns
Chain NULL handlers for clean output
COALESCE and NULLIF work together powerfully. A common pattern is to clean data with NULLIF, then provide defaults with COALESCE:
Combining Functions
Nesting NULLIF inside COALESCE creates a clean and default pattern that handles both empty strings and NULLs.
This first converts empty strings to NULL (via NULLIF), then cascades through options (via COALESCE) to find a usable name.
- Use COALESCE to provide meaningful defaults for NULL values
- Use NULLIF to convert sentinel values (0, -1, '') to proper NULLs
- Combine NULLIF and COALESCE to clean and default in one expression
- Remember that NULL join keys never match; handle explicitly if needed
- Don't assume 0 or empty string means NULL; data sources vary
- Don't forget that COALESCE returns NULL if ALL arguments are NULL
- Don't use COALESCE to hide data quality issues; fix them at the source
> Complete this query to display a fallback name when the display_name is empty or NULL.
SELECT product_id, ( (display_name, ''), 'Unknown' ) AS name FROM products
The COALESCE + NULLIF pattern is the standard way to handle both NULL and empty-string sentinel values in a single expression.
NULLIF is also useful for preventing division by zero: dividing by NULLIF(denominator, 0) returns NULL instead of crashing.
COALESCE can accept any number of arguments and returns the first non-NULL one, making it flexible for multi-level fallback chains.
> You are a reporting analyst at Recurly building a monthly revenue dashboard where failed API syncs leave payment amounts as empty strings or sentinel -1 values rather than NULL. Finance needs an accurate MRR figure by plan tier before the board meeting.
NULLIF converts sentinel -1 amounts and empty strings into proper NULLs before any aggregation runs.LEFT JOINs mean plan tiers with no successful payments appear with NULL revenue, not zero.COALESCE substitutes zero for NULL revenue so tier totals display correctly in the dashboard table.NULLIF and COALESCE in one expression cleans and defaults each payment column simultaneously.NULLs (empty strings, 0, -1) exist because legacy systems couldn't store real NULLsNULL to NULL doesn't yield true, so JOINs on NULL keys never matchLEFT JOIN produces NULLs for unmatched rows; use COALESCE to provide defaultsCOALESCE(a, b, c) returns the first non-NULL value in the listNULLIF(a, b) returns NULL if a = b; perfect for cleaning sentinel valuesNULLIF to prevent division by zero: x / NULLIF(y, 0)COALESCE(NULLIF(column, ''), 'default') cleans empty stringsNULLs when working with data from external sourcesCOALESCE: NULL won
COALESCE: NULL won't fix itself
- Category
- SQL
- Difficulty
- intermediate
- Duration
- 31 minutes
- Challenges
- 0 hands-on challenges
Topics covered: Sentinel values and fake NULLs, NULLs in INNER and LEFT JOINs, COALESCE for defaults, NULLIF for sentinel values, COALESCE and NULLIF patterns
Lesson Sections
- Sentinel values and fake NULLs (concepts: sqlNullif)
Sentinel Values Many software systems are forced to produce values even when the data isn't known. This happens for several reasons: When a system must produce a value but doesn't know the real answer, developers choose "placeholder" values. These sentinel values become embedded in the data and persist for years. Common Sentinel Values These placeholders may seem harmless, but they complicate every query that touches the column. Practical Implications Consider why these patterns emerge in actual
- NULLs in INNER and LEFT JOINs (concepts: sqlNullInJoins)
Understanding NULL in JOINs JOIN Type Behaviors LEFT JOIN and NULL Advanced JOIN Patterns
- COALESCE for defaults (concepts: sqlCoalesce)
Understanding COALESCE Basic Syntax Simple Example A common use case is selecting the best available name for display purposes. Practical Applications A common use case is providing default values for calculations: COALESCE in Aggregations COALESCE with JOINs
- NULLIF for sentinel values
Understanding NULLIF Basic Syntax Sentinel Conversion Advanced NULLIF Patterns Cleaning Empty Strings
- COALESCE and NULLIF patterns
Combining Functions