SQL COALESCE Function: NULL Handling for Data Engineers
NULL is the most misunderstood value in SQL. It is not zero, not an empty string, not false. It is unknown. COALESCE is the standard tool for taming NULLs, but using it correctly requires understanding how NULLs propagate through comparisons, arithmetic, aggregations, and joins. Interviewers test NULL handling because it separates candidates who have debugged real pipelines from those who have not.
Four NULL Rules Every Data Engineer Must Know
NULL = NULL is not TRUE
NULL represents unknown. Comparing two unknowns does not produce TRUE; it produces NULL. This means WHERE col = NULL returns zero rows. You must use WHERE col IS NULL. This is the single most common NULL mistake in interviews, and interviewers test it deliberately.
-- Wrong: returns no rows even if col has NULLs
SELECT * FROM users WHERE email = NULL;
-- Correct: use IS NULL
SELECT * FROM users WHERE email IS NULL;NULL in arithmetic produces NULL
Any arithmetic operation involving NULL returns NULL. price * NULL is NULL. revenue + NULL is NULL. This is how dirty data silently corrupts aggregations. A single NULL discount in a pricing table can make an entire calculated column NULL if you do not handle it.
-- If discount is NULL, total becomes NULL
SELECT price * (1 - discount) AS total FROM products;
-- Fix: COALESCE discount to 0
SELECT price * (1 - COALESCE(discount, 0)) AS total FROM products;Aggregates ignore NULL (except COUNT(*))
SUM, AVG, MIN, MAX all skip NULL values. COUNT(*) counts all rows. COUNT(column) counts non-NULL values only. The danger: AVG ignores NULLs, which changes the denominator. If 30 of 100 rows have NULL salary, AVG(salary) divides by 70, not 100.
-- These return different results if salary has NULLs
SELECT COUNT(*) FROM employees; -- all rows
SELECT COUNT(salary) FROM employees; -- non-NULL only
SELECT AVG(salary) FROM employees; -- average of non-NULL onlyNULL in NOT IN is a trap
If a NOT IN subquery returns any NULL value, the entire NOT IN predicate returns no rows. This is because NOT IN checks every value, and comparing anything to NULL yields NULL (not FALSE). Use NOT EXISTS instead, which handles NULLs correctly.
-- Dangerous: if subquery returns any NULL, zero rows come back
SELECT * FROM employees
WHERE id NOT IN (SELECT manager_id FROM employees);
-- Safe: NOT EXISTS handles NULLs correctly
SELECT * FROM employees e
WHERE NOT EXISTS (
SELECT 1 FROM employees m WHERE m.manager_id = e.id
);COALESCE Syntax
Every problem comes from a real interview report. Run code in your browser.
-- Basic syntax
COALESCE(expression1, expression2, ..., expressionN)
-- Returns first non-NULL: 'Alice'
SELECT COALESCE(NULL, NULL, 'Alice', 'Bob');
-- Practical: provide a default when display_name is NULL
SELECT COALESCE(display_name, username, 'Anonymous') AS shown_name
FROM users;Five COALESCE Patterns for Data Pipelines
Basic Default Value
Replace NULL with a meaningful default. This is the most common COALESCE pattern. Any nullable column that feeds into a calculation, display, or downstream system should be wrapped in COALESCE at the transformation layer.
SELECT
user_id,
COALESCE(display_name, username, email, 'Anonymous') AS shown_name,
COALESCE(phone, 'Not provided') AS phone
FROM users;Interview note: COALESCE takes multiple arguments and returns the first non-NULL. This is its advantage over IFNULL/NVL which only take two.
COALESCE After LEFT JOIN
LEFT JOINs produce NULLs for every right-side column when there is no match. COALESCE provides sensible defaults so downstream queries do not break. This is the most common COALESCE pattern in data pipelines.
SELECT
c.customer_id,
c.name,
COALESCE(SUM(o.amount), 0) AS total_spent,
COALESCE(COUNT(o.order_id), 0) AS order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.customer_id, c.name;Interview note: LEFT JOIN + GROUP BY + SUM often produces NULL for unmatched groups. Always wrap in COALESCE when the consuming system cannot handle NULL.
Multi-Source Fallback
When merging data from multiple sources, COALESCE picks the best available value. This is common in customer data platforms and master data management pipelines where each source has different coverage.
SELECT
user_id,
COALESCE(crm.email, app.email, support.email) AS best_email,
COALESCE(crm.phone, app.phone) AS best_phone
FROM users u
LEFT JOIN crm_data crm ON u.id = crm.user_id
LEFT JOIN app_data app ON u.id = app.user_id
LEFT JOIN support_data support ON u.id = support.user_id;Interview note: Each source has different coverage. COALESCE picks the first non-NULL, giving you the best available data without complex CASE WHEN logic.
Safe Division with NULLIF
Division by zero crashes queries. NULLIF converts the denominator to NULL when it equals zero, and COALESCE catches the resulting NULL to provide a default. This two-function combo is a standard defensive pattern.
SELECT
product_id,
revenue,
cost,
COALESCE(revenue / NULLIF(cost, 0), 0) AS margin_ratio
FROM financials;Interview note: NULLIF(cost, 0) returns NULL when cost is 0. The division becomes revenue / NULL which is NULL. COALESCE then catches that NULL and returns 0.
COALESCE for Incremental Merge
In MERGE/UPSERT operations, COALESCE selects whether to keep the existing value or take the new one. This preserves existing data when the incoming source has NULLs for fields it did not update.
MERGE INTO dim_customer AS target
USING staging AS source ON target.customer_id = source.customer_id
WHEN MATCHED THEN UPDATE SET
email = COALESCE(source.email, target.email),
phone = COALESCE(source.phone, target.phone),
updated_at = CURRENT_TIMESTAMP;Interview note: This preserves existing data when the source has NULLs. Interviewers test whether you know the difference between 'value was cleared' (empty string) and 'value was not provided' (NULL).
COALESCE vs IFNULL vs NVL vs ISNULL
| Function | Standard | Args | Engines |
|---|---|---|---|
| COALESCE(a, b, c) | ANSI SQL | Unlimited | All engines |
| IFNULL(a, b) | Non-standard | 2 only | MySQL, SQLite, BigQuery |
| NVL(a, b) | Oracle-specific | 2 only | Oracle, Snowflake |
| ISNULL(a, b) | SQL Server-specific | 2 only | SQL Server |
| NULLIF(a, b) | ANSI SQL | 2 only | All engines (returns NULL if a = b) |
7 COALESCE and NULL Interview Questions
Q1: What is the difference between COALESCE and IFNULL?
What they test: Portability awareness. COALESCE is ANSI standard and accepts multiple arguments. IFNULL is MySQL-specific and takes exactly two. Approach: State that COALESCE is standard SQL, works everywhere, and takes N arguments. IFNULL is shorthand for the two-argument case. Always prefer COALESCE in interviews.
Q2: Given a LEFT JOIN that produces NULL values, how do you handle them?
What they test: Understanding of JOIN semantics and NULL propagation. The interviewer wants COALESCE wrapping the right-table columns. Approach: Wrap right-table columns in COALESCE with appropriate defaults: COALESCE(SUM(b.amount), 0) for numeric aggregations, COALESCE(b.status, 'Unknown') for categorical fields.
Q3: Why does WHERE column != 'value' exclude NULL rows?
What they test: NULL three-valued logic. NULL != 'value' evaluates to NULL (not TRUE), so the row is excluded by WHERE. Approach: Explain three-valued logic: every comparison to NULL returns NULL, and WHERE only keeps TRUE rows. Fix: WHERE column != 'value' OR column IS NULL.
Q4: How does NULL affect AVG differently than SUM?
What they test: Both skip NULLs, but the impact differs. SUM skipping a NULL does not change the result. AVG skipping a NULL changes the denominator. Approach: Values 10, NULL, 20: SUM = 30 (correct). AVG = 15 (divides by 2, not 3). Use COALESCE(column, 0) before AVG if NULLs should count as zeros.
Q5: Write a query that shows revenue by region, treating missing regions as 'Unassigned'.
What they test: Practical COALESCE in GROUP BY. Without COALESCE, NULL regions form their own unlabeled group. Approach: SELECT COALESCE(region, 'Unassigned') AS region, SUM(revenue) FROM orders GROUP BY COALESCE(region, 'Unassigned'). The COALESCE must appear in both SELECT and GROUP BY.
Q6: Explain the COALESCE + NULLIF pattern for safe division.
What they test: Defensive SQL. This two-function combo prevents division by zero without conditional logic. Approach: NULLIF(denominator, 0) returns NULL when denominator is 0. COALESCE catches the resulting NULL. Pattern: COALESCE(numerator / NULLIF(denominator, 0), default_value).
Q7: When should you NOT use COALESCE?
What they test: Judgment. COALESCE hides NULLs, but sometimes NULLs carry meaning. Approach: Do not COALESCE when NULL is meaningful (e.g., end_date IS NULL means 'still active'). Do not COALESCE at the storage layer if you need the distinction between 'missing' and 'zero'. Apply COALESCE at the reporting or serving layer.
Frequently asked questions
What does COALESCE do in SQL?+
Is COALESCE the same as IFNULL?+
Does COALESCE affect query performance?+
Can I use COALESCE in GROUP BY?+
Why do data engineers care about NULL handling?+
Stop Losing Data to NULLs
NULL bugs are silent. Your pipeline runs, your tests pass, your dashboard loads. But the numbers are wrong because one column in one join had unexpected NULLs. Practice spotting and fixing NULL issues with real SQL execution.