SQL COALESCE Function: NULL Handling for Data Engineers

NULL is the most misunderstood value in SQL. It is not zero, not an empty string, not false. It is unknown. COALESCE is the standard tool for taming NULLs, but using it correctly requires understanding how NULLs propagate through comparisons, arithmetic, aggregations, and joins. Interviewers test NULL handling because it separates candidates who have debugged real pipelines from those who have not.

Four NULL Rules Every Data Engineer Must Know

Before learning COALESCE syntax, internalize these rules. Every NULL bug in production traces back to violating one of them.

NULL = NULL is not TRUE

NULL represents unknown. Comparing two unknowns does not produce TRUE; it produces NULL. This means WHERE col = NULL returns zero rows. You must use WHERE col IS NULL. This is the single most common NULL mistake in interviews, and interviewers test it deliberately.

-- Wrong: returns no rows even if col has NULLs
SELECT * FROM users WHERE email = NULL;

-- Correct: use IS NULL
SELECT * FROM users WHERE email IS NULL;

NULL in arithmetic produces NULL

Any arithmetic operation involving NULL returns NULL. price * NULL is NULL. revenue + NULL is NULL. This is how dirty data silently corrupts aggregations. A single NULL discount in a pricing table can make an entire calculated column NULL if you do not handle it.

-- If discount is NULL, total becomes NULL
SELECT price * (1 - discount) AS total FROM products;

-- Fix: COALESCE discount to 0
SELECT price * (1 - COALESCE(discount, 0)) AS total FROM products;

Aggregates ignore NULL (except COUNT(*))

SUM, AVG, MIN, MAX all skip NULL values. COUNT(*) counts all rows. COUNT(column) counts non-NULL values only. The danger: AVG ignores NULLs, which changes the denominator. If 30 of 100 rows have NULL salary, AVG(salary) divides by 70, not 100.

-- These return different results if salary has NULLs
SELECT COUNT(*) FROM employees;         -- all rows
SELECT COUNT(salary) FROM employees;    -- non-NULL only
SELECT AVG(salary) FROM employees;      -- average of non-NULL only

NULL in NOT IN is a trap

If a NOT IN subquery returns any NULL value, the entire NOT IN predicate returns no rows. This is because NOT IN checks every value, and comparing anything to NULL yields NULL (not FALSE). Use NOT EXISTS instead, which handles NULLs correctly.

-- Dangerous: if subquery returns any NULL, zero rows come back
SELECT * FROM employees
WHERE id NOT IN (SELECT manager_id FROM employees);

-- Safe: NOT EXISTS handles NULLs correctly
SELECT * FROM employees e
WHERE NOT EXISTS (
  SELECT 1 FROM employees m WHERE m.manager_id = e.id
);

COALESCE Syntax

COALESCE evaluates arguments left to right and returns the first non-NULL value. It is the standard SQL null-coalescing function, supported by every major engine.

-- Basic syntax
COALESCE(expression1, expression2, ..., expressionN)

-- Returns first non-NULL: 'Alice'
SELECT COALESCE(NULL, NULL, 'Alice', 'Bob');

-- Practical: provide a default when display_name is NULL
SELECT COALESCE(display_name, username, 'Anonymous') AS shown_name
FROM users;

Five COALESCE Patterns for Data Pipelines

These patterns cover 90% of COALESCE usage in production pipelines and interviews. Each solves a specific NULL problem.

Basic Default Value

Replace NULL with a meaningful default. This is the most common COALESCE pattern. Any nullable column that feeds into a calculation, display, or downstream system should be wrapped in COALESCE at the transformation layer.

SELECT
  user_id,
  COALESCE(display_name, username, email, 'Anonymous') AS shown_name,
  COALESCE(phone, 'Not provided') AS phone
FROM users;

Interview note: COALESCE takes multiple arguments and returns the first non-NULL. This is its advantage over IFNULL/NVL which only take two.

COALESCE After LEFT JOIN

LEFT JOINs produce NULLs for every right-side column when there is no match. COALESCE provides sensible defaults so downstream queries do not break. This is the most common COALESCE pattern in data pipelines.

SELECT
  c.customer_id,
  c.name,
  COALESCE(SUM(o.amount), 0) AS total_spent,
  COALESCE(COUNT(o.order_id), 0) AS order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.customer_id, c.name;

Interview note: LEFT JOIN + GROUP BY + SUM often produces NULL for unmatched groups. Always wrap in COALESCE when the consuming system cannot handle NULL.

Multi-Source Fallback

When merging data from multiple sources, COALESCE picks the best available value. This is common in customer data platforms and master data management pipelines where each source has different coverage.

SELECT
  user_id,
  COALESCE(crm.email, app.email, support.email) AS best_email,
  COALESCE(crm.phone, app.phone) AS best_phone
FROM users u
LEFT JOIN crm_data crm ON u.id = crm.user_id
LEFT JOIN app_data app ON u.id = app.user_id
LEFT JOIN support_data support ON u.id = support.user_id;

Interview note: Each source has different coverage. COALESCE picks the first non-NULL, giving you the best available data without complex CASE WHEN logic.

Safe Division with NULLIF

Division by zero crashes queries. NULLIF converts the denominator to NULL when it equals zero, and COALESCE catches the resulting NULL to provide a default. This two-function combo is a standard defensive pattern.

SELECT
  product_id,
  revenue,
  cost,
  COALESCE(revenue / NULLIF(cost, 0), 0) AS margin_ratio
FROM financials;

Interview note: NULLIF(cost, 0) returns NULL when cost is 0. The division becomes revenue / NULL which is NULL. COALESCE then catches that NULL and returns 0.

COALESCE for Incremental Merge

In MERGE/UPSERT operations, COALESCE selects whether to keep the existing value or take the new one. This preserves existing data when the incoming source has NULLs for fields it did not update.

MERGE INTO dim_customer AS target
USING staging AS source ON target.customer_id = source.customer_id
WHEN MATCHED THEN UPDATE SET
  email = COALESCE(source.email, target.email),
  phone = COALESCE(source.phone, target.phone),
  updated_at = CURRENT_TIMESTAMP;

Interview note: This preserves existing data when the source has NULLs. Interviewers test whether you know the difference between 'value was cleared' (empty string) and 'value was not provided' (NULL).

COALESCE vs IFNULL vs NVL vs ISNULL

Interviewers sometimes ask you to compare these. The key insight: COALESCE is ANSI standard and works everywhere. The others are engine-specific shortcuts.

Function	Standard	Args	Engines
COALESCE(a, b, c)	ANSI SQL	Unlimited	All engines
IFNULL(a, b)	Non-standard	2 only	MySQL, SQLite, BigQuery
NVL(a, b)	Oracle-specific	2 only	Oracle, Snowflake
ISNULL(a, b)	SQL Server-specific	2 only	SQL Server
NULLIF(a, b)	ANSI SQL	2 only	All engines (returns NULL if a = b)

7 COALESCE and NULL Interview Questions

Q1: What is the difference between COALESCE and IFNULL?

What they test: Portability awareness. COALESCE is ANSI standard and accepts multiple arguments. IFNULL is MySQL-specific and takes exactly two. Approach: State that COALESCE is standard SQL, works everywhere, and takes N arguments. IFNULL is shorthand for the two-argument case. Always prefer COALESCE in interviews.

Q2: Given a LEFT JOIN that produces NULL values, how do you handle them?

What they test: Understanding of JOIN semantics and NULL propagation. The interviewer wants COALESCE wrapping the right-table columns. Approach: Wrap right-table columns in COALESCE with appropriate defaults: COALESCE(SUM(b.amount), 0) for numeric aggregations, COALESCE(b.status, 'Unknown') for categorical fields.

Q3: Why does WHERE column != 'value' exclude NULL rows?

What they test: NULL three-valued logic. NULL != 'value' evaluates to NULL (not TRUE), so the row is excluded by WHERE. Approach: Explain three-valued logic: every comparison to NULL returns NULL, and WHERE only keeps TRUE rows. Fix: WHERE column != 'value' OR column IS NULL.

Q4: How does NULL affect AVG differently than SUM?

What they test: Both skip NULLs, but the impact differs. SUM skipping a NULL does not change the result. AVG skipping a NULL changes the denominator. Approach: Values 10, NULL, 20: SUM = 30 (correct). AVG = 15 (divides by 2, not 3). Use COALESCE(column, 0) before AVG if NULLs should count as zeros.

Q5: Write a query that shows revenue by region, treating missing regions as 'Unassigned'.

What they test: Practical COALESCE in GROUP BY. Without COALESCE, NULL regions form their own unlabeled group. Approach: SELECT COALESCE(region, 'Unassigned') AS region, SUM(revenue) FROM orders GROUP BY COALESCE(region, 'Unassigned'). The COALESCE must appear in both SELECT and GROUP BY.

Q6: Explain the COALESCE + NULLIF pattern for safe division.

What they test: Defensive SQL. This two-function combo prevents division by zero without conditional logic. Approach: NULLIF(denominator, 0) returns NULL when denominator is 0. COALESCE catches the resulting NULL. Pattern: COALESCE(numerator / NULLIF(denominator, 0), default_value).

Q7: When should you NOT use COALESCE?

What they test: Judgment. COALESCE hides NULLs, but sometimes NULLs carry meaning. Approach: Do not COALESCE when NULL is meaningful (e.g., end_date IS NULL means 'still active'). Do not COALESCE at the storage layer if you need the distinction between 'missing' and 'zero'. Apply COALESCE at the reporting or serving layer.

Frequently asked questions

What does COALESCE do in SQL?+

COALESCE takes a list of expressions and returns the first one that is not NULL. If all are NULL, it returns NULL. COALESCE(a, b, c) checks a first; if NULL, checks b; if NULL, returns c. It is the standard SQL way to provide default values for nullable columns.

Is COALESCE the same as IFNULL?+

They produce the same result for two arguments. But COALESCE is ANSI standard and works everywhere, while IFNULL is MySQL-specific. COALESCE also accepts more than two arguments: COALESCE(a, b, c) has no IFNULL equivalent without nesting.

Does COALESCE affect query performance?+

COALESCE is a lightweight expression with negligible overhead. However, COALESCE in a WHERE clause can prevent index usage. WHERE COALESCE(col, 0) > 10 may not use an index on col. Rewrite as WHERE col > 10 OR (col IS NULL AND 0 > 10) if performance matters.

Can I use COALESCE in GROUP BY?+

Yes. COALESCE(region, 'Unknown') in GROUP BY treats NULL regions as 'Unknown' and groups them together. The COALESCE expression must appear identically in both SELECT and GROUP BY.

Why do data engineers care about NULL handling?+

Because NULL propagation is the top source of silent data quality bugs. A single NULL in a multiplication chain makes the entire result NULL. Pipelines can run, produce output, and still be wrong because NULLs were not handled. Interviewers test this because it reveals real debugging experience.

02 / Why practice

Stop Losing Data to NULLs

01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Practice NULL Handling