FULL OUTER JOIN is the reconciliation primitive every production data pipeline eventually needs. It lives at the boundary where two sources of truth meet: ledger and warehouse, stream and batch, vendor export and internal catalog. When the diff matters, this is the join that computes it in a single scan.
In our verified corpus FULL OUTER JOIN appears in 6% of SQL rounds, and the frequency spikes in pipeline-focused loops where reconciliation queries are the whole point. Where this fits in architecture: the night-of-day-1 job that compares the warehouse to the source system and produces the variance report. This page walks the syntax, five patterns, and the cost model that determines whether your engine hash-joins or merge-joins.
Questions using FULL OUTER
LEFT JOIN comparison
Scan per reconciliation
Architectural patterns covered
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
FULL JOIN and FULL OUTER JOIN parse identically. Architecturally they compile to the same plan node, usually a hash join with a two-sided spill strategy. Most style guides prefer the full spelling because reviewers on pipeline teams expect explicit intent when the query reconciles sources.
-- These two are identical
SELECT * FROM a FULL JOIN b ON a.id = b.a_id;
SELECT * FROM a FULL OUTER JOIN b ON a.id = b.a_id;
-- Full syntax
SELECT
COALESCE(left_table.key, right_table.key) AS unified_key,
left_table.column1,
right_table.column2
FROM left_table
FULL OUTER JOIN right_table
ON left_table.key = right_table.key;Interview note: Unlike LEFT JOIN, table order does not matter for FULL OUTER JOIN. Swapping the tables produces the same rows (columns just appear in different order). This is because FULL OUTER JOIN preserves unmatched rows from both sides.
Think of FULL OUTER JOIN as running a LEFT JOIN and a RIGHT JOIN at the same time, then merging the results. The LEFT JOIN half preserves all left-table rows. The RIGHT JOIN half preserves all right-table rows. Rows that match appear once (not duplicated). The result has three categories of rows.
Both tables have a row that satisfies the ON condition. All columns from both tables are populated. This is the same as what INNER JOIN returns.
The left table has a row with no matching row in the right table. Left columns are populated. Right columns are NULL. This is the same as the extra rows LEFT JOIN produces beyond INNER JOIN.
The right table has a row with no matching row in the left table. Right columns are populated. Left columns are NULL. This is the part that LEFT JOIN cannot capture. Only FULL OUTER JOIN or RIGHT JOIN preserves these rows.
-- customers: Alice (1), Bob (2), Carol (3)
-- orders: order for Alice (customer_id=1), order for Dave (customer_id=99)
SELECT c.name, o.order_id, o.customer_id
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;
-- Result:
-- Alice | 201 | 1 (matched)
-- Bob | NULL | NULL (left-only: no order)
-- Carol | NULL | NULL (left-only: no order)
-- NULL | 202 | 99 (right-only: orphan order)This comparison shows how each join type handles unmatched rows. FULL OUTER JOIN is the most permissive: it never drops a row from either table.
| Behavior | FULL OUTER | LEFT | INNER |
|---|---|---|---|
| Unmatched left rows | Kept, right columns filled with NULL | Kept, right columns filled with NULL | Dropped |
| Unmatched right rows | Kept, left columns filled with NULL | Dropped | Dropped |
| Result row count | Sum of matched + unmatched left + unmatched right | At least as many rows as the left table | Only matching rows |
| NULL generation | Both sides can produce NULLs | Only right-side columns produce NULLs | No NULLs generated by the join |
| Typical use case | Reconciliation, comparing two data sources | Preserve all entities from one table | Only entities with matching data on both sides |
From basic syntax to data reconciliation, difference detection, and the LEFT + RIGHT UNION workaround for databases that lack native support.
FULL OUTER JOIN returns every row from both tables. Where the ON condition matches, columns from both tables are populated. Where a left-table row has no match, right columns are NULL. Where a right-table row has no match, left columns are NULL. The keyword OUTER is optional: FULL JOIN and FULL OUTER JOIN are identical.
-- All customers and all orders, matched where possible
SELECT
c.customer_id,
c.name,
o.order_id,
o.amount
FROM customers c
FULL OUTER JOIN orders o
ON c.customer_id = o.customer_id;
-- Three types of rows in the result:
-- 1. Matched: both sides populated
-- 2. Left-only: order columns are NULL
-- 3. Right-only: customer columns are NULLFULL OUTER JOIN is the standard pattern for comparing two datasets. When migrating data, auditing pipelines, or validating ETL output, you join the source and target on a shared key. Rows that appear in both are matches. Rows with NULLs on one side are missing from that source. This pattern catches both missing records and records that exist in the target but not the source.
-- Compare source system vs data warehouse
SELECT
COALESCE(src.account_id, dwh.account_id) AS account_id,
src.balance AS source_balance,
dwh.balance AS warehouse_balance,
CASE
WHEN src.account_id IS NULL THEN 'Missing from source'
WHEN dwh.account_id IS NULL THEN 'Missing from warehouse'
WHEN src.balance != dwh.balance THEN 'Balance mismatch'
ELSE 'Match'
END AS reconciliation_status
FROM source_accounts src
FULL OUTER JOIN warehouse_accounts dwh
ON src.account_id = dwh.account_id;By filtering the FULL OUTER JOIN result to only rows where one side is NULL, you find records that exist in one table but not the other. This is an extension of the anti-join pattern. A LEFT JOIN anti-join finds rows missing from one side. FULL OUTER JOIN with IS NULL filters on both sides finds rows missing from either side in a single query.
-- Records that exist in only one table (not both)
SELECT
COALESCE(a.id, b.id) AS id,
CASE
WHEN a.id IS NULL THEN 'Only in B'
WHEN b.id IS NULL THEN 'Only in A'
END AS location
FROM table_a a
FULL OUTER JOIN table_b b ON a.id = b.id
WHERE a.id IS NULL OR b.id IS NULL;A FULL OUTER JOIN is logically equivalent to a LEFT JOIN unioned with the unmatched right-side rows. Some databases that do not support FULL OUTER JOIN (like older MySQL versions before 8.0) require this workaround. Understanding this equivalence helps you reason about what the join produces and how to simulate it when needed.
-- FULL OUTER JOIN equivalent using UNION ALL
-- (for databases that lack FULL OUTER JOIN support)
SELECT c.customer_id, c.name, o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
UNION ALL
SELECT c.customer_id, c.name, o.order_id
FROM customers c
RIGHT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id IS NULL;When performing a FULL OUTER JOIN, the join key can be NULL on either side. COALESCE picks the non-NULL value, giving you a single unified key column. This is critical for downstream processing: without COALESCE, you would have two key columns, each NULL for different rows, making grouping and sorting unreliable.
-- Merge employee lists from two departments after a reorg
SELECT
COALESCE(old.emp_id, new.emp_id) AS emp_id,
old.department AS previous_dept,
new.department AS current_dept,
old.salary AS previous_salary,
new.salary AS current_salary
FROM old_roster old
FULL OUTER JOIN new_roster new
ON old.emp_id = new.emp_id
ORDER BY COALESCE(old.emp_id, new.emp_id);
-- Employees only in old_roster: left the company
-- Employees only in new_roster: new hires
-- Employees in both: transferred or stayedFULL OUTER JOIN produces NULLs on both sides of the result. This is different from LEFT JOIN, which only produces NULLs on the right side. Every column from either table can potentially be NULL in a FULL OUTER JOIN result.
The join key is NULL on the unmatched side. Use COALESCE(left.key, right.key) to produce a single non-NULL key column. Without this, GROUP BY and ORDER BY on the key column will not work correctly because NULL values are grouped separately.
If either table has NULL values in the join key, those rows will never match. NULL = NULL evaluates to UNKNOWN in SQL, not TRUE. This means rows with NULL keys always appear as unmatched, even if both tables have NULL key rows. This is correct behavior, but it can be surprising during reconciliation.
Any arithmetic with NULL produces NULL. When computing differences like source.amount - target.amount, either value could be NULL. Wrap both in COALESCE: COALESCE(src.amount, 0) - COALESCE(tgt.amount, 0). Test your query with unmatched rows from both directions.
These questions test join type selection, reconciliation skills, bidirectional anti-join patterns, and data quality debugging.
What they test:
Whether you understand the full spectrum of join behavior. INNER JOIN keeps only matches. LEFT JOIN keeps all left rows plus matches. FULL OUTER JOIN keeps all rows from both tables. The interviewer wants you to explain when each is appropriate and what NULLs mean in each case.
Approach:
Start with INNER JOIN: only matched rows survive. Then LEFT JOIN: all left rows survive, unmatched right columns are NULL. Then FULL OUTER JOIN: all rows from both tables survive, NULLs appear on whichever side has no match. Give a concrete example: 100 customers, 80 have orders, 5 orders reference deleted customers. INNER: 80 rows. LEFT: 100 rows. FULL OUTER: 105 rows.
What they test:
Practical data engineering skill. Reconciliation is a daily task in pipeline work. The interviewer checks whether you know the FULL OUTER JOIN reconciliation pattern and whether you can categorize rows as matched, missing from source, or missing from target.
Approach:
FULL OUTER JOIN source and target on the business key. Use CASE WHEN to classify each row: source IS NULL means missing from source, target IS NULL means missing from target, values differ means mismatch, everything else is a match. Mention COALESCE for the unified key. Note that you should also compare non-key columns to catch value-level discrepancies.
What they test:
Whether you recognize that a single LEFT JOIN anti-join only finds one direction of missing data. Finding both directions requires FULL OUTER JOIN with filtering on both NULL conditions. This tests your ability to choose the right join type for the problem.
Approach:
SELECT COALESCE(c.id, o.customer_id) AS id, c.name, o.order_id FROM customers c FULL OUTER JOIN orders o ON c.id = o.customer_id WHERE c.id IS NULL OR o.customer_id IS NULL. Explain that c.id IS NULL catches orphan orders and o.customer_id IS NULL catches customers without orders. Both problems are found in one pass.
What they test:
Data quality awareness. If the join key has inconsistencies (trailing spaces, different casing, type mismatches), rows that should match will not match. The FULL OUTER JOIN shows them as separate rows with NULLs on the opposite side. This is a real production debugging scenario.
Approach:
Check for data type mismatches: one side stores the key as VARCHAR, the other as INTEGER. Check for trailing whitespace: TRIM both sides. Check for case differences: LOWER both sides. Check for NULL keys: NULL = NULL is not TRUE in SQL, so rows with NULL keys never match. Run a diagnostic: SELECT src.key, dwh.key, LENGTH(src.key), LENGTH(dwh.key) to spot invisible differences.
Every production warehouse runs a FULL OUTER variance query at least once a day. Train the pattern here so you can draw the pipeline from memory on a whiteboard.