Most candidates think LEFT JOIN "keeps all the data." The interviewer is checking whether you know LEFT JOIN drops rows the moment you put the right-table filter in the WHERE clause instead of the ON clause. That one mistake is the single most common LEFT JOIN bug in phone screens.
LEFT JOIN shows up in 17% of verified DE SQL rounds and FULL OUTER in 6%. The frequency is lower than INNER JOIN's 29%, but the bug surface is much larger. This page covers the syntax, the anti-join, and the three failure modes that silently corrupt outputs without throwing errors.
Questions using LEFT JOIN
FULL OUTER JOIN rate
Misplaced WHERE to break it
Silent failure modes
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
LEFT JOIN and LEFT OUTER JOIN are byte-for-byte identical to the parser. The OUTER keyword is cosmetic. Most teams drop it because five characters aren't worth the debate in code review. The real action is in the ON clause, which is where the interviewer is going to set a trap.
-- These two are identical
SELECT * FROM a LEFT JOIN b ON a.id = b.a_id;
SELECT * FROM a LEFT OUTER JOIN b ON a.id = b.a_id;
-- Full syntax
SELECT
left_table.column1,
right_table.column2
FROM left_table
LEFT JOIN right_table
ON left_table.key = right_table.key
[AND additional_conditions]
WHERE [post-join_filters];Interview note: The table listed first (after FROM) is the left table. The table after LEFT JOIN is the right table. Table order matters for LEFT JOIN but not for INNER JOIN. Swapping the tables in a LEFT JOIN changes the semantics entirely.
The difference only matters when unmatched rows exist. If every left-table row has at least one match, both join types return the same result. The divergence appears when some left rows have no corresponding right rows.
| Behavior | LEFT JOIN | INNER JOIN |
|---|---|---|
| Unmatched left rows | Kept, with NULLs for right columns | Dropped entirely |
| Unmatched right rows | Dropped (use FULL OUTER JOIN to keep both) | Dropped |
| Result row count | At least as many rows as the left table (more if right has duplicates) | Only matching rows |
| NULL generation | Yes, for right-side columns where no match exists | No NULLs generated by the join itself |
| Use case | Preserve all entities even without related data | Only entities with matching data |
-- customers: Alice (id=1), Bob (id=2), Carol (id=3)
-- orders: order for Alice (customer_id=1), order for Bob (customer_id=2)
-- LEFT JOIN: 3 rows (Carol has NULLs for order columns)
SELECT c.name, o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
-- Alice | 101
-- Bob | 102
-- Carol | NULL <-- preserved by LEFT JOIN
-- INNER JOIN: 2 rows (Carol dropped)
SELECT c.name, o.order_id
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
-- Alice | 101
-- Bob | 102From basic entity preservation to the anti-join and the critical ON vs WHERE distinction. These patterns cover the full range of LEFT JOIN usage in interviews and production.
LEFT JOIN returns all rows from the left table. For rows that match the ON condition, columns from the right table are populated. For rows that do not match, right-side columns are filled with NULL. The keyword OUTER is optional: LEFT JOIN and LEFT OUTER JOIN are identical.
-- All customers, with their orders if any
SELECT
c.customer_id,
c.name,
o.order_id,
o.amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
-- Customers with no orders will have:
-- order_id = NULL, amount = NULL
-- They are NOT dropped from the resultThe anti-join finds rows in the left table that have no match in the right table. LEFT JOIN produces NULLs for unmatched rows. Adding WHERE right_table.key IS NULL filters to only those unmatched rows. This is one of the most powerful and commonly tested patterns in SQL interviews. It answers questions like 'which customers have never ordered' and 'which products have no reviews.'
-- Customers who have never placed an order
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.customer_id IS NULL;
-- Alternative: NOT EXISTS (often same performance)
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
);
-- Alternative: NOT IN (careful with NULLs)
SELECT customer_id, name
FROM customers
WHERE customer_id NOT IN (
SELECT customer_id FROM orders WHERE customer_id IS NOT NULL
);Combining LEFT JOIN with GROUP BY and aggregate functions is a common reporting pattern. The LEFT JOIN preserves all entities from the left table. The aggregation summarizes the related data. COALESCE handles the NULLs generated by unmatched rows, converting them to zeros or default values for clean reporting.
-- All departments with their employee count and total salary
SELECT
d.department_name,
COUNT(e.employee_id) AS headcount,
COALESCE(SUM(e.salary), 0) AS total_payroll
FROM departments d
LEFT JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_name;
-- Departments with no employees show headcount = 0, total_payroll = 0
-- COUNT(e.employee_id) correctly returns 0 for NULLs
-- SUM(e.salary) returns NULL for empty groups, so COALESCE is neededYou can chain multiple LEFT JOINs to bring in data from several related tables while preserving all rows from the leftmost table. Each LEFT JOIN independently adds columns. If any join has no match, those columns are NULL. This pattern builds wide denormalized views for analytics and reporting.
-- Customer profile with latest order and support ticket count
SELECT
c.customer_id,
c.name,
c.email,
lo.latest_order_date,
lo.latest_amount,
COALESCE(t.ticket_count, 0) AS tickets
FROM customers c
LEFT JOIN (
SELECT DISTINCT ON (customer_id)
customer_id,
order_date AS latest_order_date,
amount AS latest_amount
FROM orders
ORDER BY customer_id, order_date DESC
) lo ON c.customer_id = lo.customer_id
LEFT JOIN (
SELECT customer_id, COUNT(*) AS ticket_count
FROM support_tickets
GROUP BY customer_id
) t ON c.customer_id = t.customer_id;This is a critical distinction that interviewers test frequently. Conditions in the ON clause filter the right table before the join. Conditions in the WHERE clause filter the joined result after the join. For LEFT JOINs, moving a right-table filter from ON to WHERE can eliminate rows that should have been preserved, effectively turning the LEFT JOIN into an INNER JOIN.
-- CORRECT: filter in ON clause (preserves unmatched customers)
SELECT c.customer_id, c.name, o.order_id
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
AND o.status = 'completed';
-- Customers with no completed orders: order_id = NULL (preserved)
-- WRONG: filter in WHERE clause (drops unmatched customers)
SELECT c.customer_id, c.name, o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'completed';
-- Customers with no completed orders are DROPPED
-- because o.status is NULL for them, and NULL != 'completed'These three mistakes produce queries that run successfully but return incorrect results. Each one has cost real candidates real offers.
The most common LEFT JOIN bug. Adding WHERE right_table.column = 'value' drops all rows where the right table has no match, because the column is NULL for those rows, and NULL does not equal anything. This silently converts the LEFT JOIN into an INNER JOIN. The query runs without errors. The results look plausible. The bug goes unnoticed until someone asks why certain rows are missing.
Fix: Move right-table filter conditions to the ON clause: LEFT JOIN ... ON key = key AND right_column = 'value'. This preserves unmatched left rows while filtering the right table before joining.
COUNT(*) counts all rows, including those with NULLs from unmatched joins. If you LEFT JOIN customers to orders and GROUP BY customer, COUNT(*) returns 1 for customers with no orders (because the LEFT JOIN produces one row with NULLs). COUNT(order_id) returns 0, which is the correct count.
Fix: Use COUNT(right_table.column) when counting related entities after a LEFT JOIN. COUNT(o.order_id) correctly returns 0 for customers with no orders.
LEFT JOINs generate NULLs. Any arithmetic with NULL produces NULL: salary + NULL = NULL. Any comparison with NULL returns UNKNOWN: NULL > 0 is not TRUE. Downstream calculations, CASE expressions, and WHERE conditions all need to account for these generated NULLs. Forgetting this produces incorrect totals, missing rows in filters, and broken reports.
Fix: Wrap right-side columns in COALESCE: COALESCE(o.amount, 0) for arithmetic. Use IS NULL / IS NOT NULL for comparisons. Test your query with entities that have no right-table matches to verify NULL handling.
These questions test join mechanics, anti-join patterns, fan-out understanding, and the ON vs WHERE distinction.
What they test:
Fundamental join understanding. INNER JOIN drops unmatched rows from both sides. LEFT JOIN keeps all left-table rows, filling NULLs for unmatched right-side columns. The interviewer wants a concrete example, not just the definition. A table of customers with 100 rows and an orders table missing some customers: INNER JOIN returns fewer rows than LEFT JOIN because customers without orders are dropped.
Approach:
Give the example: 100 customers, 80 have orders. INNER JOIN returns rows for the 80 customers who have orders. LEFT JOIN returns rows for all 100 customers; the 20 without orders have NULL in the order columns. Explain when each is appropriate: INNER JOIN when you only need entities with related data, LEFT JOIN when you need all entities regardless.
What they test:
The anti-join pattern. This is one of the most frequently tested join patterns across all SQL interview levels. The interviewer checks whether you use LEFT JOIN + WHERE IS NULL, NOT EXISTS, or NOT IN. All three approaches work, but LEFT JOIN + IS NULL is the most visual and commonly expected in DE interviews.
Approach:
SELECT p.product_id, p.name FROM products p LEFT JOIN order_items oi ON p.product_id = oi.product_id WHERE oi.product_id IS NULL. Explain why the IS NULL check must be on the join key from the right table. Mention NOT EXISTS as an alternative. Warn about NOT IN with NULLs: if the subquery returns any NULL, NOT IN returns empty results.
What they test:
Understanding of join fan-out. LEFT JOIN guarantees at least one row per left-table row. But if the right table has multiple matching rows (the join key is not unique on the right side), the left row is duplicated for each match. This is a fan-out. 100 customers joining to an orders table where some customers have 5 orders produces up to 500 rows.
Approach:
Identify the cause: the join key is not unique in the right table. Example: customer_id appears multiple times in orders. Each order creates a separate joined row. Solutions: pre-aggregate the right table before joining (SELECT customer_id, COUNT(*) ... GROUP BY), or use a subquery with ROW_NUMBER to pick one row per key, or use DISTINCT ON.
What they test:
ON vs WHERE semantics for outer joins. The ON clause filters before the join: rows that do not meet the condition are excluded from the right table, so the left row appears with NULLs. The WHERE clause filters after the join: rows where the right-side column is NULL (no match) are filtered out entirely. This effectively converts the LEFT JOIN to an INNER JOIN for that condition.
Approach:
Walk through an example. LEFT JOIN orders ON customers.id = orders.customer_id AND orders.year = 2024. Customers without 2024 orders still appear (with NULL). Move the year filter to WHERE: WHERE orders.year = 2024. Now customers without 2024 orders are dropped because orders.year is NULL, which fails the WHERE check. Conclusion: right-table filters belong in ON for LEFT JOINs.
Senior engineers still fail LEFT JOIN questions because they trust the word "outer" and don't check their WHERE clause. Practice the anti-join until the filter placement is muscle memory, not guesswork.