SQL JOIN Practice for Data Engineers (2026)
JOINs are the second most-tested SQL category after aggregation. INNER JOIN is the most common join type by far, with LEFT JOIN close behind. This guide covers all seven join types — INNER, LEFT, FULL OUTER, self, inequality, CROSS, LATERAL — with code, practice problems, and the six common traps that cost interview rounds.
JOIN types ranked by interview frequency
Seven join types. Spend prep time proportional to frequency — INNER and LEFT dominate, the rest cover specific patterns.
| Join type | What it does | Interview frequency | When to reach for it |
|---|---|---|---|
| INNER JOIN | Both tables match | ~55% of interview JOIN questions | Most common; default join |
| LEFT JOIN | Keep all left rows | ~30% | Critical for anti-joins and zero-handling |
| FULL OUTER JOIN | Keep all from both | ~3% | Reconciliation / data diffing |
| Self-join | Table joined to itself | ~7% | Hierarchies, consecutive events, pairs |
| Inequality join | ON uses <, >, BETWEEN | ~3% | Date ranges, attribution windows, intervals |
| CROSS JOIN | Cartesian product | ~2% | Date spines, scaffolds, gap-filling |
| LATERAL JOIN | Subquery per row | <1% | Postgres-specific; advanced top-N per group |
INNER JOIN (~55% of questions)
-- INNER JOIN: returns ONLY rows that match in both tables.
-- Default and most common. Interviewers test that you understand
-- unmatched rows are silently dropped.
-- Total spend per customer (matched customers only)
SELECT c.name, SUM(o.amount) AS total_spend
FROM customers c
INNER JOIN orders o ON o.customer_id = c.id
GROUP BY c.name
ORDER BY total_spend DESC;
-- Three-way JOIN to find active paying users
SELECT u.id, u.email
FROM users u
INNER JOIN subscriptions s ON s.user_id = u.id
INNER JOIN payments p ON p.subscription_id = s.id
WHERE s.status = 'active'
AND p.paid_at >= CURRENT_DATE - INTERVAL '30 days';
-- Practice problems (5):
-- 1. Join orders to customers and calculate total spend per customer
-- 2. Find products ordered at least once (filter never-ordered products)
-- 3. Match employees to department names, filter by hire_date
-- 4. Three-way join: users + subscriptions + payments for active payers
-- 5. Find pairs of students enrolled in the same course (self-join via INNER)Default join type. Tests whether you understand that unmatched rows are silently dropped. Most candidates write this fluently; the trap is forgetting to verify row counts after multi-table joins.
LEFT JOIN (~30% of questions)
-- LEFT JOIN: returns ALL rows from the left table.
-- NULLs fill in where the right table has no match.
-- The most interview-relevant join type.
-- Customers who NEVER ordered (the anti-join pattern)
SELECT c.id, c.name
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.order_id IS NULL;
-- WHERE IS NULL converts LEFT JOIN to "left table only, no match"
-- Revenue per category INCLUDING categories with $0
SELECT cat.name, COALESCE(SUM(o.amount), 0) AS revenue
FROM categories cat
LEFT JOIN products p ON p.category_id = cat.id
LEFT JOIN order_items oi ON oi.product_id = p.id
LEFT JOIN orders o ON o.id = oi.order_id
GROUP BY cat.id, cat.name;
-- COALESCE turns NULL → 0 for zero-revenue categories
-- Practice problems (5):
-- 1. Find customers who have never placed an order
-- 2. List products with most recent review, INCLUDING products with zero reviews
-- 3. Every employee + manager name, including the CEO (no manager)
-- 4. Revenue per category including categories with $0
-- 5. Users who signed up but never completed onboardingThe most interview-relevant join type. The anti-join pattern (LEFT JOIN + WHERE IS NULL) appears in nearly every interview at every level.
Six JOIN traps that cost interviews
Each trap is independent. Mention NULL handling and row count verification proactively — they're signal-positive whether the interviewer asks or not.
LEFT JOIN + WHERE on right table = INNER JOIN
Adding WHERE right_table.column = 'value' after a LEFT JOIN converts it into an INNER JOIN. NULLs from unmatched rows fail the WHERE check and get dropped. Move the filter into the ON clause to keep it a true LEFT JOIN. This is the single most common JOIN interview trap.
JOIN that silently duplicates rows (fan-out)
If your join key is not unique on one side, each row on the unique side gets duplicated for every match on the 'many' side. Joining orders to order_items gives you one row per item, not one row per order. Interviewers ask you to count orders after a join to see if you catch the fan-out.
NULL != NULL in JOIN conditions
Joining on a nullable column drops rows where either side is NULL because NULL = NULL evaluates to NULL (falsy in WHERE/ON). If you need to match NULLs, use IS NOT DISTINCT FROM (Postgres) or COALESCE(col, sentinel). This trap costs candidates points who don't flag NULL handling.
Ambiguous column references in 3+ table joins
When you join three or more tables, always alias your tables and qualify every column. Interviewers watch for ambiguous references. If two tables both have an 'id' column and you write SELECT id, the query fails with 'ambiguous column'. Alias = professionalism signal.
Aggregate then join, OR join then aggregate
When joining a 1-table to a 1-to-many table and aggregating, the order matters for performance AND correctness. Pre-aggregate the many side to avoid fan-out, then join: smaller intermediate result, no duplicate inflation. Joining then aggregating works but is slower and easier to get wrong on multi-table joins.
FULL OUTER JOIN with COALESCE for the join key
FULL OUTER preserves IDs from both sides; SELECT a.id won't return rows that only exist in B. Always SELECT COALESCE(a.id, b.id) to get the union of join keys. Forgetting this is the most common FULL OUTER mistake and causes silent data loss in reconciliation queries.
Self-joins (~7% of questions)
-- SELF-JOIN: a table joined to itself.
-- For hierarchies, pairs, and consecutive events.
-- Employee + manager from a single employees table
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
-- Customers who ordered on CONSECUTIVE days
SELECT DISTINCT o1.customer_id
FROM orders o1
INNER JOIN orders o2
ON o1.customer_id = o2.customer_id
AND o2.order_date = o1.order_date + INTERVAL '1 day';
-- Employees who earn MORE than their manager
SELECT e.name, e.salary, m.name AS manager, m.salary AS manager_salary
FROM employees e
INNER JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;
-- Practice problems (5):
-- 1. Each employee and their direct manager from a single employees table
-- 2. Customers who placed orders on consecutive days
-- 3. All pairs of products in the same category with price diff under $10
-- 4. Flights where arrival city = departure city of another flight (connections)
-- 5. Employees who earn more than their managerTable joined to itself. Solve hierarchies (employee/manager), consecutive events, and pairs within a single table. Self-joins feel unnatural at first but solve a specific class no other join can.
Inequality joins (~3% of questions)
-- INEQUALITY JOIN: condition uses <, >, BETWEEN instead of =.
-- Date ranges, attribution windows, interval matching.
-- Match each sale to the price tier active on that date
SELECT s.sale_id, s.sale_date, pt.tier_name, s.amount
FROM sales s
INNER JOIN price_tiers pt
ON s.sale_date BETWEEN pt.valid_from AND pt.valid_to;
-- Events within 30 minutes of a user signup
SELECT e.event_id, u.user_id, e.event_time
FROM users u
INNER JOIN events e
ON e.user_id = u.user_id
AND e.event_time BETWEEN u.signup_at AND u.signup_at + INTERVAL '30 minutes';
-- Attribute conversions to the most recent marketing touch
SELECT c.conversion_id, m.touch_id, c.conversion_time, m.touch_time
FROM conversions c
LEFT JOIN LATERAL (
SELECT *
FROM marketing_touches mt
WHERE mt.user_id = c.user_id
AND mt.touch_time <= c.conversion_time
ORDER BY mt.touch_time DESC
LIMIT 1
) m ON true;
-- Practice problems (5):
-- 1. Match each sale to the price tier active on that date
-- 2. Events within 30 minutes of a user signup
-- 3. Attribute conversions to the most recent marketing touch
-- 4. Salary history records with overlapping date ranges
-- 5. Orders matched to the exchange rate valid on the order dateJOIN with <, >, BETWEEN instead of =. Date ranges, attribution windows, interval matching. Many candidates have never written one.
Four worked-example patterns
Specific patterns that appear in actual interview questions. Each is a recognizable structural move you can deploy when you see the underlying problem shape.
Customer LEFT JOIN orders with COALESCE
Given customers and orders tables, find each customer's name, order count, and total spend. INCLUDE customers who never ordered. Query: LEFT JOIN customers to orders, GROUP BY customer, COUNT(o.order_id), COALESCE(SUM(o.amount), 0). The LEFT JOIN keeps every customer; COALESCE turns NULL into 0 for the no-orders case. Without COALESCE, the report shows NULL where the reader expects 0 — silent bug.
Self-join for consecutive-day orders
Find customers who ordered on two consecutive days. Self-join orders to itself on customer_id and o2.order_date = o1.order_date + 1 day. SELECT DISTINCT o1.customer_id. The +1 day inequality is the structural insight — self-joins for temporal patterns hinge on the date arithmetic in the join condition.
Inequality join for price-tier matching
Match each sale to the price tier that was active on the sale date. Tiers have valid_from and valid_to columns. JOIN ON sale_date BETWEEN valid_from AND valid_to. The join produces one row per (sale, matching tier) — guard against overlapping tiers by validating the data first.
CROSS JOIN for date spine + gap fill
Generate a report row for every (day, region) combination even when no sales happened. CROSS JOIN generate_series with regions to create the scaffold. LEFT JOIN sales onto the scaffold. COALESCE revenue to 0. This pattern guarantees the dashboard shows every period regardless of activity.
CROSS JOIN (~2% of questions)
-- CROSS JOIN: Cartesian product. Every row × every row.
-- For generating combinations, date spines, gap-filling.
-- Date spine × regions for reporting scaffold
SELECT d.dt, r.region_name, COALESCE(s.revenue, 0) AS revenue
FROM generate_series(DATE '2024-01-01', DATE '2024-12-31', INTERVAL '1 day') d(dt)
CROSS JOIN regions r
LEFT JOIN daily_sales s ON s.day = d.dt AND s.region_id = r.id;
-- All product-store combinations (find missing stocks)
SELECT p.product_id, s.store_id, COALESCE(i.qty, 0) AS qty
FROM products p
CROSS JOIN stores s
LEFT JOIN inventory i
ON i.product_id = p.product_id
AND i.store_id = s.store_id
WHERE i.qty IS NULL OR i.qty = 0;
-- Round-robin tournament: all team matchups
SELECT t1.name AS home, t2.name AS away
FROM teams t1
CROSS JOIN teams t2
WHERE t1.id < t2.id; -- avoid (A,A) and duplicate (A,B)/(B,A)
-- Practice problems (5):
-- 1. All product-store combinations to find missing stocks
-- 2. Date spine × regions reporting scaffold
-- 3. Round-robin tournament matchups
-- 4. Pair every user with every survey question (find unanswered)
-- 5. Multiplication table from two integer sequencesCartesian product. Use for date spines, gap filling, and combination generation. Small tables only — a 1M × 1M cross join produces a trillion rows.
FULL OUTER JOIN (~3% of questions)
-- FULL OUTER JOIN: keeps all rows from BOTH sides.
-- For reconciliation and data diffing.
-- Compare two source systems for data quality
SELECT
COALESCE(s.id, w.id) AS id,
s.amount AS source_amount,
w.amount AS warehouse_amount,
CASE
WHEN s.id IS NULL THEN 'missing in source'
WHEN w.id IS NULL THEN 'missing in warehouse'
WHEN s.amount <> w.amount THEN 'amount mismatch'
ELSE 'match'
END AS diff_type
FROM source_orders s
FULL OUTER JOIN warehouse_orders w ON s.id = w.id
WHERE s.id IS NULL OR w.id IS NULL OR s.amount <> w.amount;
-- Find IDs in A but not B (and vice versa) in one query
SELECT
COALESCE(a.id, b.id) AS id,
CASE WHEN a.id IS NULL THEN 'only in B'
WHEN b.id IS NULL THEN 'only in A'
ELSE 'both' END AS membership
FROM table_a a
FULL OUTER JOIN table_b b ON a.id = b.id;Keeps all rows from both sides. Most common use: reconciliation between two source systems. Always SELECT COALESCE(a.id, b.id) for the join key — forgetting this is the most common FULL OUTER mistake.
LATERAL JOIN (Postgres-specific, <1%)
-- LATERAL JOIN (Postgres): subquery executed per row of the outer table.
-- Advanced pattern for "top N per group" without window functions.
-- Often produces cleaner plans than ROW_NUMBER for small N.
-- Top 3 most recent orders per customer
SELECT c.id, c.name, o.order_id, o.order_date, o.amount
FROM customers c
LEFT JOIN LATERAL (
SELECT order_id, order_date, amount
FROM orders
WHERE customer_id = c.id
ORDER BY order_date DESC
LIMIT 3
) o ON true;
-- Most recent failed login per user
SELECT u.id, u.email, lf.attempted_at, lf.ip_address
FROM users u
LEFT JOIN LATERAL (
SELECT attempted_at, ip_address
FROM login_failures
WHERE user_id = u.id
ORDER BY attempted_at DESC
LIMIT 1
) lf ON true;Advanced pattern for top-N per group. Postgres-only. Often cleaner than ROW_NUMBER for small N. Snowflake's LATERAL FLATTEN serves a similar purpose for nested JSON.
SQL JOINs FAQ
How many types of SQL JOINs are there?+
What's the difference between LEFT JOIN and INNER JOIN?+
Why do SQL JOINs appear in every data engineering interview?+
How do I avoid row duplication when joining tables?+
When should I use a CROSS JOIN vs UNION?+
What's a LATERAL JOIN and when do I need one?+
Write multi-table joins under pressure
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition