Joins: Advanced

GitHub's social graph connects over 100 million developers, and the "suggested repositories" feature works by self-joining a followers table to itself to find developers who share mutual interests with you. When GitHub asks "what repositories are popular among the developers I follow?", it queries a single massive table against a copy of itself, traversing second-degree relationships across hundreds of millions of edges. The same self-join pattern powers org-chart traversal at enterprise platforms like Workday and connection-of-connection discovery at every major professional network. The advanced techniques you are about to learn operate at this scale.

Broadcast joins (distributed)

Daily Life

Interviews

Optimize joins across distributed systems

In distributed query engines, data is split across multiple nodes. When you join two tables, the engine must decide how to bring matching rows together. This decision dramatically affects query performance.

The Distribution Problem

Consider joining a 100 billion row fact table with a 10,000 row dimension table. The fact table is distributed across 100+ nodes. For each fact row to find its dimension match, the dimension data must be accessible.

Shuffle Join

Redistribute both tables so matching keys land on the same node.

Broadcast Join

Copy the smaller table to every node that has fact data.

Broadcast Join Mechanics

A broadcast join copies the entire smaller table to every worker node. Each worker then performs a local join between its partition of the large table and the complete small table.

Aspect	Shuffle Join	Broadcast Join
Network Traffic	High (both tables moved)	Low (only small table copied)
Memory Per Node	Lower	Higher (full small table)
Best When	Both tables large	One table is small
Scalability	Handles any size	Limited by memory

Broadcast Selection

Query engines automatically select broadcast joins when the smaller table fits in memory. The threshold varies by engine:

Engine Decision Factors

Most engines use size thresholds between 10MB and 100MB
Tables below the threshold are automatically broadcast
Some engines use table statistics for smarter decisions

Broadcast Control

Sometimes the query optimizer makes suboptimal choices. Explicit hints let you override the engine's join strategy decisions.

Explicit Broadcast Hints

Sometimes the optimizer picks shuffle when broadcast would be faster. Many engines support hints to force a broadcast join, typically using comment-style syntax like /*+ BROADCAST(table) */ after SELECT.

	SELECT
	f.order_id,
	d.product_name
	FROM fact_orders AS f
	INNER JOIN dim_product AS d
	ON f.product_id = d.product_id

Result

order_id	product_name
ORD001	Laptop Pro
ORD002	Wireless Mouse
ORD003	USB Hub

Hint tells the engine to broadcast dim_product to all nodes

Broadcast Hints

Consider explicit hints when: the optimizer underestimates a small table's size (outdated statistics), you know a filtered result will be small, or you're joining a large fact table with a dimension table that fits in memory.

Broadcasting a filtered result:

	SELECT
	c.user_id,
	p.category,
	COUNT(*) AS views
	FROM clickstream AS c
	INNER JOIN (
	SELECT
	product_id,
	category
	FROM products
	WHERE is_active = TRUE
	) AS p
	ON c.product_id = p.product_id
	GROUP BY c.user_id, p.category

Result

user_id	category	views
U001	Electronics	15
U001	Clothing	8
U002	Electronics	23

Without the hint, the optimizer might not realize the filtered subquery is small enough to broadcast. The hint ensures the engine copies the active products to all nodes rather than shuffling the massive clickstream table.

✓Do

Use broadcast hints for dimension tables under 1GB
Run EXPLAIN to verify the join strategy changed
Update table statistics regularly so hints become unnecessary

✗Don't

Broadcast tables that won't fit in worker memory
Use hints as a substitute for proper table statistics

Major tech companies rely heavily on broadcast joins for dimension lookups to avoid shuffling massive event tables.

> Complete this query to identify which broadcast strategy is best for joining a 10TB event_data table with a 50MB users table.

SELECT
  e.event_type,
  u.username
FROM event_data AS e
INNER ___ users AS u
  ON e.user_id = u.user_id

Broadcast users table

Shuffle both tables

Partition event_data table

Broadcast joins are one of the most impactful performance optimizations available in distributed SQL because they eliminate the shuffle step that moves data across network boundaries.

The key insight is that broadcasting a small table once is far cheaper than redistributing a massive fact table just to place matching rows on the same node.

Understanding when your engine chooses a broadcast vs shuffle join helps you diagnose slow queries by checking the execution plan and verifying the join strategy matches your expectations.

Inequality joins with BETWEEN

Daily Life

Interviews

Join rows on range conditions

Not all joins match on exact values. Range-based matching opens up powerful patterns for lookups, attribution, and temporal analysis.

So far, all our joins used equality conditions: ON a.id = b.id. But join conditions can use any comparison operator: <, >, <=, >=, and BETWEEN.

Range Matching Example

Suppose you have employee metrics with salaries and a cost allocation table with budget brackets. Each bracket has a min and max income. To find each employee's tax rate, match their salary to the bracket range:

employee_metrics

name	salary
Alice	$75000
Bob	$35000
Carol	$95000

cost_allocs

min_income	max_income	rate
$0	$40000	12%
$40000	$85000	22%
$85000	$200000	32%

	SELECT
	e.name,
	e.salary,
	t.rate AS tax_rate
	FROM employee_metrics AS e
	INNER JOIN cost_allocs AS t
	ON e.salary >= t.min_income
	AND e.salary < t.max_income

Result

name	salary	tax_rate
Alice	$75000	22%
Bob	$35000	12%
Carol	$95000	32%

Each employee's salary is matched to the bracket where it falls between min_income and max_income. This is impossible with an equality join because there's no single key to match on.

> Find events that occurred during a batch job by matching timestamps to the job window.

SELECT
  e.event_type,
  e.event_timestamp,
  b.job_name
FROM event_data AS e
INNER JOIN batch_jobs AS b
  ON e.event_timestamp ___ b.started
  ___ e.event_timestamp ___ b.ended

AND

BETWEEN Syntax

The BETWEEN operator provides cleaner syntax for range conditions:

	SELECT
	e.name,
	e.salary,
	t.rate AS tax_rate
	FROM employee_metrics AS e
	INNER JOIN cost_allocs AS t
	ON e.salary BETWEEN t.min_income
	AND t.max_income

Result

name	salary	tax_rate
Alice	75000	22%
Bob	45000	12%
Carol	125000	32%

Real-World Applications

Inequality joins appear frequently in analytics for event attribution, time-window matching, and ad-hoc range lookups.

Event Attribution Example

Another common pattern: attributing events to time windows. Match each conversion to the ad impression that preceded it within a lookback window:

	SELECT
	c.conversion_id,
	c.conversion_time,
	i.impression_id,
	i.impression_time,
	DATE_DIFF(
	'minute',
	i.impression_time,
	c.conversion_time
	) AS minutes_to_convert
	FROM conversions AS c
	INNER JOIN impressions AS i
	ON c.user_id = i.user_id
	AND i.impression_time < c.conversion_time
	AND i.impression_time >= c.conversion_time - INTERVAL '24' HOUR
	ORDER BY c.conversion_id, i.impression_time DESC

Result

conversion_id	conversion_time	impression_id	impression_time	minutes_to_convert
C001	2024-01-15 14:30	I005	2024-01-15 14:15	15
C001	2024-01-15 14:30	I003	2024-01-15 10:00	270
C002	2024-01-15 16:45	I008	2024-01-15 16:30	15

This finds all impressions for the same user that happened within 24 hours before the conversion. The inequality conditions (< and >=) define the attribution window.

Performance

Inequality joins are computationally expensive. Without an equality condition, the engine may need to compare every row in one table against every row in the other (a Cartesian product with filtering). Adding an equality condition alongside the inequality dramatically improves performance.

TIP

Always include an equality condition when possible. In the event attribution example, c.user_id = i.user_id allows the engine to first partition by user, then apply the time range filter within each partition.

Here are four optimization strategies that reduce the cost of inequality joins.

Optimization Strategies

Add an equality condition to partition data before the range comparison
Pre-filter in subqueries to reduce row counts before the join
Bucket continuous time ranges into discrete periods for partitioning
Replace self joins with window functions when possible

Self joins for hierarchical data

Daily Life

Interviews

Query parent-child relationships in one table

A self join is when a table is joined to itself. This sounds strange at first, but it's essential for querying hierarchical data, comparing rows within the same table, and finding related records.

Why Self Join?

Self joins solve problems where relationships exist within a single table. The most common example: an employee_metrics table where each employee has a manager_id that references another row in the same table.

emp_id	name	manager_id
1	Alice	NULL
2	Bob	1
3	Carol	1
4	Dave	2
5	Eve	2
6	Frank	3

Alice is the CEO (no manager). Bob and Carol report to Alice. Dave and Eve report to Bob. Frank reports to Carol. To show each employee with their manager's name, we join the table to itself.

Self Join Syntax

Use table aliases to distinguish the two "copies" of the table. Think of it as creating two separate reference points into the same data:

	SELECT
	e.name AS employee,
	m.name AS manager
	FROM employee_metrics AS e
	LEFT JOIN employee_metrics AS m
	ON e.manager_id = m.emp_id

Result

employee	manager
Alice	NULL
Bob	Alice
Carol	Alice
Dave	Bob
Eve	Bob
Frank	Carol

We use LEFT JOIN because Alice has no manager (NULL), and we still want to include her in the results. With INNER JOIN, Alice would disappear.

Watch how the table is joined to itself: each employee's manager_id is matched against the emp_id column in the same table.

	SELECT
	e.name AS employee,
	m.name AS manager
	FROM employee_metrics AS e
	LEFT JOIN employee_metrics AS m
	ON e.manager_id = m.emp_id

Join Animation[object Object]

employee_metrics e

emp_id	name	mgr_id
1	Alice	NULL
2	Bob	1
3	Carol	1

employee_metrics m

emp_id	name	mgr_id
1	Alice	NULL
2	Bob	1
3	Carol	1

result

col_0	col_1
1	Alice	NULL	1	Alice	NULL

Row 1/3

TIP

Notice that Alice appears with NULL for her manager because she has no manager_id. The LEFT JOIN ensures she isn't dropped from the results, which is critical for including top-level nodes in a hierarchy.

Self Join Patterns

Beyond simple hierarchies, self joins enable powerful row comparison and relationship discovery within a single table.

Finding Colleagues

Self joins can find relationships between rows. For example, finding colleagues who share the same manager:

	SELECT
	e1.name AS employee1,
	e2.name AS employee2,
	m.name AS shared_manager
	FROM employee_metrics AS e1
	INNER JOIN employee_metrics AS e2
	ON e1.manager_id = e2.manager_id
	AND e1.emp_id < e2.emp_id
	INNER JOIN employee_metrics AS m
	ON e1.manager_id = m.emp_id

Result

employee1	employee2	shared_manager
Bob	Carol	Alice
Dave	Eve	Bob

The condition e1.emp_id < e2.emp_id prevents listing each pair twice (Bob-Carol and Carol-Bob) and prevents pairing an employee with themselves.

> Find metrics that share the same department using a self join.

SELECT
  e1.metric_name,
  e2.metric_name
FROM employee_metrics AS e1
INNER JOIN employee_metrics AS e2
  ON ___ = e2.department
  AND e1.metric_id ___ e2.metric_id

e1.department

e1.metric_id

Comparing Rows

Self joins are powerful for finding sequential or comparative patterns. Find all products that were repriced higher:

product	price	date
Widget	$10	2024-01
Widget	$12	2024-02
Gadget	$25	2024-01
Gadget	$23	2024-02

	SELECT
	curr.product,
	prev.price AS old_price,
	curr.price AS new_price,
	curr.price_date AS change_date
	FROM price_history AS curr
	INNER JOIN price_history AS prev
	ON curr.product = prev.product
	AND curr.price_date > prev.price_date
	WHERE curr.price > prev.price

Result

product	old_price	new_price	change_date
Widget	$10	$12	2024-02

Widget shows up because its price increased from $10 to $12. Gadget doesn't appear because its price decreased.

TIP

For finding consecutive changes (like month-over-month), use LAG() or LEAD() window functions instead of self joins. They're more efficient and easier to read for sequential comparisons.

Recursive CTEs for trees

Daily Life

Interviews

Traverse tree structures with SQL

A recursive CTE (Common Table Expression) is a query that references itself to traverse hierarchical or graph-structured data. It's like a loop in programming: start with base cases, then repeatedly expand until done.

Multi-Level Hierarchies

Self joins work for one-level relationships (employee to manager). But what if you need all ancestors: employee to manager to manager's manager to CEO? You'd need multiple self joins, and you'd need to know the maximum depth in advance.

Org hierarchies

Traverse organizational structures of unknown depth.

Bill of materials

Navigate parts containing subparts recursively.

Category trees

Walk subcategories within subcategories to any level.

Graph traversal

Find all connected nodes in network relationships.

Sequence generation

Generate number series or date ranges dynamically.

Recursive CTE Structure

A recursive CTE has two parts connected by UNION ALL:

	WITH cte_name AS (
	SELECT
	id,
	parent_id,
	name
	FROM my_table
	WHERE parent_id IS NULL
	)

	SELECT
	*
	FROM cte_name

Result

id	parent_id	name
1	NULL	Root Node

BASERECURSEUNION ALLITERATERESULT

BASE

Anchor query

Seeds the first result

RECURSE

Self-reference

Joins back to the CTE

UNION ALL

Combine rows

Appends each iteration

ITERATE

Loop to depth

Repeats until no rows

RESULT

Final output

Returns accumulated set

> Complete the recursive CTE to find all messages in a reply thread.

WITH RECURSIVE thread AS (
  SELECT
    msg_id,
    content
  FROM chat_msgs
  WHERE reply_to IS NULL

  ___

  SELECT
    c.msg_id,
    c.content
  FROM chat_msgs AS c
  INNER ___ thread AS t
    ON c.reply_to = t.msg_id
)

SELECT
  *
FROM thread

UNION ALL

JOIN

UNION

Recursive CTE in Practice

Recursive CTEs enable traversal of arbitrary-depth hierarchies, path building, and cycle detection in graph structures.

Hierarchy Example

Let's find all entries under Alice (the CEO) and their level in the hierarchy:

	SELECT
	name,
	1 AS level
	FROM employee_metrics
	WHERE manager_id IS NULL

	UNION ALL

	SELECT
	e.name,
	2 AS level
	FROM employee_metrics AS e
	WHERE e.manager_id IN (
	SELECT
	emp_id
	FROM employee_metrics
	WHERE manager_id IS NULL
	)

Result

name	level
Alice	1
Bob	2
Carol	2
Dave	3
Eve	3
Frank	3

The query starts with Alice (level 1), finds her direct reports Bob and Carol (level 2), then finds their reports Dave, Eve, and Frank (level 3). It stops when no more rows reference the current level.

Building a Path

Recursive CTEs can accumulate information across levels. Here we build a management chain path:

	SELECT
	e.name,
	CASE
	WHEN m.name IS NULL THEN e.name
	ELSE m.name \|\| ' > ' \|\| e.name
	END AS path
	FROM employee_metrics AS e
	LEFT JOIN employee_metrics AS m
	ON e.manager_id = m.emp_id
	ORDER BY path

Result

name	path
Alice	Alice
Bob	Alice > Bob
Dave	Alice > Bob > Dave
Eve	Alice > Bob > Eve
Carol	Alice > Carol
Frank	Alice > Carol > Frank

Each iteration appends the current employee's name to the path. The || operator concatenates strings (use CONCAT() in some databases).

Cycle Detection

Recursive CTEs can run forever if the data contains cycles (A > B > C > A). Most databases have safeguards:

Handling Cycles

Most engines have a default recursion limit (100-1000 iterations)
Use CYCLE detection syntax where supported
Track visited nodes in an array: ARRAY[emp_id]
Add WHERE NOT CONTAINS(visited, id) to break cycles

	WITH traverse AS(SELECT id, parent_id, ARRAY id AS visited FROM nodes WHERE parent_id IS NULL)
	SELECT *
	FROM traverse

Result

id	parent_id	visited
1	NULL	[1]

CROSS JOIN (Cartesian product)

Daily Life

Interviews

Generate every possible row combination

A cross join combines every row from one table with every row from another, creating all possible combinations.

A CROSS JOIN produces the Cartesian product of two tables: every row from the first table paired with every row from the second table. If table A has 3 rows and table B has 4 rows, the result has 3 × 4 = 12 rows.

Syntax Options

SQL supports two equivalent syntaxes for cross joins:

	SELECT
	a.category,
	b.device_type
	FROM products AS a
	CROSS JOIN devices AS b

Result

category	device_type
Electronics	phone
Electronics	tablet
Clothing	phone
Clothing	tablet

CROSS JOIN syntax (preferred)

	SELECT
	a.category,
	b.device_type
	FROM products AS a
	CROSS JOIN devices AS b

Result

category	device_type
Electronics	phone
Electronics	tablet
Clothing	phone
Clothing	tablet

Comma syntax (older SQL-89 style)

The comma syntax is the older SQL-89 style. The explicit CROSS JOIN is clearer and preferred in modern code. Both produce identical results.

Observe how every row from the first table pairs with every row from the second, producing all possible combinations.

	SELECT
	a.category,
	b.device_type
	FROM products AS a
	CROSS JOIN devices AS b

Join Animation[object Object]

products

category
Electronics
Clothing

devices

device_type
phone
tablet

result

col_0	col_1
Electronics	phone

Row 1/2

Two rows crossed with two rows yields four output rows. This multiplicative behavior is why cross joins require caution with larger tables.

Product Variants

Cross joins generate all possible combinations. Perfect for creating product variant grids:

products

category
Electronics
Clothing
Home

devices

device_type
phone
tablet
laptop

	SELECT
	p.category,
	d.device_type,
	p.category \|\| '-' \|\| d.device_type AS sku
	FROM products AS p
	CROSS JOIN devices AS d
	ORDER BY p.category, d.device_type

Result

category	device_type	sku
Clothing	laptop	Clothing-laptop
Clothing	phone	Clothing-phone
Clothing	tablet	Clothing-tablet
Electronics	laptop	Electronics-laptop
Electronics	phone	Electronics-phone
Electronics	tablet	Electronics-tablet
Home	laptop	Home-laptop
Home	phone	Home-phone
Home	tablet	Home-tablet

3 categories × 3 device types = 9 SKU combinations. This pattern is used for product catalogs, seating charts, calendar grids, and more.

> Identify which columns come from which table in this cross join that generates all category-device combinations.

SELECT
  ___,
  ___
FROM products AS p
CROSS JOIN devices AS d

p.category

d.device_type

p.device_type

d.category

Practical Cross Join Uses

Cross joins are essential for creating scaffolds, generating test data, and ensuring complete dimension coverage in reports.

Date × Category Reporting

A common analytics pattern: ensure every date-category combination exists, even if there's no data:

	WITH date_spine AS (
	SELECT
	CAST(date_column AS DATE) AS report_date
	FROM UNNEST(
	SEQUENCE(
	DATE '2024-01-01',
	DATE '2024-01-07'
	)
	) AS t(date_column)
	),
	categories AS (
	SELECT DISTINCT
	category
	FROM products
	)

	SELECT
	d.report_date,
	c.category,
	COALESCE(SUM(s.amount), 0) AS total_sales
	FROM date_spine AS d
	CROSS JOIN categories AS c
	LEFT JOIN sales AS s
	ON d.report_date = s.sale_date
	AND c.category = s.category
	GROUP BY d.report_date, c.category
	ORDER BY d.report_date, c.category

Result

report_date	category	total_sales
2024-01-01	Electronics	1250.00
2024-01-01	Clothing	0
2024-01-02	Electronics	890.00
2024-01-02	Clothing	450.00

The CROSS JOIN creates every date-category pair. The LEFT JOIN fills in actual sales data. Dates with no sales show 0 instead of missing entirely.

The Danger of Cross Joins

Cross joins multiply row counts. A cross join between two 10,000-row tables produces 100,000,000 rows. Always consider the cardinality:

TIP

Cross joining tables with even modest row counts can explode your result set. Always verify both tables are small or use LIMIT during development. A forgotten WHERE clause after a comma join is a common cause of runaway queries.

Despite the risk, cross joins have several legitimate use cases in production data pipelines.

Cross Join Use Cases

Product variants: generate every combination of color and size
Calendar scaffolds: cross dates with groups to fill reporting gaps
Pairwise comparisons: match every row against every other row
Test data: generate synthetic datasets from small seed tables
Dense pivot tables: ensure all key combinations exist in reports

When you do use a cross join, follow these guardrails to keep result sizes under control.

✓Do

Use explicit CROSS JOIN syntax for clarity
Verify row counts before cross joining

✗Don't

Cross join large tables without LIMIT
Use comma syntax in modern SQL code

The consequences of an unguarded cross join can be dramatic.

Mastering these advanced join patterns opens up complex analytical possibilities. Put your skills to the test with hands-on challenges.

The Patient Journey PipelineStep 1

A hospital network with 3 facilities needs to build unified patient records by joining siloed clinical, billing, and scheduling data. You lead the data engineering effort to create the patient journey pipeline. Each facility uses a different EHR vendor. Patient IDs overlap across systems (facility A and facility B both have patient_id 1001, but they refer to different people). Clinicians need a single view of each patient across all three facilities for care coordination.

patient_visits

visit_id	facility	patient_id	admit_date	diagnosis
V-001	A	1001	2024-03-15	pneumonia
V-002	B	1001	2024-03-20	fracture
V-003	A	2045	2024-03-22	diabetes

patient_profiles

patient_id	facility	name	dob	insurance_id
1001	A	Maria Garcia	1985-06-12	INS-447
1001	B	James Wilson	1972-11-03	INS-891
2045	A	Maria Garcia	1985-06-12	INS-447

Jul 2026

Cross-System Join Strategy

You need to join 50M patient visits to 200K patient profiles. Patient IDs are only unique within a facility, not across the network. The same person (Maria Garcia) may have different IDs at different facilities. Which join approach do you use?

SQL

-- Goal: unified patient record
SELECT v.visit_id, p.name, v.diagnosis
FROM patient_visits v
??? patient_profiles p
  ON ??? -- your join logic here

Cross-facility patient matching is one of the hardest join problems in healthcare data engineering because there is no universal patient identifier and name plus date-of-birth matching produces both false positives and false negatives.

The self-join deduplication approach used to build a master patient index generalizes to any entity resolution problem where you need to find equivalent records across siloed systems.

Combining multiple join types in a single pipeline, such as composite equi-join for exact matches followed by an inequality join for fuzzy matching, is a common pattern for handling real-world data quality issues.

❯❯❯PUTTING IT ALL TOGETHER

> You are a data engineer at Workday building a workforce analytics platform that maps reporting chains, computes salary band compliance, and generates all role-skill combinations for a recommendation engine. All queries touch the same employee_metrics table in different ways.

Broadcast join hints push the small salary_bands dimension table to all worker nodes, avoiding a shuffle of the 50M employee_metrics fact table.

An inequality join on employee_metrics.salary BETWEEN bands.min_salary AND bands.max_salary assigns each employee to their correct compensation band.

A self join aliases employee_metrics as manager and employee_metrics as report, linking each row to its manager_id counterpart for org-chart traversal.

CROSS JOIN between roles and required_skills generates every role-skill combination as the scaffold for the recommendation scoring model.

KEY TAKEAWAYS

Broadcast joins copy small tables to all nodes; use for dimension tables under 1GB

Inequality joins match ranges (ON salary BETWEEN min AND max); add equality conditions for performance

Self joins connect a table to itself; essential for hierarchies and row comparisons

Use table aliases to distinguish the two "copies" in a self join

Recursive CTEs traverse unknown-depth hierarchies: base case UNION ALL recursive step

CROSS JOIN produces Cartesian products (all combinations); row count = rows1 x rows2

Use CROSS JOIN for product variants, date scaffolds, and combination generation

Always verify row counts before cross joining; small tables can explode results

Self joins and CROSS joins. Send help.

Category: SQL
Difficulty: advanced
Duration: 32 minutes
Challenges: 0 hands-on challenges

Topics covered: Broadcast joins (distributed), Inequality joins with BETWEEN, Self joins for hierarchical data, Recursive CTEs for trees, CROSS JOIN (Cartesian product)

Lesson Sections

Broadcast joins (distributed) (concepts: sqlJoinCardinality)
In distributed query engines, data is split across multiple nodes. When you join two tables, the engine must decide how to bring matching rows together. This decision dramatically affects query performance. The Distribution Problem Consider joining a 100 billion row fact table with a 10,000 row dimension table. The fact table is distributed across 100+ nodes. For each fact row to find its dimension match, the dimension data must be accessible. Broadcast Join Mechanics A broadcast join copies the
Inequality joins with BETWEEN (concepts: sqlInequalityJoin)
Not all joins match on exact values. Range-based matching opens up powerful patterns for lookups, attribution, and temporal analysis. Range Matching Example Suppose you have employee metrics with salaries and a cost allocation table with budget brackets. Each bracket has a min and max income. To find each employee's tax rate, match their salary to the bracket range: Each employee's salary is matched to the bracket where it falls between min_income and max_income. This is impossible with an equal
Self joins for hierarchical data (concepts: sqlSelfJoin)
A self join is when a table is joined to itself. This sounds strange at first, but it's essential for querying hierarchical data, comparing rows within the same table, and finding related records. Why Self Join? Self joins solve problems where relationships exist within a single table. The most common example: an employee_metrics table where each employee has a manager_id that references another row in the same table. Alice is the CEO (no manager). Bob and Carol report to Alice. Dave and Eve rep
Recursive CTEs for trees (concepts: sqlRecursiveCte)
Multi-Level Hierarchies Self joins work for one-level relationships (employee to manager). But what if you need all ancestors: employee to manager to manager's manager to CEO? You'd need multiple self joins, and you'd need to know the maximum depth in advance. Recursive CTE Structure Recursive CTE in Practice Recursive CTEs enable traversal of arbitrary-depth hierarchies, path building, and cycle detection in graph structures. Hierarchy Example Let's find all entries under Alice (the CEO) and th
CROSS JOIN (Cartesian product) (concepts: sqlCrossJoin)
A cross join combines every row from one table with every row from another, creating all possible combinations. Syntax Options SQL supports two equivalent syntaxes for cross joins: CROSS JOIN syntax (preferred) Comma syntax (older SQL-89 style) Observe how every row from the first table pairs with every row from the second, producing all possible combinations. Two rows crossed with two rows yields four output rows. This multiplicative behavior is why cross joins require caution with larger table