Joins: Advanced

GitHub's social graph connects over 100 million developers, and the "suggested repositories" feature works by self-joining a followers table to itself to find developers who share mutual interests with you. When GitHub asks "what repositories are popular among the developers I follow?", it queries a single massive table against a copy of itself, traversing second-degree relationships across hundreds of millions of edges. The same self-join pattern powers org-chart traversal at enterprise platforms like Workday and connection-of-connection discovery at every major professional network. The advanced techniques you are about to learn operate at this scale.

Broadcast joins (distributed)

Daily Life
Interviews

Optimize joins across distributed systems

In distributed query engines, data is split across multiple nodes. When you join two tables, the engine must decide how to bring matching rows together. This decision dramatically affects query performance.
//

The Distribution Problem

Consider joining a 100 billion row fact table with a 10,000 row dimension table. The fact table is distributed across 100+ nodes. For each fact row to find its dimension match, the dimension data must be accessible.
Shuffle Join
Shuffle Join
Redistribute both tables so matching keys land on the same node.
Broadcast Join
Broadcast Join
Copy the smaller table to every node that has fact data.
//

Broadcast Join Mechanics

A broadcast join copies the entire smaller table to every worker node. Each worker then performs a local join between its partition of the large table and the complete small table.
AspectShuffle JoinBroadcast Join
Network TrafficHigh (both tables moved)Low (only small table copied)
Memory Per NodeLowerHigher (full small table)
Best WhenBoth tables largeOne table is small
ScalabilityHandles any sizeLimited by memory
//

Broadcast Selection

Query engines automatically select broadcast joins when the smaller table fits in memory. The threshold varies by engine:
Engine Decision Factors
  • Most engines use size thresholds between 10MB and 100MB
  • Tables below the threshold are automatically broadcast
  • Some engines use table statistics for smarter decisions

Broadcast Control

Sometimes the query optimizer makes suboptimal choices. Explicit hints let you override the engine's join strategy decisions.
//

Explicit Broadcast Hints

Sometimes the optimizer picks shuffle when broadcast would be faster. Many engines support hints to force a broadcast join, typically using comment-style syntax like /*+ BROADCAST(table) */ after SELECT.

1SELECT
2 f.order_id,
3 d.product_name
4FROM fact_orders AS f
5INNER JOIN dim_product AS d
6 ON f.product_id = d.product_id
Result
order_idproduct_name
ORD001Laptop Pro
ORD002Wireless Mouse
ORD003USB Hub

Hint tells the engine to broadcast dim_product to all nodes

//

Broadcast Hints

Consider explicit hints when: the optimizer underestimates a small table's size (outdated statistics), you know a filtered result will be small, or you're joining a large fact table with a dimension table that fits in memory.

Broadcasting a filtered result:

1SELECT
2 c.user_id,
3 p.category,
4 COUNT(*) AS views
5FROM clickstream AS c
6INNER JOIN (
7 SELECT
8 product_id,
9 category
10 FROM products
11 WHERE is_active = TRUE
12) AS p
13 ON c.product_id = p.product_id
14GROUP BY c.user_id, p.category
Result
user_idcategoryviews
U001Electronics15
U001Clothing8
U002Electronics23
Without the hint, the optimizer might not realize the filtered subquery is small enough to broadcast. The hint ensures the engine copies the active products to all nodes rather than shuffling the massive clickstream table.
Do
  • Use broadcast hints for dimension tables under 1GB
  • Run EXPLAIN to verify the join strategy changed
  • Update table statistics regularly so hints become unnecessary
Don't
  • Broadcast tables that won't fit in worker memory
  • Use hints as a substitute for proper table statistics
Major tech companies rely heavily on broadcast joins for dimension lookups to avoid shuffling massive event tables.

> Complete this query to identify which broadcast strategy is best for joining a 10TB event_data table with a 50MB users table.

SELECT
  e.event_type,
  u.username
FROM event_data AS e
INNER ___ users AS u
  ON e.user_id = u.user_id
Broadcast users table
Shuffle both tables
Partition event_data table
Broadcast joins are one of the most impactful performance optimizations available in distributed SQL because they eliminate the shuffle step that moves data across network boundaries.
The key insight is that broadcasting a small table once is far cheaper than redistributing a massive fact table just to place matching rows on the same node.
Understanding when your engine chooses a broadcast vs shuffle join helps you diagnose slow queries by checking the execution plan and verifying the join strategy matches your expectations.

Inequality joins with BETWEEN

Daily Life
Interviews

Join rows on range conditions

Not all joins match on exact values. Range-based matching opens up powerful patterns for lookups, attribution, and temporal analysis.

So far, all our joins used equality conditions: ON a.id = b.id. But join conditions can use any comparison operator: <, >, <=, >=, and BETWEEN.

//

Range Matching Example

Suppose you have employee metrics with salaries and a cost allocation table with budget brackets. Each bracket has a min and max income. To find each employee's tax rate, match their salary to the bracket range:
employee_metrics
namesalary
Alice$75000
Bob$35000
Carol$95000
cost_allocs
min_incomemax_incomerate
$0$4000012%
$40000$8500022%
$85000$20000032%
1SELECT
2 e.name,
3 e.salary,
4 t.rate AS tax_rate
5FROM employee_metrics AS e
6INNER JOIN cost_allocs AS t
7 ON e.salary >= t.min_income
8 AND e.salary < t.max_income
Result
namesalarytax_rate
Alice$7500022%
Bob$3500012%
Carol$9500032%
Each employee's salary is matched to the bracket where it falls between min_income and max_income. This is impossible with an equality join because there's no single key to match on.

> Find events that occurred during a batch job by matching timestamps to the job window.

SELECT
  e.event_type,
  e.event_timestamp,
  b.job_name
FROM event_data AS e
INNER JOIN batch_jobs AS b
  ON e.event_timestamp ___ b.started
  ___ e.event_timestamp ___ b.ended
AND
>=
<
=
//

BETWEEN Syntax

The BETWEEN operator provides cleaner syntax for range conditions:

1SELECT
2 e.name,
3 e.salary,
4 t.rate AS tax_rate
5FROM employee_metrics AS e
6INNER JOIN cost_allocs AS t
7 ON e.salary BETWEEN t.min_income
8 AND t.max_income
Result
namesalarytax_rate
Alice7500022%
Bob4500012%
Carol12500032%

Real-World Applications

Inequality joins appear frequently in analytics for event attribution, time-window matching, and ad-hoc range lookups.
//

Event Attribution Example

Another common pattern: attributing events to time windows. Match each conversion to the ad impression that preceded it within a lookback window:
1SELECT
2 c.conversion_id,
3 c.conversion_time,
4 i.impression_id,
5 i.impression_time,
6 DATE_DIFF(
7 'minute',
8 i.impression_time,
9 c.conversion_time
10 ) AS minutes_to_convert
11FROM conversions AS c
12INNER JOIN impressions AS i
13 ON c.user_id = i.user_id
14 AND i.impression_time < c.conversion_time
15 AND i.impression_time >= c.conversion_time - INTERVAL '24' HOUR
16ORDER BY c.conversion_id, i.impression_time DESC
Result
conversion_idconversion_timeimpression_idimpression_timeminutes_to_convert
C0012024-01-15 14:30I0052024-01-15 14:1515
C0012024-01-15 14:30I0032024-01-15 10:00270
C0022024-01-15 16:45I0082024-01-15 16:3015
This finds all impressions for the same user that happened within 24 hours before the conversion. The inequality conditions (< and >=) define the attribution window.
//

Performance

Inequality joins are computationally expensive. Without an equality condition, the engine may need to compare every row in one table against every row in the other (a Cartesian product with filtering). Adding an equality condition alongside the inequality dramatically improves performance.

TIP
Always include an equality condition when possible. In the event attribution example, c.user_id = i.user_id allows the engine to first partition by user, then apply the time range filter within each partition.
Here are four optimization strategies that reduce the cost of inequality joins.
Optimization Strategies
  • Add an equality condition to partition data before the range comparison
  • Pre-filter in subqueries to reduce row counts before the join
  • Bucket continuous time ranges into discrete periods for partitioning
  • Replace self joins with window functions when possible

Self joins for hierarchical data

Daily Life
Interviews

Query parent-child relationships in one table

A self join is when a table is joined to itself. This sounds strange at first, but it's essential for querying hierarchical data, comparing rows within the same table, and finding related records.
//

Why Self Join?

Self joins solve problems where relationships exist within a single table. The most common example: an employee_metrics table where each employee has a manager_id that references another row in the same table.
emp_idnamemanager_id
1AliceNULL
2Bob1
3Carol1
4Dave2
5Eve2
6Frank3
Alice is the CEO (no manager). Bob and Carol report to Alice. Dave and Eve report to Bob. Frank reports to Carol. To show each employee with their manager's name, we join the table to itself.
//

Self Join Syntax

Use table aliases to distinguish the two "copies" of the table. Think of it as creating two separate reference points into the same data:
1SELECT
2 e.name AS employee,
3 m.name AS manager
4FROM employee_metrics AS e
5LEFT JOIN employee_metrics AS m
6 ON e.manager_id = m.emp_id
Result
employeemanager
AliceNULL
BobAlice
CarolAlice
DaveBob
EveBob
FrankCarol

We use LEFT JOIN because Alice has no manager (NULL), and we still want to include her in the results. With INNER JOIN, Alice would disappear.

Watch how the table is joined to itself: each employee's manager_id is matched against the emp_id column in the same table.
1SELECT
2 e.name AS employee,
3 m.name AS manager
4FROM employee_metrics AS e
5LEFT JOIN employee_metrics AS m
6 ON e.manager_id = m.emp_id
Join Animation[object Object]
employee_metrics e
emp_idnamemgr_id
1AliceNULL
2Bob1
3Carol1
employee_metrics m
emp_idnamemgr_id
1AliceNULL
2Bob1
3Carol1
result
col_0col_1
1AliceNULL1AliceNULL
Row 1/3
TIP
Notice that Alice appears with NULL for her manager because she has no manager_id. The LEFT JOIN ensures she isn't dropped from the results, which is critical for including top-level nodes in a hierarchy.

Self Join Patterns

Beyond simple hierarchies, self joins enable powerful row comparison and relationship discovery within a single table.
//

Finding Colleagues

Self joins can find relationships between rows. For example, finding colleagues who share the same manager:
1SELECT
2 e1.name AS employee1,
3 e2.name AS employee2,
4 m.name AS shared_manager
5FROM employee_metrics AS e1
6INNER JOIN employee_metrics AS e2
7 ON e1.manager_id = e2.manager_id
8 AND e1.emp_id < e2.emp_id
9INNER JOIN employee_metrics AS m
10 ON e1.manager_id = m.emp_id
Result
employee1employee2shared_manager
BobCarolAlice
DaveEveBob
The condition e1.emp_id < e2.emp_id prevents listing each pair twice (Bob-Carol and Carol-Bob) and prevents pairing an employee with themselves.

> Find metrics that share the same department using a self join.

SELECT
  e1.metric_name,
  e2.metric_name
FROM employee_metrics AS e1
INNER JOIN employee_metrics AS e2
  ON ___ = e2.department
  AND e1.metric_id ___ e2.metric_id
<
=
e1.metric_id
e1.department
//

Comparing Rows

Self joins are powerful for finding sequential or comparative patterns. Find all products that were repriced higher:
productpricedate
Widget$102024-01
Widget$122024-02
Gadget$252024-01
Gadget$232024-02
1SELECT
2 curr.product,
3 prev.price AS old_price,
4 curr.price AS new_price,
5 curr.price_date AS change_date
6FROM price_history AS curr
7INNER JOIN price_history AS prev
8 ON curr.product = prev.product
9 AND curr.price_date > prev.price_date
10WHERE curr.price > prev.price
Result
productold_pricenew_pricechange_date
Widget$10$122024-02
Widget shows up because its price increased from $10 to $12. Gadget doesn't appear because its price decreased.
TIP
For finding consecutive changes (like month-over-month), use LAG() or LEAD() window functions instead of self joins. They're more efficient and easier to read for sequential comparisons.

Recursive CTEs for trees

Daily Life
Interviews

Traverse tree structures with SQL

A recursive CTE (Common Table Expression) is a query that references itself to traverse hierarchical or graph-structured data. It's like a loop in programming: start with base cases, then repeatedly expand until done.

//

Multi-Level Hierarchies

Self joins work for one-level relationships (employee to manager). But what if you need all ancestors: employee to manager to manager's manager to CEO? You'd need multiple self joins, and you'd need to know the maximum depth in advance.
Org hierarchies
Org hierarchies
Traverse organizational structures of unknown depth.
Bill of materials
Bill of materials
Navigate parts containing subparts recursively.
Category trees
Category trees
Walk subcategories within subcategories to any level.
Graph traversal
Graph traversal
Find all connected nodes in network relationships.
Sequence generation
Sequence generation
Generate number series or date ranges dynamically.
//

Recursive CTE Structure

A recursive CTE has two parts connected by UNION ALL:

1WITH cte_name AS (
2 SELECT
3 id,
4 parent_id,
5 name
6 FROM my_table
7 WHERE parent_id IS NULL
8)
9
10SELECT
11 *
12FROM cte_name
Result
idparent_idname
1NULLRoot Node
BASERECURSEUNION ALLITERATERESULT
BASE
Anchor query
Seeds the first result
RECURSE
Self-reference
Joins back to the CTE
UNION ALL
Combine rows
Appends each iteration
ITERATE
Loop to depth
Repeats until no rows
RESULT
Final output
Returns accumulated set

> Complete the recursive CTE to find all messages in a reply thread.

WITH RECURSIVE thread AS (
  SELECT
    msg_id,
    content
  FROM chat_msgs
  WHERE reply_to IS NULL

  ___

  SELECT
    c.msg_id,
    c.content
  FROM chat_msgs AS c
  INNER ___ thread AS t
    ON c.reply_to = t.msg_id
)

SELECT
  *
FROM thread
JOIN
UNION ALL
UNION

Recursive CTE in Practice

Recursive CTEs enable traversal of arbitrary-depth hierarchies, path building, and cycle detection in graph structures.
//

Hierarchy Example

Let's find all entries under Alice (the CEO) and their level in the hierarchy:
1SELECT
2 name,
3 1 AS level
4FROM employee_metrics
5WHERE manager_id IS NULL
6
7UNION ALL
8
9SELECT
10 e.name,
11 2 AS level
12FROM employee_metrics AS e
13WHERE e.manager_id IN (
14 SELECT
15 emp_id
16 FROM employee_metrics
17 WHERE manager_id IS NULL
18)
Result
namelevel
Alice1
Bob2
Carol2
Dave3
Eve3
Frank3
The query starts with Alice (level 1), finds her direct reports Bob and Carol (level 2), then finds their reports Dave, Eve, and Frank (level 3). It stops when no more rows reference the current level.
//

Building a Path

Recursive CTEs can accumulate information across levels. Here we build a management chain path:
1SELECT
2 e.name,
3 CASE
4 WHEN m.name IS NULL THEN e.name
5 ELSE m.name || ' > ' || e.name
6 END AS path
7FROM employee_metrics AS e
8LEFT JOIN employee_metrics AS m
9 ON e.manager_id = m.emp_id
10ORDER BY path
Result
namepath
AliceAlice
BobAlice > Bob
DaveAlice > Bob > Dave
EveAlice > Bob > Eve
CarolAlice > Carol
FrankAlice > Carol > Frank

Each iteration appends the current employee's name to the path. The || operator concatenates strings (use CONCAT() in some databases).

//

Cycle Detection

Recursive CTEs can run forever if the data contains cycles (A > B > C > A). Most databases have safeguards:
Handling Cycles
  • Most engines have a default recursion limit (100-1000 iterations)
  • Use CYCLE detection syntax where supported
  • Track visited nodes in an array: ARRAY[emp_id]
  • Add WHERE NOT CONTAINS(visited, id) to break cycles
1WITH traverse AS(SELECT id, parent_id, ARRAY id AS visited FROM nodes WHERE parent_id IS NULL)
2SELECT *
3FROM traverse
Result
idparent_idvisited
1NULL[1]

CROSS JOIN (Cartesian product)

Daily Life
Interviews

Generate every possible row combination

A cross join combines every row from one table with every row from another, creating all possible combinations.

A CROSS JOIN produces the Cartesian product of two tables: every row from the first table paired with every row from the second table. If table A has 3 rows and table B has 4 rows, the result has 3 × 4 = 12 rows.

//

Syntax Options

SQL supports two equivalent syntaxes for cross joins:
1SELECT
2 a.category,
3 b.device_type
4FROM products AS a
5CROSS JOIN devices AS b
Result
categorydevice_type
Electronicsphone
Electronicstablet
Clothingphone
Clothingtablet

CROSS JOIN syntax (preferred)

1SELECT
2 a.category,
3 b.device_type
4FROM products AS a
5CROSS JOIN devices AS b
Result
categorydevice_type
Electronicsphone
Electronicstablet
Clothingphone
Clothingtablet

Comma syntax (older SQL-89 style)

The comma syntax is the older SQL-89 style. The explicit CROSS JOIN is clearer and preferred in modern code. Both produce identical results.

Observe how every row from the first table pairs with every row from the second, producing all possible combinations.
1SELECT
2 a.category,
3 b.device_type
4FROM products AS a
5CROSS JOIN devices AS b
Join Animation[object Object]
products
category
Electronics
Clothing
devices
device_type
phone
tablet
result
col_0col_1
Electronicsphone
Row 1/2
Two rows crossed with two rows yields four output rows. This multiplicative behavior is why cross joins require caution with larger tables.
//

Product Variants

Cross joins generate all possible combinations. Perfect for creating product variant grids:
products
category
Electronics
Clothing
Home
devices
device_type
phone
tablet
laptop
1SELECT
2 p.category,
3 d.device_type,
4 p.category || '-' || d.device_type AS sku
5FROM products AS p
6CROSS JOIN devices AS d
7ORDER BY p.category, d.device_type
Result
categorydevice_typesku
ClothinglaptopClothing-laptop
ClothingphoneClothing-phone
ClothingtabletClothing-tablet
ElectronicslaptopElectronics-laptop
ElectronicsphoneElectronics-phone
ElectronicstabletElectronics-tablet
HomelaptopHome-laptop
HomephoneHome-phone
HometabletHome-tablet
3 categories × 3 device types = 9 SKU combinations. This pattern is used for product catalogs, seating charts, calendar grids, and more.

> Identify which columns come from which table in this cross join that generates all category-device combinations.

SELECT
  ___,
  ___
FROM products AS p
CROSS JOIN devices AS d
d.device_type
p.device_type
d.category
p.category

Practical Cross Join Uses

Cross joins are essential for creating scaffolds, generating test data, and ensuring complete dimension coverage in reports.
//

Date × Category Reporting

A common analytics pattern: ensure every date-category combination exists, even if there's no data:
1WITH date_spine AS (
2 SELECT
3 CAST(date_column AS DATE) AS report_date
4 FROM UNNEST(
5 SEQUENCE(
6 DATE '2024-01-01',
7 DATE '2024-01-07'
8 )
9 ) AS t(date_column)
10),
11categories AS (
12 SELECT DISTINCT
13 category
14 FROM products
15)
16
17SELECT
18 d.report_date,
19 c.category,
20 COALESCE(SUM(s.amount), 0) AS total_sales
21FROM date_spine AS d
22CROSS JOIN categories AS c
23LEFT JOIN sales AS s
24 ON d.report_date = s.sale_date
25 AND c.category = s.category
26GROUP BY d.report_date, c.category
27ORDER BY d.report_date, c.category
Result
report_datecategorytotal_sales
2024-01-01Electronics1250.00
2024-01-01Clothing0
2024-01-02Electronics890.00
2024-01-02Clothing450.00

The CROSS JOIN creates every date-category pair. The LEFT JOIN fills in actual sales data. Dates with no sales show 0 instead of missing entirely.

//

The Danger of Cross Joins

Cross joins multiply row counts. A cross join between two 10,000-row tables produces 100,000,000 rows. Always consider the cardinality:
TIP
Cross joining tables with even modest row counts can explode your result set. Always verify both tables are small or use LIMIT during development. A forgotten WHERE clause after a comma join is a common cause of runaway queries.
Despite the risk, cross joins have several legitimate use cases in production data pipelines.
Cross Join Use Cases
  • Product variants: generate every combination of color and size
  • Calendar scaffolds: cross dates with groups to fill reporting gaps
  • Pairwise comparisons: match every row against every other row
  • Test data: generate synthetic datasets from small seed tables
  • Dense pivot tables: ensure all key combinations exist in reports
When you do use a cross join, follow these guardrails to keep result sizes under control.
Do
  • Use explicit CROSS JOIN syntax for clarity
  • Verify row counts before cross joining
Don't
  • Cross join large tables without LIMIT
  • Use comma syntax in modern SQL code
The consequences of an unguarded cross join can be dramatic.
Mastering these advanced join patterns opens up complex analytical possibilities. Put your skills to the test with hands-on challenges.
The Patient Journey PipelineStep 1
>

A hospital network with 3 facilities needs to build unified patient records by joining siloed clinical, billing, and scheduling data. You lead the data engineering effort to create the patient journey pipeline. Each facility uses a different EHR vendor. Patient IDs overlap across systems (facility A and facility B both have patient_id 1001, but they refer to different people). Clinicians need a single view of each patient across all three facilities for care coordination.

patient_visits
visit_idfacilitypatient_idadmit_datediagnosis
V-001A10012024-03-15pneumonia
V-002B10012024-03-20fracture
V-003A20452024-03-22diabetes
patient_profiles
patient_idfacilitynamedobinsurance_id
1001AMaria Garcia1985-06-12INS-447
1001BJames Wilson1972-11-03INS-891
2045AMaria Garcia1985-06-12INS-447
May 2026
Cross-System Join Strategy

You need to join 50M patient visits to 200K patient profiles. Patient IDs are only unique within a facility, not across the network. The same person (Maria Garcia) may have different IDs at different facilities. Which join approach do you use?

SQL
-- Goal: unified patient record
SELECT v.visit_id, p.name, v.diagnosis
FROM patient_visits v
??? patient_profiles p
  ON ??? -- your join logic here
Cross-facility patient matching is one of the hardest join problems in healthcare data engineering because there is no universal patient identifier and name plus date-of-birth matching produces both false positives and false negatives.
The self-join deduplication approach used to build a master patient index generalizes to any entity resolution problem where you need to find equivalent records across siloed systems.
Combining multiple join types in a single pipeline, such as composite equi-join for exact matches followed by an inequality join for fuzzy matching, is a common pattern for handling real-world data quality issues.
PUTTING IT ALL TOGETHER

> You are a data engineer at Workday building a workforce analytics platform that maps reporting chains, computes salary band compliance, and generates all role-skill combinations for a recommendation engine. All queries touch the same employee_metrics table in different ways.

Broadcast join hints push the small salary_bands dimension table to all worker nodes, avoiding a shuffle of the 50M employee_metrics fact table.
An inequality join on employee_metrics.salary BETWEEN bands.min_salary AND bands.max_salary assigns each employee to their correct compensation band.
A self join aliases employee_metrics as manager and employee_metrics as report, linking each row to its manager_id counterpart for org-chart traversal.
CROSS JOIN between roles and required_skills generates every role-skill combination as the scaffold for the recommendation scoring model.
KEY TAKEAWAYS
Broadcast joins copy small tables to all nodes; use for dimension tables under 1GB
Inequality joins match ranges (ON salary BETWEEN min AND max); add equality conditions for performance
Self joins connect a table to itself; essential for hierarchies and row comparisons
Use table aliases to distinguish the two "copies" in a self join
Recursive CTEs traverse unknown-depth hierarchies: base case UNION ALL recursive step
CROSS JOIN produces Cartesian products (all combinations); row count = rows1 x rows2
Use CROSS JOIN for product variants, date scaffolds, and combination generation
Always verify row counts before cross joining; small tables can explode results

Self joins and CROSS joins. Send help.

Category
SQL
Difficulty
advanced
Duration
32 minutes
Challenges
0 hands-on challenges

Topics covered: Broadcast joins (distributed), Inequality joins with BETWEEN, Self joins for hierarchical data, Recursive CTEs for trees, CROSS JOIN (Cartesian product)

Lesson Sections

  1. Broadcast joins (distributed)

    In distributed query engines, data is split across multiple nodes. When you join two tables, the engine must decide how to bring matching rows together. This decision dramatically affects query performance. The Distribution Problem Consider joining a 100 billion row fact table with a 10,000 row dimension table. The fact table is distributed across 100+ nodes. For each fact row to find its dimension match, the dimension data must be accessible. Broadcast Join Mechanics A broadcast join copies the

  2. Inequality joins with BETWEEN (concepts: sqlInequalityJoin)

    Not all joins match on exact values. Range-based matching opens up powerful patterns for lookups, attribution, and temporal analysis. Range Matching Example Suppose you have employee metrics with salaries and a cost allocation table with budget brackets. Each bracket has a min and max income. To find each employee's tax rate, match their salary to the bracket range: Each employee's salary is matched to the bracket where it falls between min_income and max_income. This is impossible with an equal

  3. Self joins for hierarchical data

    A self join is when a table is joined to itself. This sounds strange at first, but it's essential for querying hierarchical data, comparing rows within the same table, and finding related records. Why Self Join? Self joins solve problems where relationships exist within a single table. The most common example: an employee_metrics table where each employee has a manager_id that references another row in the same table. Alice is the CEO (no manager). Bob and Carol report to Alice. Dave and Eve rep

  4. Recursive CTEs for trees (concepts: sqlRecursiveCte)

    Multi-Level Hierarchies Self joins work for one-level relationships (employee to manager). But what if you need all ancestors: employee to manager to manager's manager to CEO? You'd need multiple self joins, and you'd need to know the maximum depth in advance. Recursive CTE Structure Recursive CTE in Practice Recursive CTEs enable traversal of arbitrary-depth hierarchies, path building, and cycle detection in graph structures. Hierarchy Example Let's find all entries under Alice (the CEO) and th

  5. CROSS JOIN (Cartesian product) (concepts: sqlCrossJoin)

    A cross join combines every row from one table with every row from another, creating all possible combinations. Syntax Options SQL supports two equivalent syntaxes for cross joins: CROSS JOIN syntax (preferred) Comma syntax (older SQL-89 style) Observe how every row from the first table pairs with every row from the second, producing all possible combinations. Two rows crossed with two rows yields four output rows. This multiplicative behavior is why cross joins require caution with larger table