SQL Reference

SQL Self Join

Self joins show up in 6% of verified DE interview SQL rounds, and the shape is predictable. Nine times out of ten the table is called employees, the two aliases are e and m, and the question is walking an org chart from IC to reporting manager. It's the archetypal adjacency-list-on-a-single-table problem, and it belongs in the same architectural bucket as recursive CTEs: tools for relating rows within the same relation.

Where it fits in the pipeline: self joins are the cheap one-level version of a hierarchy walk. One JOIN gets you the parent. Two gets you the grandparent. Past two, you've probably outgrown the pattern and you should be reaching for a recursive CTE. Architecturally, think of self join as the static stage and recursive CTE as the iterative stage.

6%

Rounds use self joins

9/10

Employee/manager shape

1

Level per join

2+

Use CTE instead

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

Self Join Syntax

A self join uses the same table twice with different aliases. The aliases are required because without them the database cannot tell which “copy” of the table you mean in the ON clause and SELECT list.

-- Basic self join: employees and their managers
SELECT
  e.employee_id,
  e.name AS employee_name,
  e.manager_id,
  m.name AS manager_name
FROM employees e
INNER JOIN employees m ON e.manager_id = m.employee_id;

In this query, the employees table appears twice: once as e (the employee) and once as m (the manager). The ON clause says: find the row in m whose employee_id matches the current employee's manager_id.

Self Join with LEFT JOIN

Architectural note: INNER JOIN drops the root node of the hierarchy, which is almost never what a reporting query wants. LEFT JOIN keeps every employee and projects NULL for the top of the chain, which matches the shape a downstream dashboard or a dimension table expects. Pick the join type based on whether the root is data or noise.

-- Keep all employees, show NULL for top-level managers
SELECT
  e.employee_id,
  e.name AS employee_name,
  m.name AS manager_name  -- NULL for CEO / top-level
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;

Interview note: Always clarify whether the question requires all employees or only those with managers. This determines whether you use INNER JOIN or LEFT JOIN. Asking this question shows the interviewer you think about data completeness.

Use Case 1: Hierarchies

The classic self join use case. An employee table has an employee_id and a manager_id. The manager_id points to another employee_id in the same table. To get an employee alongside their manager's details, you self join.

-- Multi-level hierarchy: employee -> manager -> director
SELECT
  e.name AS employee,
  m.name AS manager,
  d.name AS director
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id
LEFT JOIN employees d ON m.manager_id = d.employee_id;

Each additional level adds another self join. For deep hierarchies (5+ levels), this becomes impractical. At that point, a recursive CTE is the better approach.

-- Recursive CTE for arbitrary-depth hierarchy
WITH RECURSIVE org_chart AS (
  -- Base case: top-level (no manager)
  SELECT employee_id, name, manager_id, 0 AS depth
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive case: join children to parents
  SELECT e.employee_id, e.name, e.manager_id, oc.depth + 1
  FROM employees e
  INNER JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT * FROM org_chart ORDER BY depth, name;

Interview tip: Start with a self join for 1-2 levels. If the interviewer asks “what if the hierarchy is N levels deep,” switch to a recursive CTE. Showing both approaches demonstrates range.

Use Case 2: Consecutive Event Comparison

Self joins can compare each row to the next or previous row. This is useful for detecting changes, calculating differences, and finding gaps in sequences. Window functions (LAG/LEAD) are often cleaner for this, but self joins work in databases that do not support window functions and are still tested in interviews.

-- Compare each day's revenue to the previous day
SELECT
  curr.date,
  curr.revenue AS today_revenue,
  prev.revenue AS yesterday_revenue,
  curr.revenue - prev.revenue AS daily_change,
  CASE
    WHEN prev.revenue = 0 THEN NULL
    ELSE ROUND(100.0 * (curr.revenue - prev.revenue) / prev.revenue, 1)
  END AS pct_change
FROM daily_revenue curr
INNER JOIN daily_revenue prev
  ON curr.date = prev.date + INTERVAL '1 day'
ORDER BY curr.date;
-- Find consecutive logins (user logged in two days in a row)
SELECT DISTINCT a.user_id, a.login_date
FROM user_logins a
INNER JOIN user_logins b
  ON a.user_id = b.user_id
  AND a.login_date = b.login_date + INTERVAL '1 day';

Self Join vs LAG/LEAD

The same consecutive-row comparison using a window function:

-- Window function approach (cleaner for ordered comparisons)
SELECT
  date,
  revenue AS today_revenue,
  LAG(revenue) OVER (ORDER BY date) AS yesterday_revenue,
  revenue - LAG(revenue) OVER (ORDER BY date) AS daily_change
FROM daily_revenue
ORDER BY date;

The window function is shorter and avoids the join entirely. But the self join is more flexible: it can compare non-adjacent rows, handle irregular intervals, and join on multiple conditions that do not fit the strict row-ordering that LAG/LEAD require.

Use Case 3: Finding Pairs

Self joins find rows that share a common attribute. This pattern answers questions like “which employees share the same manager” or “which products are in the same category.”

-- Employees who share the same manager
SELECT
  e1.name AS employee_1,
  e2.name AS employee_2,
  e1.manager_id
FROM employees e1
INNER JOIN employees e2
  ON e1.manager_id = e2.manager_id
  AND e1.employee_id < e2.employee_id  -- unique pairs only
ORDER BY e1.manager_id, e1.name;

The e1.employee_id < e2.employee_id condition is critical. Without it, you get duplicate pairs (Alice-Bob and Bob-Alice) plus self-pairs (Alice-Alice). The less-than condition ensures each pair appears exactly once.

-- Products in the same category with similar prices (within 10%)
SELECT
  p1.name AS product_1,
  p2.name AS product_2,
  p1.category,
  p1.price AS price_1,
  p2.price AS price_2
FROM products p1
INNER JOIN products p2
  ON p1.category = p2.category
  AND p1.product_id < p2.product_id
  AND ABS(p1.price - p2.price) <= p1.price * 0.10
ORDER BY p1.category, p1.name;
-- Customers who purchased the same product
SELECT
  o1.customer_id AS customer_1,
  o2.customer_id AS customer_2,
  o1.product_id,
  p.name AS product_name
FROM orders o1
INNER JOIN orders o2
  ON o1.product_id = o2.product_id
  AND o1.customer_id < o2.customer_id
INNER JOIN products p ON o1.product_id = p.product_id
ORDER BY o1.product_id;

3 Self Join Interview Questions

Self join questions are popular in on-site interviews because they test whether you can model relationships within a single table.

Q1: Given an employees table with (id, name, manager_id), write a query to show each employee's name alongside their manager's name.

What they test:

Basic self join syntax and understanding of LEFT vs INNER JOIN for handling top-level employees without managers.

Approach:

LEFT JOIN employees e to employees m on e.manager_id = m.id. Select e.name as employee, m.name as manager. LEFT JOIN keeps employees with NULL manager_id (the CEO).

Q2: Find all pairs of users who logged in on the same day. Each pair should appear once.

What they test:

Pair-finding pattern with duplicate elimination. Can you self join on a shared attribute and use the less-than trick to prevent duplicate pairs?

Approach:

Self join user_logins a to user_logins b on a.login_date = b.login_date AND a.user_id < b.user_id. This gives every unique pair of users who logged in on the same day, without duplicates or self-pairs.

Q3: Given a table of daily stock prices (date, ticker, price), find days where the price increased compared to the previous day for each ticker.

What they test:

Consecutive-row comparison. Can you join a row to the previous row using a date offset? They may also accept a LAG solution.

Approach:

Self join: curr.ticker = prev.ticker AND curr.date = prev.date + INTERVAL '1 day'. Filter WHERE curr.price > prev.price. Mention that LAG(price) OVER (PARTITION BY ticker ORDER BY date) is an alternative approach.

Common Self Join Mistakes

These errors come up repeatedly in interviews and code reviews.

Forgetting to Exclude Self-Pairs

Without a condition to prevent it, a self join pairs every row with itself. An employee becomes their own colleague. A product matches itself as a similar product.

-- BUG: Alice pairs with Alice
SELECT e1.name, e2.name
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.manager_id;

-- FIX: exclude self-pairs
SELECT e1.name, e2.name
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.manager_id
WHERE e1.employee_id != e2.employee_id;

-- BETTER: unique pairs only
WHERE e1.employee_id < e2.employee_id;

Using != Instead of < for Unique Pairs

Using != eliminates self-pairs but keeps both orderings: Alice-Bob and Bob-Alice. Using < keeps only one ordering.

-- != gives duplicate pairs: (Alice,Bob) AND (Bob,Alice)
WHERE e1.id != e2.id  -- 2 rows per pair

-- < gives unique pairs: only (Alice,Bob) where Alice.id < Bob.id
WHERE e1.id < e2.id   -- 1 row per pair

Missing the LEFT JOIN for Root Nodes

In hierarchy queries, using INNER JOIN drops root-level rows (CEO, top-level categories) because they have no parent to match. Always consider whether you need LEFT JOIN to preserve these rows.

SQL Self Join FAQ

What is a self join in SQL?+
A self join is when you join a table to itself. The table appears twice in the FROM clause with different aliases, and the ON clause defines how rows in one copy relate to rows in the other copy. Self joins are used when the relationship you need exists within a single table, such as an employee table where each row has a manager_id that references another employee_id in the same table.
When would you use a self join instead of a window function?+
Self joins and window functions often solve the same problems, but each has strengths. Use a self join when you need to compare rows that are not necessarily adjacent (like finding all pairs that share an attribute) or when you need columns from two different rows in the same output row. Use a window function when you need to compare each row to its immediate predecessor or successor (LAG/LEAD) or compute a running aggregate. Window functions are usually more readable for ordered comparisons, while self joins are better for unordered pair-finding.
How do you avoid duplicate pairs in a self join?+
Use an inequality condition like WHERE a.id < b.id. If you use a.id != b.id, each pair appears twice (A-B and B-A). If you use a.id < b.id, each pair appears once (only the version where the first ID is smaller). This is the standard technique for generating unique combinations from a self join.

The adjacency primitive, wired into your query plans

Self join is where most hierarchy questions start and where most candidates hesitate. Write enough of them and the alias pattern stops feeling clever.