SQL Self Join: Syntax, Use Cases, and Interview Questions

Q: What is a self join in SQL?

A self join is when you join a table to itself. The table appears twice in the FROM clause with different aliases, and the ON clause defines how rows in one copy relate to rows in the other copy. Self joins are used when the relationship you need exists within a single table, such as an employee table where each row has a manager_id that references another employee_id in the same table.

Q: When would you use a self join instead of a window function?

Self joins and window functions often solve the same problems, but each has strengths. Use a self join when you need to compare rows that are not necessarily adjacent (like finding all pairs that share an attribute) or when you need columns from two different rows in the same output row. Use a window function when you need to compare each row to its immediate predecessor or successor (LAG/LEAD) or compute a running aggregate. Window functions are usually more readable for ordered comparisons, while self joins are better for unordered pair-finding.

Q: How do you avoid duplicate pairs in a self join?

Use an inequality condition like WHERE a.id < b.id. If you use a.id != b.id, each pair appears twice (A-B and B-A). If you use a.id < b.id, each pair appears once (only the version where the first ID is smaller). This is the standard technique for generating unique combinations from a self join.

Self joins show up in 6% of verified DE interview SQL rounds, and the shape is predictable. Nine times out of ten the table is called employees, the two aliases are e and m, and the question is walking an org chart from IC to reporting manager. It's the archetypal adjacency-list-on-a-single-table problem, and it belongs in the same architectural bucket as recursive CTEs: tools for relating rows within the same dataset.

Rounds use self joins

9/10

Employee/manager shape

Level per join

Use CTE instead

Self Join Syntax

A self join uses the same table twice with different aliases. The aliases are required because without them the database cannot tell which 'copy' of the table you mean in the ON clause and SELECT list.

Basic Self Join

-- Basic self join: employees and their managers
SELECT
  e.employee_id,
  e.emp_name AS employee_name,
  e.manager_id,
  m.emp_name AS manager_name
FROM employees e
INNER JOIN employees m ON e.manager_id = m.employee_id;

In this query, the employees table appears twice: once as e (the employee) and once as m (the manager). The ON clause says: find the row in m whose employee_id matches the current employee's manager_id.

Architectural note: INNER JOIN drops the root node of the hierarchy, which is almost never what a reporting query wants. LEFT JOIN keeps every employee and projects NULL for the top of the chain, which matches the shape a downstream dashboard or a dimension table expects. Pick the join type based on whether the root is data or noise.

Self Join with LEFT JOIN

-- Keep all employees, show NULL for top-level managers
SELECT
  e.employee_id,
  e.emp_name AS employee_name,
  m.emp_name AS manager_name  -- NULL for CEO / top-level
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;

Interview note: Always clarify whether the question requires all employees or only those with managers. This determines whether you use INNER JOIN or LEFT JOIN. Asking this question shows the interviewer you think about data completeness.

Use Case 1: Hierarchies

The classic self join use case. An employee table has an employee_id and a manager_id. The manager_id points to another employee_id in the same table. To get an employee alongside their manager's details, you self join.

Multi-Level Hierarchy

-- Multi-level hierarchy: employee -> manager -> director
SELECT
  e.emp_name AS employee,
  m.emp_name AS manager,
  d.emp_name AS director
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id
LEFT JOIN employees d ON m.manager_id = d.employee_id;

Each additional level adds another self join. For deep hierarchies (5+ levels), this becomes impractical. At that point, a recursive CTE is the better approach.

Recursive CTE for Arbitrary Depth

-- Recursive CTE for arbitrary-depth hierarchy
WITH RECURSIVE org_chart AS (
  -- Base case: top-level (no manager)
  SELECT employee_id, emp_name, manager_id, 0 AS depth
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive case: join children to parents
  SELECT e.employee_id, e.emp_name, e.manager_id, oc.depth + 1
  FROM employees e
  INNER JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT * FROM org_chart ORDER BY depth, emp_name;

Interview tip: Start with a self join for 1-2 levels. If the interviewer asks 'what if the hierarchy is N levels deep,' switch to a recursive CTE. Showing both approaches demonstrates range.

Use Case 2: Consecutive Event Comparison

Self joins can compare each row to the next or previous row. This is useful for detecting changes, calculating differences, and finding gaps in sequences. Window functions (LAG/LEAD) are often cleaner for this, but self joins work in databases that do not support window functions and are still tested in interviews.

Consecutive Row Comparison

-- Compare each day's revenue to the previous day
WITH daily_revenue AS (
  SELECT
    substr(transaction_date, 1, 10) AS day,
    SUM(total_amount) AS revenue
  FROM transactions
  GROUP BY substr(transaction_date, 1, 10)
)
SELECT
  curr.day,
  curr.revenue AS today_revenue,
  prev.revenue AS yesterday_revenue,
  curr.revenue - prev.revenue AS daily_change,
  CASE
    WHEN prev.revenue = 0 THEN NULL
    ELSE ROUND(100.0 * (curr.revenue - prev.revenue) / prev.revenue, 1)
  END AS pct_change
FROM daily_revenue curr
INNER JOIN daily_revenue prev
  ON curr.day = date(prev.day, '+1 day')
ORDER BY curr.day;

-- Find consecutive logins (user logged in two days in a row)
WITH logins AS (
  SELECT DISTINCT user_id, substr(session_start, 1, 10) AS login_date
  FROM user_sessions
)
SELECT DISTINCT a.user_id, a.login_date
FROM logins a
INNER JOIN logins b
  ON a.user_id = b.user_id
  AND a.login_date = date(b.login_date, '+1 day');

Self Join vs LAG/LEAD

-- Window function approach (cleaner for ordered comparisons)
WITH daily_revenue AS (
  SELECT
    substr(transaction_date, 1, 10) AS day,
    SUM(total_amount) AS revenue
  FROM transactions
  GROUP BY substr(transaction_date, 1, 10)
)
SELECT
  day,
  revenue AS today_revenue,
  LAG(revenue) OVER (ORDER BY day) AS yesterday_revenue,
  revenue - LAG(revenue) OVER (ORDER BY day) AS daily_change
FROM daily_revenue
ORDER BY day;

The window function is shorter and avoids the join entirely. But the self join is more flexible: it can compare non-adjacent rows, handle irregular intervals, and join on multiple conditions that do not fit the strict row-ordering that LAG/LEAD require.

Use Case 3: Finding Pairs

Self joins find rows that share a common attribute. This pattern answers questions like 'which employees share the same manager' or 'which products are in the same category.'

Pair Finding

-- Employees who share the same manager
SELECT
  e1.emp_name AS employee_1,
  e2.emp_name AS employee_2,
  e1.manager_id
FROM employees e1
INNER JOIN employees e2
  ON e1.manager_id = e2.manager_id
  AND e1.employee_id < e2.employee_id  -- unique pairs only
ORDER BY e1.manager_id, e1.emp_name;

-- Products in the same category with similar prices (within 10%)
SELECT
  p1.product_name AS product_1,
  p2.product_name AS product_2,
  p1.category,
  p1.price AS price_1,
  p2.price AS price_2
FROM products p1
INNER JOIN products p2
  ON p1.category = p2.category
  AND p1.product_id < p2.product_id
  AND ABS(p1.price - p2.price) <= p1.price * 0.10
ORDER BY p1.category, p1.product_name;

-- Users who purchased the same product
SELECT
  o1.user_id AS user_1,
  o2.user_id AS user_2,
  o1.product_id,
  p.product_name
FROM order_items o1
INNER JOIN order_items o2
  ON o1.product_id = o2.product_id
  AND o1.user_id < o2.user_id
INNER JOIN products p ON o1.product_id = p.product_id
ORDER BY o1.product_id;

3 Self Join Interview Questions

Self join questions are popular in on-site interviews because they test whether you can model relationships within a single table.

Given an employees table with (id, name, manager_id), write a query to show each employee's name alongside their manager's name.

LEFT JOIN employees e to employees m on e.manager_id = m.id. Select e.name as employee, m.name as manager. LEFT JOIN keeps employees with NULL manager_id (the CEO).

Find all pairs of users who logged in on the same day. Each pair should appear once.

Self join user_logins a to user_logins b on a.login_date = b.login_date AND a.user_id < b.user_id. This gives every unique pair of users who logged in on the same day, without duplicates or self-pairs.

Given a table of daily stock prices (date, ticker, price), find days where the price increased compared to the previous day for each ticker.

Self join: curr.ticker = prev.ticker AND curr.date = prev.date + INTERVAL '1 day'. Filter WHERE curr.price > prev.price. Mention that LAG(price) OVER (PARTITION BY ticker ORDER BY date) is an alternative approach.

Common Self Join Mistakes

These errors come up repeatedly in interviews and code reviews.

Common Mistake

Forgetting to Exclude Self-Pairs

Without a condition to prevent it, a self join pairs every row with itself. An employee becomes their own colleague. Add WHERE e1.employee_id != e2.employee_id at minimum, or WHERE e1.employee_id < e2.employee_id to get unique pairs.

Subtle Bug

Using != Instead of < for Unique Pairs

Using != eliminates self-pairs but keeps both orderings: Alice-Bob and Bob-Alice. Using < keeps only one ordering: only (Alice, Bob) where Alice.id < Bob.id. This halves the result set to the correct unique pairs.

Data Loss

Missing the LEFT JOIN for Root Nodes

In hierarchy queries, using INNER JOIN drops root-level rows (CEO, top-level categories) because they have no parent to match. Always consider whether you need LEFT JOIN to preserve these rows.

SQL Self Join FAQ

What is a self join in SQL?+

A self join is when you join a table to itself. The table appears twice in the FROM clause with different aliases, and the ON clause defines how rows in one copy relate to rows in the other copy. Self joins are used when the relationship you need exists within a single table, such as an employee table where each row has a manager_id that references another employee_id in the same table.

When would you use a self join instead of a window function?+

Self joins and window functions often solve the same problems, but each has strengths. Use a self join when you need to compare rows that are not necessarily adjacent (like finding all pairs that share an attribute) or when you need columns from two different rows in the same output row. Use a window function when you need to compare each row to its immediate predecessor or successor (LAG/LEAD) or compute a running aggregate. Window functions are usually more readable for ordered comparisons, while self joins are better for unordered pair-finding.

How do you avoid duplicate pairs in a self join?+

Use an inequality condition like WHERE a.id < b.id. If you use a.id != b.id, each pair appears twice (A-B and B-A). If you use a.id < b.id, each pair appears once (only the version where the first ID is smaller). This is the standard technique for generating unique combinations from a self join.

02 / Why practice

The adjacency primitive, wired into your query plans

01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Start Practicing

Related Guides

Self-Join Practice Problems→

Hierarchy traversals, pair-finding, and consecutive-row comparisons with real SQL execution

JOIN Practice Problems→

INNER, LEFT, CROSS, and self-join problems at interview difficulty

SQL Interview Questions→

Complete guide to every SQL topic tested in data engineering interviews