SQL Self-Join Practice with Examples

Q: When should I use a self-join instead of a window function?

Self-joins for non-adjacent row comparisons (LAG/LEAD only access fixed offsets), finding pairs sharing an attribute, or complex join conditions beyond simple ordering. Window functions for consecutive row comparisons or running aggregates. In interviews, if you can solve it with a window function, that's usually the cleaner answer.

Q: How do I avoid duplicate pairs in a self-join?

Use a.id < b.id in the JOIN condition instead of a.id != b.id. The < ensures each pair appears exactly once (A,B but not B,A) and excludes self-pairs (A,A). Using != gives both (A,B) and (B,A), doubling results. Interviewers specifically watch for this.

Q: Are self-joins slow on large tables?

They create a cross product filtered by the join condition, so they can be expensive. The key is having an index on the join column. Self-join on employees.manager_id with an index on id is fast. Self-join for all pairs in a million-row table without good filtering is a performance problem. Mention indexing and row count awareness.

Q: Can I self-join more than twice?

Yes. Multi-level hierarchy queries commonly chain 3-4 self-joins. Each join adds one level. For arbitrary-depth hierarchies, use a recursive CTE instead. Rule of thumb: depth fixed and small (under 5 levels) → chained self-joins. Depth varies → WITH RECURSIVE.

Self-joins are a subset of the JOIN category (~20% of SQL questions). They appear most often in problems involving hierarchical data, consecutive-event detection, and row-pair comparison.

What this guide covers

Self-joins are a subset of JOINs (~20% of SQL questions). They appear most in hierarchical data, consecutive-event detection, and row-pair comparison. Four patterns cover the majority of interview self-join questions: hierarchy traversal, row comparisons, finding pairs, gap detection. Window functions replaced self-joins for many common cases (LAG/LEAD), but self-joins still win for finding all pairs and for chained hierarchy queries.

Four self-join patterns

Hierarchy traversal

The classic self-join. A table references itself through a foreign key (manager_id pointing to id in the same employees table). Handles org charts, category trees, parent-child relationships in a single table. Use cases: find each employee + their manager; list all direct reports for a manager; build multi-level org charts (chain multiple self-joins); find root nodes (WHERE manager_id IS NULL). Interview tip: always LEFT JOIN for hierarchies, not INNER JOIN. INNER drops the CEO (or any root node) because their manager_id is NULL.

Row comparisons within a table

Compare each row to another row in the same table based on a relationship (consecutive dates, same user different time periods). The standard approach before window functions existed. Use cases: day-over-day or period-over-period changes; compare each row to a baseline; find the next event after a specific type; deltas between consecutive records. Interview tip: LAG/LEAD window functions now handle most of these more cleanly. Self-joins still win when comparing non-adjacent rows or when the comparison logic is complex.

Finding pairs

Find all pairs of rows that share a common attribute. Key trick: use a.id < b.id (not !=) to avoid both duplicate pairs and self-pairing. Each pair appears exactly once. Use cases: users in the same group; products bought together; students enrolled in same courses; mutual connections in a social graph. Interview tip: forgetting the < condition (using != instead) doubles the result with reversed pairs. Forgetting it entirely includes self-pairs. Interviewers watch for this.

Gap detection

Find gaps or missing entries in a sequence by joining a table to itself and looking for missing intermediate values. Common in time-series and data quality checks. Use cases: gaps in daily login streaks; missing sequence numbers; periods of inactivity; holes in scheduling data. Interview tip: gap detection with self-joins is verbose. LAG/LEAD is usually cleaner for consecutive gap finding. Self-joins handle the case where you need all pairs across the gap.

Hierarchy traversal syntax

SELECT e.emp_name AS employee, m.emp_name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;

Row comparison syntax

WITH daily_metrics AS (
  SELECT date(transaction_date) AS day, SUM(total_amount) AS revenue
  FROM transactions
  GROUP BY date(transaction_date)
)
SELECT a.day, a.revenue, b.revenue AS prev_day_revenue,
       a.revenue - b.revenue AS daily_change
FROM daily_metrics a
JOIN daily_metrics b ON a.day = date(b.day, '+1 day')
ORDER BY a.day;

Finding pairs syntax

SELECT a.emp_name AS employee_a, b.emp_name AS employee_b,
       a.department AS shared_department
FROM employees a
JOIN employees b ON a.department = b.department
  AND a.employee_id < b.employee_id
ORDER BY a.department, a.emp_name;

Note the a.employee_id < b.employee_id condition: prevents duplicate pairs (A,B and B,A) and self-pairing.

When self-joins beat window functions (and when they don't)

Window functions replaced self-joins for many common patterns. LAG/LEAD handles consecutive row comparisons more cleanly. ROW_NUMBER handles deduplication without a self-join. Running totals with SUM() OVER are simpler than a correlated self-join.

But self-joins still win in specific situations. Finding all pairs of rows that share an attribute (users in the same group, products bought together) requires a self-join. There is no window function equivalent: the result set is fundamentally different. You need N-squared pairs, not N rows with an extra column.

Hierarchy traversal is another self-join stronghold. While recursive CTEs handle arbitrary depth, a simple two-level or three-level self-join is faster to write and easier to read when the depth is fixed.

Practical rule: comparing each row to one specific other row (previous, next, parent) → window function first. Comparing each row to many other rows or finding pairs → self-join.

Practice problems

Five problems testing self-join patterns. Some are best solved with a self-join, some with window functions. Part of the exercise is choosing the right tool.

Easy

Employees earning more than their manager

Given employees (id, name, salary, manager_id), find all who earn more than their direct manager. Return employee name, employee salary, manager name, manager salary. Hint: self-join employees to itself on e.manager_id = m.id and filter e.salary > m.salary. One of the most commonly asked self-join questions.

Hard

Consecutive login streaks

Given logins (user_id, login_date), find each user's longest streak of consecutive daily logins. Return user_id, max_streak_days. Hint: one approach is self-join logins on user_id and login_date = login_date + 1, then use gaps-and-islands. Alternative: ROW_NUMBER to assign groups and count within. The window function approach is cleaner here.

Medium

Users in the same department who never collaborated

Given employees (id, department_id) and collaborations (user_a, user_b, project_id), find all pairs in the same department who never appeared together. Return pairs only once (a < b). Hint: self-join employees on department_id with a.id < b.id, then LEFT JOIN to collaborations and filter WHERE project_id IS NULL.

Medium

Price changes: before and after

Given price_history (product_id, effective_date, price), for each change show product_id, old_price, new_price, change_date, percentage change. Hint: self-join price_history on product_id where b.effective_date is MAX less than a.effective_date. Or LAG(price) OVER (PARTITION BY product_id ORDER BY effective_date) for a cleaner solution. Compare both.

Hard

Multi-level manager chains

Given employees (id, name, manager_id), return every employee with their full management chain up to 4 levels. NULLs where the chain ends. Hint: chain four LEFT JOINs (e → m1 ON e.manager_id = m1.id → m2 ON m1.manager_id = m2.id → m3 ON m2.manager_id = m3.id). Each join goes one level up.

Worked example: employees earning more than their manager

SELECT
  e.emp_name AS employee_name,
  e.salary   AS employee_salary,
  m.emp_name AS manager_name,
  m.salary   AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.employee_id
WHERE e.salary > m.salary
ORDER BY e.salary - m.salary DESC;

Self-join aliases the same table twice: e for the employee row, m for the manager row. INNER JOIN is correct because employees with no manager (manager_id IS NULL) shouldn't appear.

Self-join FAQ

When should I use a self-join instead of a window function?+

Self-joins for non-adjacent row comparisons (LAG/LEAD only access fixed offsets), finding pairs sharing an attribute, or complex join conditions beyond simple ordering. Window functions for consecutive row comparisons or running aggregates. In interviews, if you can solve it with a window function, that's usually the cleaner answer.

How do I avoid duplicate pairs in a self-join?+

Use a.id < b.id in the JOIN condition instead of a.id != b.id. The < ensures each pair appears exactly once (A,B but not B,A) and excludes self-pairs (A,A). Using != gives both (A,B) and (B,A), doubling results. Interviewers specifically watch for this.

Are self-joins slow on large tables?+

They create a cross product filtered by the join condition, so they can be expensive. The key is having an index on the join column. Self-join on employees.manager_id with an index on id is fast. Self-join for all pairs in a million-row table without good filtering is a performance problem. Mention indexing and row count awareness.

Can I self-join more than twice?+

Yes. Multi-level hierarchy queries commonly chain 3-4 self-joins. Each join adds one level. For arbitrary-depth hierarchies, use a recursive CTE instead. Rule of thumb: depth fixed and small (under 5 levels) → chained self-joins. Depth varies → WITH RECURSIVE.

02 / Why practice

Build self-join intuition

01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Try a hierarchy problem

Related guides

CTE Practice→

Recursive CTEs handle arbitrary-depth hierarchies where self-joins hit limits.

Window Functions Practice→

Often a cleaner alternative to self-joins.

SQL Interview Questions→

Every SQL topic tested in DE interviews.