SQL Self-Join Practice with Examples
Self-joins are a subset of the JOIN category (~20% of SQL questions). They appear most often in problems involving hierarchical data, consecutive-event detection, and row-pair comparison.
What this guide covers
Self-joins are a subset of JOINs (~20% of SQL questions). They appear most in hierarchical data, consecutive-event detection, and row-pair comparison. Four patterns cover the majority of interview self-join questions: hierarchy traversal, row comparisons, finding pairs, gap detection. Window functions replaced self-joins for many common cases (LAG/LEAD), but self-joins still win for finding all pairs and for chained hierarchy queries.
Four self-join patterns
Hierarchy traversal
The classic self-join. A table references itself through a foreign key (manager_id pointing to id in the same employees table). Handles org charts, category trees, parent-child relationships in a single table. Use cases: find each employee + their manager; list all direct reports for a manager; build multi-level org charts (chain multiple self-joins); find root nodes (WHERE manager_id IS NULL). Interview tip: always LEFT JOIN for hierarchies, not INNER JOIN. INNER drops the CEO (or any root node) because their manager_id is NULL.
Row comparisons within a table
Compare each row to another row in the same table based on a relationship (consecutive dates, same user different time periods). The standard approach before window functions existed. Use cases: day-over-day or period-over-period changes; compare each row to a baseline; find the next event after a specific type; deltas between consecutive records. Interview tip: LAG/LEAD window functions now handle most of these more cleanly. Self-joins still win when comparing non-adjacent rows or when the comparison logic is complex.
Finding pairs
Find all pairs of rows that share a common attribute. Key trick: use a.id < b.id (not !=) to avoid both duplicate pairs and self-pairing. Each pair appears exactly once. Use cases: users in the same group; products bought together; students enrolled in same courses; mutual connections in a social graph. Interview tip: forgetting the < condition (using != instead) doubles the result with reversed pairs. Forgetting it entirely includes self-pairs. Interviewers watch for this.
Gap detection
Find gaps or missing entries in a sequence by joining a table to itself and looking for missing intermediate values. Common in time-series and data quality checks. Use cases: gaps in daily login streaks; missing sequence numbers; periods of inactivity; holes in scheduling data. Interview tip: gap detection with self-joins is verbose. LAG/LEAD is usually cleaner for consecutive gap finding. Self-joins handle the case where you need all pairs across the gap.
Hierarchy traversal syntax
SELECT e.emp_name AS employee, m.emp_name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;Row comparison syntax
WITH daily_metrics AS (
SELECT date(transaction_date) AS day, SUM(total_amount) AS revenue
FROM transactions
GROUP BY date(transaction_date)
)
SELECT a.day, a.revenue, b.revenue AS prev_day_revenue,
a.revenue - b.revenue AS daily_change
FROM daily_metrics a
JOIN daily_metrics b ON a.day = date(b.day, '+1 day')
ORDER BY a.day;Finding pairs syntax
SELECT a.emp_name AS employee_a, b.emp_name AS employee_b,
a.department AS shared_department
FROM employees a
JOIN employees b ON a.department = b.department
AND a.employee_id < b.employee_id
ORDER BY a.department, a.emp_name;Note the a.employee_id < b.employee_id condition: prevents duplicate pairs (A,B and B,A) and self-pairing.
When self-joins beat window functions (and when they don't)
Window functions replaced self-joins for many common patterns. LAG/LEAD handles consecutive row comparisons more cleanly. ROW_NUMBER handles deduplication without a self-join. Running totals with SUM() OVER are simpler than a correlated self-join.
But self-joins still win in specific situations. Finding all pairs of rows that share an attribute (users in the same group, products bought together) requires a self-join. There is no window function equivalent: the result set is fundamentally different. You need N-squared pairs, not N rows with an extra column.
Hierarchy traversal is another self-join stronghold. While recursive CTEs handle arbitrary depth, a simple two-level or three-level self-join is faster to write and easier to read when the depth is fixed.
Practical rule: comparing each row to one specific other row (previous, next, parent) → window function first. Comparing each row to many other rows or finding pairs → self-join.
Practice problems
Five problems testing self-join patterns. Some are best solved with a self-join, some with window functions. Part of the exercise is choosing the right tool.
Employees earning more than their manager
Given employees (id, name, salary, manager_id), find all who earn more than their direct manager. Return employee name, employee salary, manager name, manager salary. Hint: self-join employees to itself on e.manager_id = m.id and filter e.salary > m.salary. One of the most commonly asked self-join questions.
Consecutive login streaks
Given logins (user_id, login_date), find each user's longest streak of consecutive daily logins. Return user_id, max_streak_days. Hint: one approach is self-join logins on user_id and login_date = login_date + 1, then use gaps-and-islands. Alternative: ROW_NUMBER to assign groups and count within. The window function approach is cleaner here.
Users in the same department who never collaborated
Given employees (id, department_id) and collaborations (user_a, user_b, project_id), find all pairs in the same department who never appeared together. Return pairs only once (a < b). Hint: self-join employees on department_id with a.id < b.id, then LEFT JOIN to collaborations and filter WHERE project_id IS NULL.
Price changes: before and after
Given price_history (product_id, effective_date, price), for each change show product_id, old_price, new_price, change_date, percentage change. Hint: self-join price_history on product_id where b.effective_date is MAX less than a.effective_date. Or LAG(price) OVER (PARTITION BY product_id ORDER BY effective_date) for a cleaner solution. Compare both.
Multi-level manager chains
Given employees (id, name, manager_id), return every employee with their full management chain up to 4 levels. NULLs where the chain ends. Hint: chain four LEFT JOINs (e → m1 ON e.manager_id = m1.id → m2 ON m1.manager_id = m2.id → m3 ON m2.manager_id = m3.id). Each join goes one level up.
Worked example: employees earning more than their manager
SELECT
e.emp_name AS employee_name,
e.salary AS employee_salary,
m.emp_name AS manager_name,
m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.employee_id
WHERE e.salary > m.salary
ORDER BY e.salary - m.salary DESC;Self-join aliases the same table twice: e for the employee row, m for the manager row. INNER JOIN is correct because employees with no manager (manager_id IS NULL) shouldn't appear.
Self-join FAQ
When should I use a self-join instead of a window function?+
How do I avoid duplicate pairs in a self-join?+
Are self-joins slow on large tables?+
Can I self-join more than twice?+
Build self-join intuition
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition