Recursive CTE SQL: Hierarchies, Date Series, and Graph...
The ANSI SQL:1999 standard was the one that finally added recursive queries. Before that, walking an org chart meant looping in application code, writing Oracle's proprietary CONNECT BY, or admitting defeat and denormalizing the whole hierarchy into a closure table. The 1999 committee borrowed the shape from Datalog research going back to the 1980s: a WITH RECURSIVE form with an anchor and a recursive member joined by UNION ALL.
How Recursive CTEs Work
A recursive CTE has two halves separated by UNION ALL. The anchor member runs first and produces the initial working set. The recursive member then runs against that working set, producing new rows. Those new rows become the working set for the next iteration. This repeats until the recursive member returns an empty result. The engine collects all rows from every iteration into the final output.
The 1999 committee drew this directly from fixpoint semantics in Datalog: start with a seed, apply a rule, collect new facts, repeat until no new facts arrive. Earlier SQL editions had no native loop construct, so engineers leaned on Oracle's CONNECT BY (added in 1985) or wrote nested procedural PL/SQL. The 1999 form replaced both with a declarative two-part structure that every major engine eventually adopted.
Know Recursive CTE the way the interviewer who asks it knows it.
Base Case and Recursive Case
The anchor member is your base case. It defines where the recursion starts. For an org chart, the base case selects the CEO (WHERE manager_id IS NULL). For a date series, the base case selects the start date. For a graph traversal, the base case selects the source node. Getting the anchor right determines whether the rest of the recursion produces correct results.
The recursive member is the step function. It takes the output from the previous iteration and uses it to find the next set of rows. In a hierarchy, this means joining the base table to the CTE on the parent-child relationship. Each iteration goes one level deeper.
Basic Syntax
WITH RECURSIVE cte_name AS (
-- Anchor member: runs once
SELECT columns
FROM base_table
WHERE starting_condition
UNION ALL
-- Recursive member: runs until zero rows
SELECT columns
FROM base_table
INNER JOIN cte_name ON join_condition
WHERE termination_guard
)
SELECT * FROM cte_name;Use Case: Hierarchical Data
Hierarchies are the most common recursive CTE application. Org charts, category trees, file system paths, bill-of-materials, and comment threads all follow the same parent-child pattern. Without recursive CTEs, you would need either a fixed number of self-joins (which limits depth) or application-layer iteration (which means multiple round trips to the database).
Pulled from debriefs where SQL was the gate.
Key Rules
RECURSIVE keyword
Required in PostgreSQL, MySQL, and SQLite. SQL Server and Oracle omit it (the engine detects self-reference automatically). BigQuery and Snowflake require it.
UNION ALL, not UNION
Always use UNION ALL. Plain UNION deduplicates between iterations, which breaks most hierarchical traversals and can cause the engine to reject the query entirely.
No aggregates in the recursive member
Most engines prohibit GROUP BY, DISTINCT, and aggregate functions in the recursive member. Aggregate in the final SELECT instead.
Use Case: Date Series Generation
Time-series analysis often requires a row for every date in a range, even when no events occurred on certain days. PostgreSQL has generate_series() for this, but MySQL, SQL Server, and many cloud warehouses do not have an equivalent. Recursive CTEs fill the gap portably.
Anatomy of a Recursive CTE
-- Org chart: find all employees under manager_id = 42
WITH RECURSIVE reports AS (
-- Base case: direct reports of manager 42
SELECT
id,
name,
manager_id,
1 AS depth
FROM employees
WHERE manager_id = 42
UNION ALL
-- Recursive case: find reports of reports
SELECT
e.id,
e.name,
e.manager_id,
r.depth + 1
FROM employees e
INNER JOIN reports r ON e.manager_id = r.id
WHERE r.depth < 20 -- safety guard
)
SELECT id, name, depth
FROM reports
ORDER BY depth, name;Iteration 0: The anchor finds all employees where manager_id = 42. Suppose that returns Alice, Bob, and Carol. Iteration 1: The recursive member finds everyone who reports to Alice, Bob, or Carol. This continues until an iteration produces zero rows.
Use Case: Graph Traversal
Recursive CTEs can walk directed graphs stored as edge lists. This covers social network connections, dependency resolution, and shortest-path approximations. The pattern extends the hierarchy approach by tracking visited nodes to handle cycles.
Category Tree with Path Accumulation
WITH RECURSIVE category_tree AS (
-- Anchor: top-level categories
SELECT
id,
name,
parent_id,
name::TEXT AS full_path,
1 AS depth
FROM categories
WHERE parent_id IS NULL
UNION ALL
-- Recursive: append child name to parent path
SELECT
c.id,
c.name,
c.parent_id,
ct.full_path || ' > ' || c.name,
ct.depth + 1
FROM categories c
INNER JOIN category_tree ct ON c.parent_id = ct.id
WHERE ct.depth < 30
)
SELECT id, name, full_path, depth
FROM category_tree
ORDER BY full_path;Bill of Materials Explosion
WITH RECURSIVE bom AS (
-- Anchor: top-level product
SELECT
part_id,
component_id,
quantity,
1 AS level
FROM parts_hierarchy
WHERE part_id = 'PRODUCT-001'
UNION ALL
-- Recursive: drill into sub-components
SELECT
ph.part_id,
ph.component_id,
ph.quantity * b.quantity AS quantity,
b.level + 1
FROM parts_hierarchy ph
INNER JOIN bom b ON ph.part_id = b.component_id
WHERE b.level < 15
)
SELECT component_id, SUM(quantity) AS total_needed
FROM bom
GROUP BY component_id
ORDER BY total_needed DESC;Generate Every Date in Q1 2024
-- Generate every date in Q1 2024
WITH RECURSIVE dates AS (
SELECT DATE '2024-01-01' AS dt
UNION ALL
SELECT dt + INTERVAL '1 day'
FROM dates
WHERE dt < DATE '2024-03-31'
)
SELECT dt
FROM dates;Filling Gaps in Time-Series Data
WITH RECURSIVE dates AS (
SELECT DATE '2024-01-01' AS dt
UNION ALL
SELECT dt + INTERVAL '1 day'
FROM dates
WHERE dt < DATE '2024-03-31'
),
daily_events AS (
SELECT
DATE(event_time) AS event_date,
COUNT(*) AS event_count
FROM events
WHERE event_time >= '2024-01-01'
AND event_time < '2024-04-01'
GROUP BY DATE(event_time)
)
SELECT
d.dt AS date,
COALESCE(de.event_count, 0) AS events
FROM dates d
LEFT JOIN daily_events de ON d.dt = de.event_date
ORDER BY d.dt;Interview note: When an interviewer asks for daily metrics with no gaps, the date series CTE is the standard answer. Mention it before they ask about missing days. It shows you have dealt with sparse time-series data in production.
Find All Nodes Reachable from a Source
-- Find all nodes reachable from node 'A'
WITH RECURSIVE reachable AS (
-- Anchor: start at node A
SELECT
source_node,
target_node,
ARRAY[source_node] AS visited,
1 AS hops
FROM edges
WHERE source_node = 'A'
UNION ALL
-- Recursive: follow edges, skip visited nodes
SELECT
e.source_node,
e.target_node,
r.visited || e.target_node,
r.hops + 1
FROM edges e
INNER JOIN reachable r ON e.source_node = r.target_node
WHERE NOT e.target_node = ANY(r.visited)
AND r.hops < 10
)
SELECT DISTINCT target_node, MIN(hops) AS shortest_hops
FROM reachable
GROUP BY target_node
ORDER BY shortest_hops;Dependency Resolution
-- Find all upstream dependencies of task 'final_report'
WITH RECURSIVE upstream AS (
SELECT
dependency_task AS task_name,
1 AS distance
FROM task_dependencies
WHERE task_name = 'final_report'
UNION ALL
SELECT
td.dependency_task,
u.distance + 1
FROM task_dependencies td
INNER JOIN upstream u ON td.task_name = u.task_name
WHERE u.distance < 20
)
SELECT DISTINCT task_name, MIN(distance) AS min_distance
FROM upstream
GROUP BY task_name
ORDER BY min_distance;Termination Conditions
A recursive CTE terminates when the recursive member returns zero rows. But relying on it alone is dangerous: bad data (circular references, missing NULLs) can cause infinite recursion. Production code needs explicit guards.
1. Depth Limit
Add a depth or iteration counter. Increment it in the recursive member and filter with WHERE depth < N. This is the most common guard and works on every engine.
2. Engine-Level Limits
SQL Server defaults to 100 iterations (change with OPTION MAXRECURSION N). MySQL 8.0+ defaults to 1000 (cte_max_recursion_depth). PostgreSQL has no default limit, so you must add your own guard or set statement_timeout.
3. Visited-Node Tracking
For graphs with cycles, accumulate visited IDs in an array column (PostgreSQL) or a concatenated string (SQL Server). Check each new row against the visited set before including it. This detects cycles at the row level rather than relying on a global depth limit.
Cycle Detection Examples
-- Depth limit (works everywhere)
WHERE ct.depth < 50
-- PostgreSQL: array-based cycle detection
WHERE NOT e.target_id = ANY(r.visited_ids)
AND r.depth < 50Interview note: Mentioning termination conditions unprompted is a strong signal. After writing your recursive CTE, say: In production I would add a depth guard and possibly array-based cycle detection to handle dirty data.
Performance Considerations
Recursive CTEs are not free. Each iteration is a separate query execution against the working set. For wide or deep hierarchies, this can be expensive.
| Factor | Impact |
|---|---|
| Depth of recursion | Each level is a round of execution. 5 levels is fast; 500 can be slow. |
| Branching factor | If each node has 10 children, row count grows exponentially. Filter early. |
| Index on join column | The recursive join must be fast. Index the parent_id / manager_id column. |
| Columns carried forward | Accumulating arrays or long strings per row increases memory per iteration. |
| Alternative data models | Materialized paths or nested sets avoid recursion entirely for read-heavy workloads. |
3 Recursive CTE Interview Questions
These three questions appear consistently in data engineering interviews at companies that work with hierarchical data, which is most of them.
Q1: Given an employees table with id and manager_id, find every employee who reports to a given manager at any level.
What they test: Core recursive CTE syntax. They want anchor + recursive member + UNION ALL. Bonus points for including a depth column and a safety guard. Approach: Anchor: SELECT employees where manager_id equals the target. Recursive: JOIN employees where manager_id equals the previous level's id. Include depth counter. Mention WHERE depth < N for cycle safety.
Q2: Generate a daily date spine from 2024-01-01 to 2024-12-31 and LEFT JOIN it to an events table to show zero-event days.
What they test: Recursive CTE for sequence generation plus gap-filling with LEFT JOIN and COALESCE. This tests whether you know the date spine pattern for time-series reporting. Approach: Anchor: SELECT DATE '2024-01-01'. Recursive: add INTERVAL '1 day', stop at 2024-12-31. LEFT JOIN to daily event counts. COALESCE nulls to 0.
Q3: Given a directed edge table, find all nodes reachable from a source node and report the shortest path length.
What they test: Graph traversal with cycle detection. This is the hardest recursive CTE question. They want the visited-array pattern and a clear explanation of termination. Approach: Anchor: SELECT edges from source node, initialize visited array. Recursive: follow outgoing edges, check NOT target = ANY(visited), increment hop count. Final SELECT: GROUP BY target, MIN(hops) for shortest path. Mention that this is BFS, not Dijkstra (no weighted edges).
Recursive CTE Support by Engine
Syntax varies slightly across engines. Here is what to know for the platforms you are most likely to encounter.
| Engine | RECURSIVE Keyword | Default Limit |
|---|---|---|
| PostgreSQL | Required | None (add your own guard) |
| SQL Server | Not used | 100 (MAXRECURSION to change) |
| MySQL 8.0+ | Required | 1000 (cte_max_recursion_depth) |
| BigQuery | Required | 500 |
| Snowflake | Required | Configurable per query |
| Oracle | Not used (uses CONNECT BY too) | None |
Recursive CTE FAQ
What is a recursive CTE in SQL?+
How do you prevent infinite loops in recursive CTEs?+
Can you use recursive CTEs in MySQL?+
The 1999 standard, one whiteboard at a time
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition
Related Guides
Complete CTE reference: basic syntax, chaining, CTE vs subquery, CTE vs temp table
Window functions, lateral joins, grouping sets, and every advanced SQL topic for interviews
Every SQL topic tested in data engineering interviews, with approaches and patterns