SQL Reference

Recursive CTE SQL: The Complete Guide

The ANSI SQL:1999 standard was the one that finally added recursive queries. Before that, walking an org chart meant looping in application code, writing Oracle's proprietary CONNECT BY, or admitting defeat and denormalizing the whole hierarchy into a closure table. The 1999 committee borrowed the shape from Datalog research going back to the 1980s: a WITH RECURSIVE form with an anchor member and a recursive member joined by UNION ALL. Postgres shipped it in 8.4 (2009), SQLite in 3.8.3 (2014), and MySQL waited until 8.0 in 2018.

A quarter century later the pattern hasn't changed. Anchor row seeds the working set, recursive member produces the next iteration, loop ends when no rows come back. That's the shape the 1999 editors chose, and it's the one every interviewer still expects you to write.

1999

Added to ANSI SQL

2009

Postgres 8.4 ships it

2018

MySQL 8.0 catches up

8%

SQL rounds use CTEs

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

How Recursive CTEs Work

A recursive CTE has two halves separated by UNION ALL. The anchor member runs first and produces the initial working set. The recursive member then runs against that working set, producing new rows. Those new rows become the working set for the next iteration. This repeats until the recursive member returns an empty result. The engine collects all rows from every iteration into the final output.

The 1999 committee drew this directly from fixpoint semantics in Datalog: start with a seed, apply a rule, collect new facts, repeat until no new facts arrive. Earlier SQL editions had no native loop construct, so engineers leaned on Oracle's CONNECT BY (added in 1985) or wrote nested procedural PL/SQL. The 1999 form replaced both with a declarative two-part structure that every major engine eventually adopted.

Basic Syntax

WITH RECURSIVE cte_name AS (
  -- Anchor member: runs once
  SELECT columns
  FROM base_table
  WHERE starting_condition

  UNION ALL

  -- Recursive member: runs until zero rows
  SELECT columns
  FROM base_table
  INNER JOIN cte_name ON join_condition
  WHERE termination_guard
)
SELECT * FROM cte_name;

RECURSIVE keyword: Required in PostgreSQL, MySQL, and SQLite. SQL Server and Oracle omit it (the engine detects self-reference automatically). BigQuery and Snowflake require it.

UNION ALL, not UNION: Always use UNION ALL. Plain UNION deduplicates between iterations, which breaks most hierarchical traversals and can cause the engine to reject the query entirely.

No aggregates in the recursive member: Most engines prohibit GROUP BY, DISTINCT, and aggregate functions in the recursive member. Aggregate in the final SELECT instead.

Interview note: When an interviewer asks you to traverse a hierarchy, start by stating the two-part structure out loud: “The anchor selects the root nodes, and the recursive member joins back to find children.” This signals you understand the mechanism before you write a line of code.

Base Case and Recursive Case

The anchor member is your base case. It defines where the recursion starts. For an org chart, the base case selects the CEO (WHERE manager_id IS NULL). For a date series, the base case selects the start date. For a graph traversal, the base case selects the source node. Getting the anchor right determines whether the rest of the recursion produces correct results.

The recursive member is the step function. It takes the output from the previous iteration and uses it to find the next set of rows. In a hierarchy, this means joining the base table to the CTE on the parent-child relationship. Each iteration goes one level deeper.

Anatomy of a Recursive CTE

-- Org chart: find all employees under manager_id = 42
WITH RECURSIVE reports AS (
  -- Base case: direct reports of manager 42
  SELECT
    id,
    name,
    manager_id,
    1 AS depth
  FROM employees
  WHERE manager_id = 42

  UNION ALL

  -- Recursive case: find reports of reports
  SELECT
    e.id,
    e.name,
    e.manager_id,
    r.depth + 1
  FROM employees e
  INNER JOIN reports r ON e.manager_id = r.id
  WHERE r.depth < 20  -- safety guard
)
SELECT id, name, depth
FROM reports
ORDER BY depth, name;

Iteration 0: The anchor finds all employees where manager_id = 42. Suppose that returns Alice, Bob, and Carol. These three rows are the working set.

Iteration 1: The recursive member joins employees to the working set (Alice, Bob, Carol). It finds everyone who reports to Alice, Bob, or Carol. Those new rows become the next working set.

Iteration N: This continues until an iteration produces zero rows. At that point, the engine concatenates all rows from all iterations and returns them.

Interview note: The depth counter serves double duty. It gives you the hierarchy level for output, and it gives you a termination guard. Always include it.

Use Case: Hierarchical Data

Hierarchies are the most common recursive CTE application. Org charts, category trees, file system paths, bill-of-materials, and comment threads all follow the same parent-child pattern. Without recursive CTEs, you would need either a fixed number of self-joins (which limits depth) or application-layer iteration (which means multiple round trips to the database).

Category Tree with Path Accumulation

This pattern builds the full path from root to each node as a concatenated string. It is useful for breadcrumb navigation, debugging hierarchy data, and generating display labels.

WITH RECURSIVE category_tree AS (
  -- Anchor: top-level categories
  SELECT
    id,
    name,
    parent_id,
    name::TEXT AS full_path,
    1 AS depth
  FROM categories
  WHERE parent_id IS NULL

  UNION ALL

  -- Recursive: append child name to parent path
  SELECT
    c.id,
    c.name,
    c.parent_id,
    ct.full_path || ' > ' || c.name,
    ct.depth + 1
  FROM categories c
  INNER JOIN category_tree ct ON c.parent_id = ct.id
  WHERE ct.depth < 30
)
SELECT id, name, full_path, depth
FROM category_tree
ORDER BY full_path;

Bill of Materials Explosion

Manufacturing data often has a parts-to-subparts hierarchy. A recursive CTE can “explode” a product into every component at every level, accumulating quantities along the way.

WITH RECURSIVE bom AS (
  -- Anchor: top-level product
  SELECT
    part_id,
    component_id,
    quantity,
    1 AS level
  FROM parts_hierarchy
  WHERE part_id = 'PRODUCT-001'

  UNION ALL

  -- Recursive: drill into sub-components
  SELECT
    ph.part_id,
    ph.component_id,
    ph.quantity * b.quantity AS quantity,
    b.level + 1
  FROM parts_hierarchy ph
  INNER JOIN bom b ON ph.part_id = b.component_id
  WHERE b.level < 15
)
SELECT component_id, SUM(quantity) AS total_needed
FROM bom
GROUP BY component_id
ORDER BY total_needed DESC;

Use Case: Date Series Generation

Time-series analysis often requires a row for every date in a range, even when no events occurred on certain days. PostgreSQL has generate_series() for this, but MySQL, SQL Server, and many cloud warehouses do not have an equivalent. Recursive CTEs fill the gap portably.

-- Generate every date in Q1 2024
WITH RECURSIVE dates AS (
  SELECT DATE '2024-01-01' AS dt

  UNION ALL

  SELECT dt + INTERVAL '1 day'
  FROM dates
  WHERE dt < DATE '2024-03-31'
)
SELECT dt
FROM dates;

The anchor produces the start date. The recursive member adds one day and stops when the date exceeds the end of the range. The result is 91 rows, one per day. You can then LEFT JOIN this to your events table to fill in zeros for missing days.

Filling Gaps in Time-Series Data

WITH RECURSIVE dates AS (
  SELECT DATE '2024-01-01' AS dt
  UNION ALL
  SELECT dt + INTERVAL '1 day'
  FROM dates
  WHERE dt < DATE '2024-03-31'
),
daily_events AS (
  SELECT
    DATE(event_time) AS event_date,
    COUNT(*) AS event_count
  FROM events
  WHERE event_time >= '2024-01-01'
    AND event_time < '2024-04-01'
  GROUP BY DATE(event_time)
)
SELECT
  d.dt AS date,
  COALESCE(de.event_count, 0) AS events
FROM dates d
LEFT JOIN daily_events de ON d.dt = de.event_date
ORDER BY d.dt;

Interview note: When an interviewer asks for daily metrics with no gaps, the date series CTE is the standard answer. Mention it before they ask about missing days. It shows you have dealt with sparse time-series data in production.

Use Case: Graph Traversal

Recursive CTEs can walk directed graphs stored as edge lists. This covers social network connections, dependency resolution, and shortest-path approximations. The pattern extends the hierarchy approach by tracking visited nodes to handle cycles.

-- Find all nodes reachable from node 'A'
WITH RECURSIVE reachable AS (
  -- Anchor: start at node A
  SELECT
    source_node,
    target_node,
    ARRAY[source_node] AS visited,
    1 AS hops
  FROM edges
  WHERE source_node = 'A'

  UNION ALL

  -- Recursive: follow edges, skip visited nodes
  SELECT
    e.source_node,
    e.target_node,
    r.visited || e.target_node,
    r.hops + 1
  FROM edges e
  INNER JOIN reachable r ON e.source_node = r.target_node
  WHERE NOT e.target_node = ANY(r.visited)
    AND r.hops < 10
)
SELECT DISTINCT target_node, MIN(hops) AS shortest_hops
FROM reachable
GROUP BY target_node
ORDER BY shortest_hops;

The visited array prevents infinite loops when the graph has cycles. Each iteration checks whether the target node has already been visited in the current path. This is PostgreSQL-specific syntax; in SQL Server you would concatenate visited node IDs into a VARCHAR and use CHARINDEX for the check.

Dependency Resolution

Data pipeline dependencies form a directed acyclic graph (DAG). A recursive CTE can find all upstream dependencies of a given task, which is useful for impact analysis when a source table changes.

-- Find all upstream dependencies of task 'final_report'
WITH RECURSIVE upstream AS (
  SELECT
    dependency_task AS task_name,
    1 AS distance
  FROM task_dependencies
  WHERE task_name = 'final_report'

  UNION ALL

  SELECT
    td.dependency_task,
    u.distance + 1
  FROM task_dependencies td
  INNER JOIN upstream u ON td.task_name = u.task_name
  WHERE u.distance < 20
)
SELECT DISTINCT task_name, MIN(distance) AS min_distance
FROM upstream
GROUP BY task_name
ORDER BY min_distance;

Termination Conditions

A recursive CTE terminates when the recursive member returns zero rows. This is the implicit termination condition. But relying on it alone is dangerous: bad data (circular references, missing NULLs) can cause infinite recursion. Production code needs explicit guards.

1. Depth Limit

Add a depth or iteration counter. Increment it in the recursive member and filter with WHERE depth < N. This is the most common guard and works on every engine.

-- Stop at depth 50 regardless of data
WHERE ct.depth < 50

2. Engine-Level Limits

SQL Server defaults to 100 iterations (change with OPTION MAXRECURSION N). MySQL 8.0+ defaults to 1000 (cte_max_recursion_depth). PostgreSQL has no default limit, so you must add your own guard or set statement_timeout.

3. Visited-Node Tracking

For graphs with cycles, accumulate visited IDs in an array column (PostgreSQL) or a concatenated string (SQL Server). Check each new row against the visited set before including it. This detects cycles at the row level rather than relying on a global depth limit.

-- PostgreSQL: array-based cycle detection
WHERE NOT e.target_id = ANY(r.visited_ids)
  AND r.depth < 50

Interview note: Mentioning termination conditions unprompted is a strong signal. After writing your recursive CTE, say: “In production I would add a depth guard and possibly array-based cycle detection to handle dirty data.”

Performance Considerations

Recursive CTEs are not free. Each iteration is a separate query execution against the working set. For wide or deep hierarchies, this can be expensive. Understanding the cost model helps you decide when a recursive CTE is the right tool and when an alternative approach (materialized paths, nested sets, or application-layer traversal) is better.

FactorImpact
Depth of recursionEach level is a round of execution. 5 levels is fast; 500 can be slow.
Branching factorIf each node has 10 children, row count grows exponentially. Filter early.
Index on join columnThe recursive join must be fast. Index the parent_id / manager_id column.
Columns carried forwardAccumulating arrays or long strings per row increases memory per iteration.
Alternative data modelsMaterialized paths or nested sets avoid recursion entirely for read-heavy workloads.

For most interview problems, recursion depth is under 20 and performance is not a concern. But interviewers will sometimes ask: “What if this table has 10 million rows and 50 levels?” The correct answer is to discuss indexing, depth limits, and whether a different data model (like ltree in PostgreSQL) would serve the access pattern better.

3 Recursive CTE Interview Questions

These three questions appear consistently in data engineering interviews at companies that work with hierarchical data, which is most of them.

Q1: Given an employees table with id and manager_id, find every employee who reports to a given manager at any level.

What they test:

Core recursive CTE syntax. They want anchor + recursive member + UNION ALL. Bonus points for including a depth column and a safety guard.

Approach:

Anchor: SELECT employees where manager_id equals the target. Recursive: JOIN employees where manager_id equals the previous level's id. Include depth counter. Mention WHERE depth < N for cycle safety.

Q2: Generate a daily date spine from 2024-01-01 to 2024-12-31 and LEFT JOIN it to an events table to show zero-event days.

What they test:

Recursive CTE for sequence generation plus gap-filling with LEFT JOIN and COALESCE. This tests whether you know the date spine pattern for time-series reporting.

Approach:

Anchor: SELECT DATE '2024-01-01'. Recursive: add INTERVAL '1 day', stop at 2024-12-31. LEFT JOIN to daily event counts. COALESCE nulls to 0.

Q3: Given a directed edge table, find all nodes reachable from a source node and report the shortest path length.

What they test:

Graph traversal with cycle detection. This is the hardest recursive CTE question. They want the visited-array pattern and a clear explanation of termination.

Approach:

Anchor: SELECT edges from source node, initialize visited array. Recursive: follow outgoing edges, check NOT target = ANY(visited), increment hop count. Final SELECT: GROUP BY target, MIN(hops) for shortest path. Mention that this is BFS, not Dijkstra (no weighted edges).

Recursive CTE Support by Engine

Syntax varies slightly across engines. Here is what to know for the platforms you are most likely to encounter.

EngineRECURSIVE KeywordDefault Limit
PostgreSQLRequiredNone (add your own guard)
SQL ServerNot used100 (MAXRECURSION to change)
MySQL 8.0+Required1000 (cte_max_recursion_depth)
BigQueryRequired500
SnowflakeRequiredConfigurable per query
OracleNot used (uses CONNECT BY too)None

Recursive CTE FAQ

What is a recursive CTE in SQL?+
A recursive CTE is a common table expression that references itself. It contains two parts joined by UNION ALL: an anchor member (the base case that runs once) and a recursive member (which references the CTE name and runs repeatedly until it produces zero new rows). Recursive CTEs solve hierarchical and iterative problems like org chart traversal, bill-of-materials explosions, date series generation, and shortest-path calculations. Every major SQL engine supports them: PostgreSQL, SQL Server, MySQL 8.0+, Oracle, SQLite, BigQuery, Snowflake, and Databricks.
How do you prevent infinite loops in recursive CTEs?+
Three strategies exist. First, add a depth counter column in the recursive CTE and filter with WHERE depth < N in the recursive member. Second, use engine-level limits: SQL Server has OPTION (MAXRECURSION N) which defaults to 100, and PostgreSQL has no default limit but you can set statement_timeout. Third, track visited nodes in an array column (PostgreSQL supports this with array_agg and ANY checks) to detect cycles explicitly. In interviews, always mention at least one of these safeguards unprompted.
Can you use recursive CTEs in MySQL?+
Yes, starting with MySQL 8.0. The syntax uses WITH RECURSIVE followed by the CTE name, anchor member, UNION ALL, and recursive member. MySQL 8.0+ also supports the cte_max_recursion_depth system variable (default 1000) to prevent runaway recursion. MySQL 5.7 and earlier do not support CTEs at all. If you are on MySQL 8.0 or newer, recursive CTEs work the same way as in PostgreSQL and SQL Server.

The 1999 standard, one whiteboard at a time

Twenty-five years after the committee shipped WITH RECURSIVE, candidates still stumble on the anchor/recursive split. Practice until you can write it without looking up the shape.