Recursive CTE SQL: Hierarchies, Date Series, and Graph...

The ANSI SQL:1999 standard was the one that finally added recursive queries. Before that, walking an org chart meant looping in application code, writing Oracle's proprietary CONNECT BY, or admitting defeat and denormalizing the whole hierarchy into a closure table. The 1999 committee borrowed the shape from Datalog research going back to the 1980s: a WITH RECURSIVE form with an anchor and a recursive member joined by UNION ALL.

1999: Added to ANSI SQL
2009: Postgres 8.4 ships it
2018: MySQL 8.0 catches up

How Recursive CTEs Work

A recursive CTE has two halves separated by UNION ALL. The anchor member runs first and produces the initial working set. The recursive member then runs against that working set, producing new rows. Those new rows become the working set for the next iteration. This repeats until the recursive member returns an empty result. The engine collects all rows from every iteration into the final output.

The 1999 committee drew this directly from fixpoint semantics in Datalog: start with a seed, apply a rule, collect new facts, repeat until no new facts arrive. Earlier SQL editions had no native loop construct, so engineers leaned on Oracle's proprietary CONNECT BY (which predates the standard by well over a decade) or wrote nested procedural PL/SQL. The 1999 form replaced both with a declarative two-part structure that every major engine eventually adopted.
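
The working-set loop described above can be modeled in a few lines outside SQL. This is a minimal Python sketch of the semantics, not how any particular engine implements them; the `evaluate_recursive_cte` helper and the sample `employees` rows are assumptions made for illustration.

```python
# Illustrative model of recursive CTE evaluation with UNION ALL semantics.
# The helper name and the sample data are hypothetical.

def evaluate_recursive_cte(anchor_rows, recursive_step):
    """Collect rows from every iteration until the step yields nothing."""
    result = list(anchor_rows)       # output of the anchor member
    working_set = list(anchor_rows)  # rows visible to the next iteration
    while working_set:
        new_rows = recursive_step(working_set)
        result.extend(new_rows)      # UNION ALL: keep every new row
        working_set = new_rows       # only the newest rows feed the next pass
    return result

# (employee_id, manager_id) pairs; employee 1 is the root.
employees = [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)]

def anchor():
    # Base case: rows with no manager, at depth 1.
    return [(emp, mgr, 1) for emp, mgr in employees if mgr is None]

def step(working_set):
    # Equivalent to joining employees to the working set on manager_id.
    return [
        (emp, mgr, depth + 1)
        for emp, mgr in employees
        for (parent, _, depth) in working_set
        if mgr == parent
    ]

rows = evaluate_recursive_cte(anchor(), step)
for row in rows:
    print(row)
```

The loop ends on the pass where `step` returns an empty list, which mirrors the rule that recursion stops when the recursive member produces zero rows.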

Basic Syntax

WITH RECURSIVE cte_name AS (
  -- Anchor member: runs once
  SELECT employee_id, manager_id, 1 AS depth
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive member: runs until zero rows
  SELECT e.employee_id, e.manager_id, c.depth + 1
  FROM employees e
  INNER JOIN cte_name c ON e.manager_id = c.employee_id
  WHERE c.depth < 20
)
SELECT * FROM cte_name;

Key Rules

RECURSIVE keyword

Required in PostgreSQL, MySQL, and BigQuery. SQL Server and Oracle omit it (the engine detects self-reference automatically). SQLite and Snowflake accept it but do not require it.

UNION ALL, not UNION

Use UNION ALL by default. Plain UNION discards any row that duplicates one already produced, which adds a deduplication pass to every iteration and can silently drop rows you meant to keep. SQL Server rejects UNION in a recursive CTE entirely.

No aggregates in the recursive member

Most engines prohibit GROUP BY, DISTINCT, and aggregate functions in the recursive member. Aggregate in the final SELECT instead.
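
To see what the UNION versus UNION ALL rule changes in practice, here is a small sketch run through Python's built-in sqlite3 module (SQLite accepts both forms). The `edges` table and its three-node cycle are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (src INTEGER, dst INTEGER);
    -- 1 -> 2 -> 3 -> 1 forms a cycle
    INSERT INTO edges VALUES (1, 2), (2, 3), (3, 1);
""")

# UNION: a row identical to one already produced is dropped,
# so the walk stops once every node has been seen.
union_rows = conn.execute("""
    WITH RECURSIVE walk(node) AS (
        SELECT 1
        UNION
        SELECT e.dst FROM edges e JOIN walk w ON e.src = w.node
    )
    SELECT node FROM walk ORDER BY node
""").fetchall()

# UNION ALL: duplicates are kept, so the same cycle needs an explicit guard.
union_all_rows = conn.execute("""
    WITH RECURSIVE walk(node, depth) AS (
        SELECT 1, 0
        UNION ALL
        SELECT e.dst, w.depth + 1
        FROM edges e JOIN walk w ON e.src = w.node
        WHERE w.depth < 5
    )
    SELECT node, depth FROM walk ORDER BY depth
""").fetchall()

print(union_rows)
print(union_all_rows)
```

The UNION version only terminates here because each row carries nothing but the node ID. Once a depth or path column is added, every row is unique again and deduplication no longer stops the cycle, which is why an explicit guard is the safer habit.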

Base Case and Recursive Case

The anchor member is your base case. It defines where the recursion starts. For an org chart, the base case selects the CEO (WHERE manager_id IS NULL). For a date series, the base case selects the start date. For a graph traversal, the base case selects the source node. Getting the anchor right determines whether the rest of the recursion produces correct results.

The recursive member is the step function. It takes the output from the previous iteration and uses it to find the next set of rows. In a hierarchy, this means joining the base table to the CTE on the parent-child relationship. Each iteration goes one level deeper.

Anatomy of a Recursive CTE

-- Org chart: find all employees under manager_id = 1
WITH RECURSIVE reports AS (
  -- Base case: direct reports of manager 1
  SELECT
    employee_id,
    emp_name,
    manager_id,
    1 AS depth
  FROM employees
  WHERE manager_id = 1

  UNION ALL

  -- Recursive case: find reports of reports
  SELECT
    e.employee_id,
    e.emp_name,
    e.manager_id,
    r.depth + 1
  FROM employees e
  INNER JOIN reports r ON e.manager_id = r.employee_id
  WHERE r.depth < 20  -- safety guard
)
SELECT employee_id, emp_name, depth
FROM reports
ORDER BY depth, emp_name;

Iteration 0: The anchor finds all employees where manager_id = 1. Suppose that returns Alice, Bob, and Carol, each at depth 1. Iteration 1: The recursive member finds everyone who reports to Alice, Bob, or Carol and assigns them depth 2. This continues until an iteration produces zero rows.
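
The walk-through can be reproduced end to end with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the six sample rows (including the names Dana, Erin, and Frank) are invented for illustration, and the query is the reports CTE from above with manager 1 as the starting point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        emp_name    TEXT,
        manager_id  INTEGER
    );
    -- Hypothetical sample data: Dana is the top manager (id 1).
    INSERT INTO employees VALUES
        (1, 'Dana',  NULL),
        (2, 'Alice', 1),
        (3, 'Bob',   1),
        (4, 'Carol', 1),
        (5, 'Erin',  2),
        (6, 'Frank', 5);
""")

rows = conn.execute("""
    WITH RECURSIVE reports AS (
        -- Base case: direct reports of manager 1
        SELECT employee_id, emp_name, manager_id, 1 AS depth
        FROM employees
        WHERE manager_id = 1
        UNION ALL
        -- Recursive case: reports of reports
        SELECT e.employee_id, e.emp_name, e.manager_id, r.depth + 1
        FROM employees e
        INNER JOIN reports r ON e.manager_id = r.employee_id
        WHERE r.depth < 20
    )
    SELECT employee_id, emp_name, depth
    FROM reports
    ORDER BY depth, emp_name
""").fetchall()

for row in rows:
    print(row)
```

With this data the anchor contributes Alice, Bob, and Carol at depth 1, the first recursive pass adds Erin at depth 2, the second adds Frank at depth 3, and the third returns nothing, which ends the recursion.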

Use Case: Hierarchical Data

Hierarchies are the most common recursive CTE application. Org charts, category trees, file system paths, bill-of-materials, and comment threads all follow the same parent-child pattern. Without recursive CTEs, you would need either a fixed number of self-joins (which limits depth) or application-layer iteration (which means multiple round trips to the database).

Org Tree with Path Accumulation

WITH RECURSIVE org_tree AS (
  -- Anchor: top-level managers
  SELECT
    employee_id,
    emp_name,
    manager_id,
    emp_name AS full_path,
    1 AS depth
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive: append report name to manager path
  SELECT
    e.employee_id,
    e.emp_name,
    e.manager_id,
    ot.full_path || ' > ' || e.emp_name,
    ot.depth + 1
  FROM employees e
  INNER JOIN org_tree ot ON e.manager_id = ot.employee_id
  WHERE ot.depth < 30
)
SELECT employee_id, emp_name, full_path, depth
FROM org_tree
ORDER BY full_path;
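
A quick way to check the path-accumulation query is to run it against a tiny table. The sketch below uses Python's built-in sqlite3 module; the sample rows are assumptions made for illustration, and the `||` concatenation operator is the SQLite/PostgreSQL spelling (MySQL would need CONCAT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        emp_name    TEXT,
        manager_id  INTEGER
    );
    -- Hypothetical sample data
    INSERT INTO employees VALUES
        (1, 'Dana',  NULL),
        (2, 'Alice', 1),
        (3, 'Bob',   1),
        (4, 'Erin',  2);
""")

rows = conn.execute("""
    WITH RECURSIVE org_tree AS (
        SELECT employee_id, emp_name, emp_name AS full_path, 1 AS depth
        FROM employees
        WHERE manager_id IS NULL
        UNION ALL
        SELECT e.employee_id, e.emp_name,
               ot.full_path || ' > ' || e.emp_name,
               ot.depth + 1
        FROM employees e
        INNER JOIN org_tree ot ON e.manager_id = ot.employee_id
        WHERE ot.depth < 30
    )
    SELECT full_path, depth
    FROM org_tree
    ORDER BY full_path
""").fetchall()

for full_path, depth in rows:
    print(depth, full_path)
```

Because every path starts with its parent's path, sorting by full_path lists each manager immediately before their own subtree, which is usually the order you want when printing an org chart.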

Headcount Rollup by Manager

WITH RECURSIVE org AS (
  -- Anchor: direct reports of the top manager (manager_id = 1)
  SELECT
    manager_id,
    employee_id,
    1 AS level
  FROM employees
  WHERE manager_id = 1

  UNION ALL

  -- Recursive: drill into reports of reports
  SELECT
    e.manager_id,
    e.employee_id,
    o.level + 1
  FROM employees e
  INNER JOIN org o ON e.manager_id = o.employee_id
  WHERE o.level < 15
)
-- Each row is one employee in the subtree, so counting rows per manager_id
-- gives that manager's direct reports, not the total at every level.
SELECT manager_id, COUNT(*) AS direct_reports
FROM org
GROUP BY manager_id
ORDER BY direct_reports DESC;

Use Case: Date Series Generation

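Time-series analysis often requires a row for every date in a range, even when no events occurred on certain days. PostgreSQL has generate_series() for this and SQL Server 2022 added GENERATE_SERIES, but MySQL and older SQL Server versions have no built-in equivalent, and cloud warehouses each expose a different function. A recursive CTE fills the gap with the same structure everywhere; only the date arithmetic changes. The examples below use SQLite's date() function, so swap in your engine's syntax (for example, dt + INTERVAL '1 day' in PostgreSQL).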

Generate Every Date in Q1 2024

-- Generate every date in Q1 2024
WITH RECURSIVE dates AS (
  SELECT date('2024-01-01') AS dt

  UNION ALL

  SELECT date(dt, '+1 day')
  FROM dates
  WHERE dt < date('2024-03-31')
)
SELECT dt
FROM dates;
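
As a sanity check, the date series above can be run with Python's built-in sqlite3 module. This sketch assumes SQLite's date() syntax, as the query does, and adds an ORDER BY so the row order does not depend on engine internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

dates = [
    row[0]
    for row in conn.execute("""
        WITH RECURSIVE dates AS (
            SELECT date('2024-01-01') AS dt
            UNION ALL
            SELECT date(dt, '+1 day')
            FROM dates
            WHERE dt < date('2024-03-31')
        )
        SELECT dt FROM dates ORDER BY dt
    """)
]

print(len(dates), dates[0], dates[-1])
print('2024-02-29' in dates)  # 2024 is a leap year
```

The guard is written as dt < '2024-03-31' because the recursive member emits dt plus one day: the last row that passes the filter is March 30, and it produces March 31 as the final date.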

Filling Gaps in Time-Series Data

WITH RECURSIVE dates AS (
  SELECT date('2024-01-01') AS dt
  UNION ALL
  SELECT date(dt, '+1 day')
  FROM dates
  WHERE dt < date('2024-03-31')
),
daily_events AS (
  SELECT
    date(event_timestamp) AS event_date,
    COUNT(*) AS event_count
  FROM event_data
  WHERE event_timestamp >= '2024-01-01'
    AND event_timestamp < '2024-04-01'
  GROUP BY date(event_timestamp)
)
SELECT
  d.dt AS day,
  COALESCE(de.event_count, 0) AS events
FROM dates d
LEFT JOIN daily_events de ON d.dt = de.event_date
ORDER BY d.dt;
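
The gap-filling pattern is easier to trust once you see the zero rows appear. The sketch below runs a shortened version (five days instead of a quarter) through Python's built-in sqlite3 module; the `event_data` rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_data (event_timestamp TEXT);
    -- Hypothetical sparse data: nothing on Jan 2, 4, or 5
    INSERT INTO event_data VALUES
        ('2024-01-01 09:00:00'),
        ('2024-01-01 17:30:00'),
        ('2024-01-03 12:00:00');
""")

rows = conn.execute("""
    WITH RECURSIVE dates AS (
        SELECT date('2024-01-01') AS dt
        UNION ALL
        SELECT date(dt, '+1 day')
        FROM dates
        WHERE dt < date('2024-01-05')
    ),
    daily_events AS (
        SELECT date(event_timestamp) AS event_date,
               COUNT(*) AS event_count
        FROM event_data
        WHERE event_timestamp >= '2024-01-01'
          AND event_timestamp < '2024-01-06'
        GROUP BY date(event_timestamp)
    )
    SELECT d.dt AS day,
           COALESCE(de.event_count, 0) AS events
    FROM dates d
    LEFT JOIN daily_events de ON d.dt = de.event_date
    ORDER BY d.dt
""").fetchall()

for day, events in rows:
    print(day, events)
```

The date spine drives the output, so every day appears exactly once; the LEFT JOIN supplies counts where they exist and COALESCE turns the missing ones into zeros.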

Interview note: When an interviewer asks for daily metrics with no gaps, the date series CTE is the standard answer. Mention it before they ask about missing days. It shows you have dealt with sparse time-series data in production.

Use Case: Graph Traversal

Recursive CTEs can walk directed graphs stored as edge lists. This covers social network connections, dependency resolution, and shortest-path approximations. The pattern extends the hierarchy approach by tracking visited nodes to handle cycles.

Find All Nodes Reachable from a Source

-- Find all employees reachable from manager 1 (graph walk)
WITH RECURSIVE reachable AS (
  -- Anchor: start at node 1
  SELECT
    manager_id AS source_node,
    employee_id AS target_node,
    '/' || manager_id || '/' || employee_id || '/' AS visited,
    1 AS hops
  FROM employees
  WHERE manager_id = 1

  UNION ALL

  -- Recursive: follow edges, skip already-visited nodes
  SELECT
    e.manager_id,
    e.employee_id,
    r.visited || e.employee_id || '/',
    r.hops + 1
  FROM employees e
  INNER JOIN reachable r ON e.manager_id = r.target_node
  WHERE instr(r.visited, '/' || e.employee_id || '/') = 0
    AND r.hops < 10
)
SELECT target_node, MIN(hops) AS shortest_hops
FROM reachable
GROUP BY target_node
ORDER BY shortest_hops;
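
The same visited-string technique works on a plain edge list. The sketch below is an adaptation run through Python's built-in sqlite3 module, assuming a hypothetical `edges(src, dst)` table that contains a cycle (1 -> 2 -> 3 -> 1) so the guard has something to catch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (src INTEGER, dst INTEGER);
    -- Hypothetical graph with a cycle back to node 1
    INSERT INTO edges VALUES (1, 2), (2, 3), (3, 1), (1, 3), (3, 4);
""")

rows = conn.execute("""
    WITH RECURSIVE reachable AS (
        -- Anchor: edges leaving the source node
        SELECT dst AS target_node,
               '/' || src || '/' || dst || '/' AS visited,
               1 AS hops
        FROM edges
        WHERE src = 1
        UNION ALL
        -- Recursive: follow edges, skipping nodes already on this path
        SELECT e.dst,
               r.visited || e.dst || '/',
               r.hops + 1
        FROM edges e
        INNER JOIN reachable r ON e.src = r.target_node
        WHERE instr(r.visited, '/' || e.dst || '/') = 0
          AND r.hops < 10
    )
    SELECT target_node, MIN(hops) AS shortest_hops
    FROM reachable
    GROUP BY target_node
    ORDER BY shortest_hops, target_node
""").fetchall()

print(rows)
```

The slashes around each ID matter: without them, a search for node 1 would also match node 11 or 21 inside the visited string. The edge 3 -> 1 is never followed because '/1/' is already on every path that reaches node 3.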

Upstream Traversal (the Dependency Resolution Pattern)

-- Find all upstream managers of employee 11
WITH RECURSIVE upstream AS (
  SELECT
    manager_id AS employee_id,
    1 AS distance
  FROM employees
  WHERE employee_id = 11

  UNION ALL

  SELECT
    e.manager_id,
    u.distance + 1
  FROM employees e
  INNER JOIN upstream u ON e.employee_id = u.employee_id
  WHERE e.manager_id IS NOT NULL
    AND u.distance < 20
)
SELECT employee_id, MIN(distance) AS min_distance
FROM upstream
WHERE employee_id IS NOT NULL
GROUP BY employee_id
ORDER BY min_distance;
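
Walking upward is the same join with the direction reversed. The sketch below runs the upstream query through Python's built-in sqlite3 module; the four-row management chain is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        manager_id  INTEGER
    );
    -- Hypothetical chain: 11 reports to 5, 5 to 2, 2 to 1 (the root)
    INSERT INTO employees VALUES (1, NULL), (2, 1), (5, 2), (11, 5);
""")

rows = conn.execute("""
    WITH RECURSIVE upstream AS (
        -- Anchor: the immediate manager of employee 11
        SELECT manager_id AS employee_id, 1 AS distance
        FROM employees
        WHERE employee_id = 11
        UNION ALL
        -- Recursive: the manager of the row found in the previous pass
        SELECT e.manager_id, u.distance + 1
        FROM employees e
        INNER JOIN upstream u ON e.employee_id = u.employee_id
        WHERE e.manager_id IS NOT NULL
          AND u.distance < 20
    )
    SELECT employee_id, MIN(distance) AS min_distance
    FROM upstream
    WHERE employee_id IS NOT NULL
    GROUP BY employee_id
    ORDER BY min_distance
""").fetchall()

print(rows)
```

The recursion stops at the root because its manager_id is NULL and the recursive member filters that row out. The outer IS NOT NULL filter covers the case where the starting employee is the root and the anchor itself returns a NULL manager.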

Termination Conditions

A recursive CTE terminates when the recursive member returns zero rows. Relying on that alone is dangerous: bad data, such as circular references or a root that lists itself as its own manager instead of NULL, can cause infinite recursion. Production code needs explicit guards.

1. Depth Limit

Add a depth or iteration counter. Increment it in the recursive member and filter with WHERE depth < N. This is the most common guard and works on every engine.

2. Engine-Level Limits

SQL Server defaults to 100 iterations (change with OPTION (MAXRECURSION N)). MySQL 8.0+ defaults to 1000 (cte_max_recursion_depth). PostgreSQL has no default limit, so you must add your own guard or set statement_timeout.

3. Visited-Node Tracking

For graphs with cycles, accumulate visited IDs in an array column (PostgreSQL) or a delimited string (SQL Server, MySQL, SQLite). Check each new row against the visited set before including it. This detects cycles at the row level rather than relying on a global depth limit.

Cycle Detection Examples

-- Depth limit (works everywhere)
WHERE ct.depth < 50

-- PostgreSQL: array-based cycle detection
WHERE NOT e.target_id = ANY(r.visited_ids)
  AND r.depth < 50
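
The fragments above only stop a cycle; it is often just as useful to report which rows are caught in one. The sketch below is one possible way to do that with a delimited path string, run through Python's built-in sqlite3 module. The `chain` CTE, its `is_cycle` flag, and the dirty sample rows (7 and 8 manage each other) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        manager_id  INTEGER
    );
    -- Hypothetical dirty data: 7 and 8 report to each other
    INSERT INTO employees VALUES
        (1, NULL), (2, 1), (3, 2), (7, 8), (8, 7);
""")

cyclic = conn.execute("""
    WITH RECURSIVE chain(start_id, current_id, path, depth, is_cycle) AS (
        -- Anchor: start one upward walk per employee that has a manager
        SELECT employee_id, manager_id, '/' || employee_id || '/', 1, 0
        FROM employees
        WHERE manager_id IS NOT NULL
        UNION ALL
        -- Recursive: step to the next manager; flag a revisit as a cycle
        SELECT c.start_id,
               e.manager_id,
               c.path || e.employee_id || '/',
               c.depth + 1,
               instr(c.path, '/' || e.employee_id || '/') > 0
        FROM chain c
        INNER JOIN employees e ON e.employee_id = c.current_id
        WHERE c.is_cycle = 0   -- stop extending a walk once it has looped
          AND c.depth < 50     -- depth guard as a second line of defense
    )
    SELECT DISTINCT start_id
    FROM chain
    WHERE is_cycle = 1
    ORDER BY start_id
""").fetchall()

print(cyclic)
```

Walks that reach the root end naturally when current_id becomes NULL and the join finds no match. Walks that loop end one step after the revisit, because the flagged row is emitted but never extended.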

Interview note: Mentioning termination conditions unprompted is a strong signal. After writing your recursive CTE, say: In production I would add a depth guard and possibly array-based cycle detection to handle dirty data.

Performance Considerations

Recursive CTEs are not free. Each iteration is a separate query execution against the working set. For wide or deep hierarchies, this can be expensive.

Factor	Impact
Depth of recursion	Each level is a round of execution. 5 levels is fast; 500 can be slow.
Branching factor	If each node has 10 children, row count grows exponentially. Filter early.
Index on join column	The recursive join must be fast. Index the parent_id / manager_id column.
Columns carried forward	Accumulating arrays or long strings per row increases memory per iteration.
Alternative data models	Materialized paths or nested sets avoid recursion entirely for read-heavy workloads.
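
The branching-factor row in the table is easy to verify by counting rows per level. The sketch below builds a complete three-way tree with Python's built-in sqlite3 module; the `nodes` table, the index name, and the ID layout (children of node i are 3i+1 to 3i+3) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER)")
# Index the column used by the recursive join.
conn.execute("CREATE INDEX idx_nodes_parent ON nodes (parent_id)")

# Root is node 0; every other node i hangs under (i - 1) // 3.
tree_rows = [(0, None)] + [(i, (i - 1) // 3) for i in range(1, 121)]
conn.executemany("INSERT INTO nodes VALUES (?, ?)", tree_rows)

per_level = conn.execute("""
    WITH RECURSIVE tree AS (
        SELECT id, 0 AS depth
        FROM nodes
        WHERE parent_id IS NULL
        UNION ALL
        SELECT n.id, t.depth + 1
        FROM nodes n
        INNER JOIN tree t ON n.parent_id = t.id
        WHERE t.depth < 10
    )
    SELECT depth, COUNT(*) AS row_count
    FROM tree
    GROUP BY depth
    ORDER BY depth
""").fetchall()

for depth, row_count in per_level:
    print(depth, row_count)
```

Each level holds three times as many rows as the one above it, so four levels of a modest branching factor already produce 81 rows in a single iteration. Filtering in the anchor or the recursive member keeps that growth from reaching the final SELECT.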

3 Recursive CTE Interview Questions

These three questions appear consistently in data engineering interviews at companies that work with hierarchical data, which is most of them.

Q1: Given an employees table with id and manager_id, find every employee who reports to a given manager at any level.

What they test: Core recursive CTE syntax. They want anchor + recursive member + UNION ALL. Bonus points for including a depth column and a safety guard.

Approach: Anchor: SELECT employees where manager_id equals the target. Recursive: JOIN employees where manager_id equals the previous level's id. Include a depth counter. Mention WHERE depth < N for cycle safety.

Q2: Generate a daily date spine from 2024-01-01 to 2024-12-31 and LEFT JOIN it to an events table to show zero-event days.

What they test: Recursive CTE for sequence generation plus gap-filling with LEFT JOIN and COALESCE. This tests whether you know the date spine pattern for time-series reporting.

Approach: Anchor: SELECT DATE '2024-01-01'. Recursive: add INTERVAL '1 day', stop at 2024-12-31. LEFT JOIN to daily event counts. COALESCE nulls to 0.

Q3: Given a directed edge table, find all nodes reachable from a source node and report the shortest path length.

What they test: Graph traversal with cycle detection. This is the hardest recursive CTE question. They want the visited-array pattern and a clear explanation of termination.

Approach: Anchor: SELECT edges from the source node, initialize the visited array. Recursive: follow outgoing edges, check NOT target = ANY(visited), increment the hop count. Final SELECT: GROUP BY target, MIN(hops) for the shortest path. Mention that this is BFS, not Dijkstra (no weighted edges).

Recursive CTE Support by Engine

Syntax varies slightly across engines. Here is what to know for the platforms you are most likely to encounter.

Engine	RECURSIVE Keyword	Default Limit
PostgreSQL	Required	None (add your own guard)
SQL Server	Not used	100 (MAXRECURSION to change)
MySQL 8.0+	Required	1000 (cte_max_recursion_depth)
BigQuery	Required	500
Snowflake	Optional	Configurable per query
Oracle	Not used (uses CONNECT BY too)	None

Recursive CTE FAQ

What is a recursive CTE in SQL?

A recursive CTE is a common table expression that references itself. It contains two parts joined by UNION ALL: an anchor member (the base case that runs once) and a recursive member (which references the CTE name and runs repeatedly until it produces zero new rows). Recursive CTEs solve hierarchical and iterative problems like org chart traversal, bill-of-materials explosions, date series generation, and shortest-path calculations. Every major SQL engine supports them: PostgreSQL, SQL Server, MySQL 8.0+, Oracle, SQLite, BigQuery, Snowflake, and Databricks.

How do you prevent infinite loops in recursive CTEs?

Three strategies exist. First, add a depth counter column to the CTE and filter with WHERE depth < N in the recursive member. Second, use engine-level limits: SQL Server has OPTION (MAXRECURSION N), which defaults to 100, and PostgreSQL has no default limit but lets you set statement_timeout. Third, track visited nodes in an array column (PostgreSQL supports this with array concatenation and ANY checks) to detect cycles explicitly. In interviews, always mention at least one of these safeguards unprompted.

Can you use recursive CTEs in MySQL?

Yes, starting with MySQL 8.0. The syntax uses WITH RECURSIVE followed by the CTE name, anchor member, UNION ALL, and recursive member. MySQL 8.0+ also supports the cte_max_recursion_depth system variable (default 1000) to prevent runaway recursion. MySQL 5.7 and earlier do not support CTEs at all. If you are on MySQL 8.0 or newer, recursive CTEs work the same way as in PostgreSQL; SQL Server differs mainly in omitting the RECURSIVE keyword.
