Recursive CTE SQL: Hierarchies, Date Series, and Graph...

The ANSI SQL:1999 standard was the one that finally added recursive queries. Before that, walking an org chart meant looping in application code, writing Oracle's proprietary CONNECT BY, or admitting defeat and denormalizing the whole hierarchy into a closure table. The 1999 committee borrowed the shape from Datalog research going back to the 1980s: a WITH RECURSIVE form with an anchor and a recursive member joined by UNION ALL.

How Recursive CTEs Work

A recursive CTE has two halves separated by UNION ALL. The anchor member runs first and produces the initial working set. The recursive member then runs against that working set, producing new rows. Those new rows become the working set for the next iteration. This repeats until the recursive member returns an empty result. The engine collects all rows from every iteration into the final output.

The 1999 committee drew this directly from fixpoint semantics in Datalog: start with a seed, apply a rule, collect new facts, repeat until no new facts arrive. Earlier SQL editions had no native loop construct, so engineers leaned on Oracle's CONNECT BY (added in 1985) or wrote nested procedural PL/SQL. The 1999 form replaced both with a declarative two-part structure that every major engine eventually adopted.

Prepare for the interview
01 / Open invite
02min.

Know Recursive CTE the way the interviewer who asks it knows it.

a Recursive CTE query, the same shape a screen would give you.
The diff against expected. Where ties broke. What you missed.
sandbox
1SELECT user_id,
2 COUNT(*) AS sessions
3FROM events
4WHERE ts >= NOW() - INTERVAL '7 day'
5
Execute your solution0.4s avg.
UberInterview question
Solve a Recursive CTE problem
1999
Added to ANSI SQL
2009
Postgres 8.4 ships it
2018
MySQL 8.0 catches up
8%
SQL rounds use CTEs

Base Case and Recursive Case

The anchor member is your base case. It defines where the recursion starts. For an org chart, the base case selects the CEO (WHERE manager_id IS NULL). For a date series, the base case selects the start date. For a graph traversal, the base case selects the source node. Getting the anchor right determines whether the rest of the recursion produces correct results.

The recursive member is the step function. It takes the output from the previous iteration and uses it to find the next set of rows. In a hierarchy, this means joining the base table to the CTE on the parent-child relationship. Each iteration goes one level deeper.

Basic Syntax

WITH RECURSIVE cte_name AS (
  -- Anchor member: runs once
  SELECT columns
  FROM base_table
  WHERE starting_condition

  UNION ALL

  -- Recursive member: runs until zero rows
  SELECT columns
  FROM base_table
  INNER JOIN cte_name ON join_condition
  WHERE termination_guard
)
SELECT * FROM cte_name;

Use Case: Hierarchical Data

Hierarchies are the most common recursive CTE application. Org charts, category trees, file system paths, bill-of-materials, and comment threads all follow the same parent-child pattern. Without recursive CTEs, you would need either a fixed number of self-joins (which limits depth) or application-layer iteration (which means multiple round trips to the database).

Prepare for the interview
03 / From the bank03 of many
03hand-picked.

Recent Price Drops

Medium10 min

The price just dropped. Who noticed?

Key Rules

RECURSIVE keyword

Required in PostgreSQL, MySQL, and SQLite. SQL Server and Oracle omit it (the engine detects self-reference automatically). BigQuery and Snowflake require it.

UNION ALL, not UNION

Always use UNION ALL. Plain UNION deduplicates between iterations, which breaks most hierarchical traversals and can cause the engine to reject the query entirely.

No aggregates in the recursive member

Most engines prohibit GROUP BY, DISTINCT, and aggregate functions in the recursive member. Aggregate in the final SELECT instead.

Use Case: Date Series Generation

Time-series analysis often requires a row for every date in a range, even when no events occurred on certain days. PostgreSQL has generate_series() for this, but MySQL, SQL Server, and many cloud warehouses do not have an equivalent. Recursive CTEs fill the gap portably.

Anatomy of a Recursive CTE

-- Org chart: find all employees under manager_id = 42
WITH RECURSIVE reports AS (
  -- Base case: direct reports of manager 42
  SELECT
    id,
    name,
    manager_id,
    1 AS depth
  FROM employees
  WHERE manager_id = 42

  UNION ALL

  -- Recursive case: find reports of reports
  SELECT
    e.id,
    e.name,
    e.manager_id,
    r.depth + 1
  FROM employees e
  INNER JOIN reports r ON e.manager_id = r.id
  WHERE r.depth < 20  -- safety guard
)
SELECT id, name, depth
FROM reports
ORDER BY depth, name;

Iteration 0: The anchor finds all employees where manager_id = 42. Suppose that returns Alice, Bob, and Carol. Iteration 1: The recursive member finds everyone who reports to Alice, Bob, or Carol. This continues until an iteration produces zero rows.

Use Case: Graph Traversal

Recursive CTEs can walk directed graphs stored as edge lists. This covers social network connections, dependency resolution, and shortest-path approximations. The pattern extends the hierarchy approach by tracking visited nodes to handle cycles.

Category Tree with Path Accumulation

WITH RECURSIVE category_tree AS (
  -- Anchor: top-level categories
  SELECT
    id,
    name,
    parent_id,
    name::TEXT AS full_path,
    1 AS depth
  FROM categories
  WHERE parent_id IS NULL

  UNION ALL

  -- Recursive: append child name to parent path
  SELECT
    c.id,
    c.name,
    c.parent_id,
    ct.full_path || ' > ' || c.name,
    ct.depth + 1
  FROM categories c
  INNER JOIN category_tree ct ON c.parent_id = ct.id
  WHERE ct.depth < 30
)
SELECT id, name, full_path, depth
FROM category_tree
ORDER BY full_path;

Bill of Materials Explosion

WITH RECURSIVE bom AS (
  -- Anchor: top-level product
  SELECT
    part_id,
    component_id,
    quantity,
    1 AS level
  FROM parts_hierarchy
  WHERE part_id = 'PRODUCT-001'

  UNION ALL

  -- Recursive: drill into sub-components
  SELECT
    ph.part_id,
    ph.component_id,
    ph.quantity * b.quantity AS quantity,
    b.level + 1
  FROM parts_hierarchy ph
  INNER JOIN bom b ON ph.part_id = b.component_id
  WHERE b.level < 15
)
SELECT component_id, SUM(quantity) AS total_needed
FROM bom
GROUP BY component_id
ORDER BY total_needed DESC;

Generate Every Date in Q1 2024

-- Generate every date in Q1 2024
WITH RECURSIVE dates AS (
  SELECT DATE '2024-01-01' AS dt

  UNION ALL

  SELECT dt + INTERVAL '1 day'
  FROM dates
  WHERE dt < DATE '2024-03-31'
)
SELECT dt
FROM dates;

Filling Gaps in Time-Series Data

WITH RECURSIVE dates AS (
  SELECT DATE '2024-01-01' AS dt
  UNION ALL
  SELECT dt + INTERVAL '1 day'
  FROM dates
  WHERE dt < DATE '2024-03-31'
),
daily_events AS (
  SELECT
    DATE(event_time) AS event_date,
    COUNT(*) AS event_count
  FROM events
  WHERE event_time >= '2024-01-01'
    AND event_time < '2024-04-01'
  GROUP BY DATE(event_time)
)
SELECT
  d.dt AS date,
  COALESCE(de.event_count, 0) AS events
FROM dates d
LEFT JOIN daily_events de ON d.dt = de.event_date
ORDER BY d.dt;

Interview note: When an interviewer asks for daily metrics with no gaps, the date series CTE is the standard answer. Mention it before they ask about missing days. It shows you have dealt with sparse time-series data in production.

Find All Nodes Reachable from a Source

-- Find all nodes reachable from node 'A'
WITH RECURSIVE reachable AS (
  -- Anchor: start at node A
  SELECT
    source_node,
    target_node,
    ARRAY[source_node] AS visited,
    1 AS hops
  FROM edges
  WHERE source_node = 'A'

  UNION ALL

  -- Recursive: follow edges, skip visited nodes
  SELECT
    e.source_node,
    e.target_node,
    r.visited || e.target_node,
    r.hops + 1
  FROM edges e
  INNER JOIN reachable r ON e.source_node = r.target_node
  WHERE NOT e.target_node = ANY(r.visited)
    AND r.hops < 10
)
SELECT DISTINCT target_node, MIN(hops) AS shortest_hops
FROM reachable
GROUP BY target_node
ORDER BY shortest_hops;

Dependency Resolution

-- Find all upstream dependencies of task 'final_report'
WITH RECURSIVE upstream AS (
  SELECT
    dependency_task AS task_name,
    1 AS distance
  FROM task_dependencies
  WHERE task_name = 'final_report'

  UNION ALL

  SELECT
    td.dependency_task,
    u.distance + 1
  FROM task_dependencies td
  INNER JOIN upstream u ON td.task_name = u.task_name
  WHERE u.distance < 20
)
SELECT DISTINCT task_name, MIN(distance) AS min_distance
FROM upstream
GROUP BY task_name
ORDER BY min_distance;

Termination Conditions

A recursive CTE terminates when the recursive member returns zero rows. But relying on it alone is dangerous: bad data (circular references, missing NULLs) can cause infinite recursion. Production code needs explicit guards.

1. Depth Limit

Add a depth or iteration counter. Increment it in the recursive member and filter with WHERE depth < N. This is the most common guard and works on every engine.

2. Engine-Level Limits

SQL Server defaults to 100 iterations (change with OPTION MAXRECURSION N). MySQL 8.0+ defaults to 1000 (cte_max_recursion_depth). PostgreSQL has no default limit, so you must add your own guard or set statement_timeout.

3. Visited-Node Tracking

For graphs with cycles, accumulate visited IDs in an array column (PostgreSQL) or a concatenated string (SQL Server). Check each new row against the visited set before including it. This detects cycles at the row level rather than relying on a global depth limit.

Cycle Detection Examples

-- Depth limit (works everywhere)
WHERE ct.depth < 50

-- PostgreSQL: array-based cycle detection
WHERE NOT e.target_id = ANY(r.visited_ids)
  AND r.depth < 50

Interview note: Mentioning termination conditions unprompted is a strong signal. After writing your recursive CTE, say: In production I would add a depth guard and possibly array-based cycle detection to handle dirty data.

Performance Considerations

Recursive CTEs are not free. Each iteration is a separate query execution against the working set. For wide or deep hierarchies, this can be expensive.

FactorImpact
Depth of recursionEach level is a round of execution. 5 levels is fast; 500 can be slow.
Branching factorIf each node has 10 children, row count grows exponentially. Filter early.
Index on join columnThe recursive join must be fast. Index the parent_id / manager_id column.
Columns carried forwardAccumulating arrays or long strings per row increases memory per iteration.
Alternative data modelsMaterialized paths or nested sets avoid recursion entirely for read-heavy workloads.

3 Recursive CTE Interview Questions

These three questions appear consistently in data engineering interviews at companies that work with hierarchical data, which is most of them.

Q1: Given an employees table with id and manager_id, find every employee who reports to a given manager at any level.

What they test: Core recursive CTE syntax. They want anchor + recursive member + UNION ALL. Bonus points for including a depth column and a safety guard. Approach: Anchor: SELECT employees where manager_id equals the target. Recursive: JOIN employees where manager_id equals the previous level's id. Include depth counter. Mention WHERE depth < N for cycle safety.

Q2: Generate a daily date spine from 2024-01-01 to 2024-12-31 and LEFT JOIN it to an events table to show zero-event days.

What they test: Recursive CTE for sequence generation plus gap-filling with LEFT JOIN and COALESCE. This tests whether you know the date spine pattern for time-series reporting. Approach: Anchor: SELECT DATE '2024-01-01'. Recursive: add INTERVAL '1 day', stop at 2024-12-31. LEFT JOIN to daily event counts. COALESCE nulls to 0.

Q3: Given a directed edge table, find all nodes reachable from a source node and report the shortest path length.

What they test: Graph traversal with cycle detection. This is the hardest recursive CTE question. They want the visited-array pattern and a clear explanation of termination. Approach: Anchor: SELECT edges from source node, initialize visited array. Recursive: follow outgoing edges, check NOT target = ANY(visited), increment hop count. Final SELECT: GROUP BY target, MIN(hops) for shortest path. Mention that this is BFS, not Dijkstra (no weighted edges).

Recursive CTE Support by Engine

Syntax varies slightly across engines. Here is what to know for the platforms you are most likely to encounter.

EngineRECURSIVE KeywordDefault Limit
PostgreSQLRequiredNone (add your own guard)
SQL ServerNot used100 (MAXRECURSION to change)
MySQL 8.0+Required1000 (cte_max_recursion_depth)
BigQueryRequired500
SnowflakeRequiredConfigurable per query
OracleNot used (uses CONNECT BY too)None

Recursive CTE FAQ

What is a recursive CTE in SQL?+
A recursive CTE is a common table expression that references itself. It contains two parts joined by UNION ALL: an anchor member (the base case that runs once) and a recursive member (which references the CTE name and runs repeatedly until it produces zero new rows). Recursive CTEs solve hierarchical and iterative problems like org chart traversal, bill-of-materials explosions, date series generation, and shortest-path calculations. Every major SQL engine supports them: PostgreSQL, SQL Server, MySQL 8.0+, Oracle, SQLite, BigQuery, Snowflake, and Databricks.
How do you prevent infinite loops in recursive CTEs?+
Three strategies exist. First, add a depth counter column in the recursive CTE and filter with WHERE depth < N in the recursive member. Second, use engine-level limits: SQL Server has OPTION (MAXRECURSION N) which defaults to 100, and PostgreSQL has no default limit but you can set statement_timeout. Third, track visited nodes in an array column (PostgreSQL supports this with array_agg and ANY checks) to detect cycles explicitly. In interviews, always mention at least one of these safeguards unprompted.
Can you use recursive CTEs in MySQL?+
Yes, starting with MySQL 8.0. The syntax uses WITH RECURSIVE followed by the CTE name, anchor member, UNION ALL, and recursive member. MySQL 8.0+ also supports the cte_max_recursion_depth system variable (default 1000) to prevent runaway recursion. MySQL 5.7 and earlier do not support CTEs at all. If you are on MySQL 8.0 or newer, recursive CTEs work the same way as in PostgreSQL and SQL Server.
02 / Why practice

The 1999 standard, one whiteboard at a time

  1. 01

    Active recall beats re-reading by 50%

    Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom

  2. 02

    76% of hiring managers reject on the coding task, not the resume

    From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice

  3. 03

    Five problem shapes cover 80% of data engineer loops

    Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Related Guides