CTEs didn't exist in standard SQL until SQL:1999. Before that, every complex query was either a nested subquery mess or a temporary table plus three extra statements. The 1999 spec added the WITH clause and, almost as an afterthought, recursive CTEs. That afterthought quietly replaced the ugly self-join loops engineers had been writing since the 1980s.
PostgreSQL shipped WITH support in 8.4 (2009). SQL Server had it since 2005. Oracle waited until 11g Release 2. MySQL held out until version 8.0 in 2018, which is why you'll still find engineers who treat CTEs as modern magic. In our verified corpus CTEs appear in 8% of interview rounds, usually where the problem is deep enough that a nested subquery would stop being readable.
Year CTEs entered ANSI SQL
Interview rounds using CTEs
MySQL finally added WITH
Postgres 8.4 WITH support
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
A CTE is a named temporary result set declared with the WITH keyword, scoped to the single statement that follows. The original 1999 spec called it a "query expression body with a with clause," which is exactly as readable as you'd expect from a committee document. The idea came from research on recursive query languages in the 1980s, and it shipped in DB2 first.
CTEs solve two problems. First, readability: instead of nesting subqueries three levels deep, you write each step as a named CTE and the final SELECT reads like a story. Second, reuse: if you need the same derived table in multiple places (e.g., in both a JOIN and a WHERE EXISTS), a CTE lets you define it once.
Every major SQL engine supports CTEs: PostgreSQL, MySQL 8.0+, SQL Server, Oracle, SQLite, BigQuery, Snowflake, Redshift, Trino, and Databricks. If you are writing SQL in 2024 or later, CTEs are always available.
Interview note: CTEs are the default tool for structuring interview answers. When an interviewer gives you a multi-step problem, start by sketching CTE names on the whiteboard before writing any SQL. This signals structured thinking.
The WITH keyword introduces one or more CTEs. Each CTE has a name, an optional column list, and a SELECT statement. The main query follows after the last CTE.
-- Single CTE
WITH active_users AS (
SELECT user_id, name, email
FROM users
WHERE status = 'active'
AND last_login >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
au.name,
au.email,
COUNT(o.order_id) AS recent_orders
FROM active_users au
LEFT JOIN orders o ON au.user_id = o.user_id
AND o.order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY au.name, au.email
ORDER BY recent_orders DESC;One WITH keyword: Even with multiple CTEs, you write WITH only once. Separate CTEs with commas.
Column aliases (optional): You can specify column names after the CTE name: WITH totals (region, revenue) AS (...). If omitted, column names come from the inner SELECT.
Scope: A CTE is visible only to the statement that follows it. You cannot reference it in a separate query. It is not a view or a temp table.
Multiple references: You can reference the same CTE multiple times in the main query (e.g., in a self-join). The engine decides whether to materialize it or inline it.
-- CTE with explicit column aliases
WITH monthly_revenue (month, region, revenue) AS (
SELECT
DATE_TRUNC('month', order_date),
region,
SUM(amount)
FROM orders
GROUP BY 1, 2
)
SELECT month, region, revenue
FROM monthly_revenue
WHERE revenue > 50000
ORDER BY month, revenue DESC;These five patterns cover the range of CTE usage you will encounter in production code and interviews. Each one solves a different category of problem.
The simplest CTE replaces a subquery with a named step. This does not change what the query computes. It changes how you read and maintain it. For one-off usage, a subquery is fine. The moment you nest two levels or reference the same derived table twice, a CTE wins.
-- Find departments where average salary
-- exceeds the company average
WITH company_avg AS (
SELECT AVG(salary) AS avg_salary
FROM employees
)
SELECT
department,
ROUND(AVG(salary), 0) AS dept_avg,
(SELECT avg_salary FROM company_avg) AS company_avg
FROM employees
GROUP BY department
HAVING AVG(salary) > (SELECT avg_salary FROM company_avg);Interview note: The company_avg CTE is referenced twice: once in SELECT and once in HAVING. Without a CTE, you would write the same subquery in both places. CTEs eliminate that duplication.
Recursive CTEs reference themselves. They have an anchor member (the base case) and a recursive member (the iteration step), connected by UNION ALL. The engine runs the anchor once, then repeatedly runs the recursive member until it produces zero new rows. This solves hierarchical problems: org charts, category trees, bill-of-materials explosions, and graph paths.
-- Walk an employee hierarchy from the CEO down
WITH RECURSIVE org_tree AS (
-- Anchor: start at the CEO (no manager)
SELECT id, name, manager_id, 1 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: find direct reports of current level
SELECT e.id, e.name, e.manager_id, ot.depth + 1
FROM employees e
INNER JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT id, name, depth
FROM org_tree
ORDER BY depth, name;Interview note: Always include a depth or iteration counter and mention the safety limit. PostgreSQL defaults to no limit. SQL Server defaults to 100 (OPTION (MAXRECURSION N) to change it). In interviews, say: “I would add a WHERE depth < 20 in the recursive member to prevent infinite loops from bad data.”
Each CTE can reference any CTE defined above it. This creates a data pipeline inside a single query: filter, then aggregate, then rank, then output. Each step has a name that documents what it does. This is the pattern interviewers use to test whether you can decompose a problem into clean steps.
WITH monthly_orders AS (
-- Step 1: aggregate orders by user and month
SELECT
user_id,
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS order_count,
SUM(amount) AS total_spent
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY user_id, DATE_TRUNC('month', order_date)
),
ranked AS (
-- Step 2: rank users within each month
SELECT *,
RANK() OVER (
PARTITION BY month ORDER BY total_spent DESC
) AS spend_rank
FROM monthly_orders
),
top_spenders AS (
-- Step 3: keep top 10 per month
SELECT * FROM ranked WHERE spend_rank <= 10
)
-- Step 4: join back to users for names
SELECT
u.name,
ts.month,
ts.order_count,
ts.total_spent,
ts.spend_rank
FROM top_spenders ts
INNER JOIN users u ON ts.user_id = u.id
ORDER BY ts.month, ts.spend_rank;Interview note: Name your CTEs after what they compute, not after where they sit in the pipeline. Names like “step1” and “temp” make the interviewer do extra work to understand your query. Names like “monthly_orders” and “top_spenders” are self-documenting.
CTEs and subqueries often produce the same result. The choice is about readability, reuse, and (in some engines) performance. Here is the same query written both ways.
Subquery
SELECT department, avg_sal
FROM (
SELECT department, AVG(salary) AS avg_sal
FROM employees
GROUP BY department
) dept_avgs
WHERE avg_sal > (
SELECT AVG(salary) FROM employees
);CTE
WITH dept_avgs AS (
SELECT department, AVG(salary) AS avg_sal
FROM employees
GROUP BY department
),
company_avg AS (
SELECT AVG(salary) AS avg_sal
FROM employees
)
SELECT d.department, d.avg_sal
FROM dept_avgs d
WHERE d.avg_sal > (SELECT avg_sal FROM company_avg);Use a subquery when: the derived table is used once, is short (under 5 lines), and nesting does not hurt readability.
Use a CTE when: the derived table is referenced multiple times, the query has more than one level of nesting, or the logic has distinct named steps.
CTEs exist only for one statement. Temp tables persist for the session (or transaction). They solve different problems, but the choice confuses people. Here is when each one makes sense.
| Feature | CTE | Temp Table |
|---|---|---|
| Scope | Single statement | Session or transaction |
| Indexable | No | Yes |
| Materialized | Engine decides (usually inlined) | Always materialized to disk/memory |
| Multi-statement use | No | Yes |
| Best for | Query readability, single-query pipelines | Multi-query workflows, large intermediate results |
Interview note: In an interview, default to CTEs. They keep everything in one query, which is easier for the interviewer to read. Mention temp tables only if the problem spans multiple queries or the intermediate result is reused across statements (e.g., in a stored procedure).
Recursive CTEs are the only way to walk hierarchical data in pure SQL without knowing the depth in advance. They replace what would otherwise require a loop in application code or a series of self-joins (one per level). The engine handles the iteration.
Iteration 0 (Anchor): The anchor SELECT runs once. Its output becomes the initial working table.
Iteration 1+: The recursive SELECT runs against the working table from the previous iteration. New rows produced become the next working table. This repeats until the recursive SELECT produces zero rows.
Final result: UNION ALL combines all rows from all iterations into the final output.
Not all recursive CTEs are about trees. A common pattern generates a series of dates to fill gaps in time-series data. This is useful when you need every day in a range, even days with no events.
-- Generate every date in January 2024
WITH RECURSIVE date_series AS (
SELECT DATE '2024-01-01' AS dt
UNION ALL
SELECT dt + INTERVAL '1 day'
FROM date_series
WHERE dt < DATE '2024-01-31'
)
SELECT dt FROM date_series;Recursive CTEs can build a path string as they traverse. This shows the full chain from root to leaf, which is useful for debugging hierarchy data and for display in UIs.
WITH RECURSIVE category_path AS (
-- Anchor: top-level categories (no parent)
SELECT
id,
name,
name AS full_path,
1 AS depth
FROM categories
WHERE parent_id IS NULL
UNION ALL
-- Recursive: append child name to parent path
SELECT
c.id,
c.name,
cp.full_path || ' > ' || c.name,
cp.depth + 1
FROM categories c
INNER JOIN category_path cp ON c.parent_id = cp.id
)
SELECT full_path, depth
FROM category_path
ORDER BY full_path;Bad data (circular references) causes recursive CTEs to run forever. Two safeguards exist. The first is a depth limit in the recursive member. The second is engine-level: SQL Server has MAXRECURSION (default 100), and PostgreSQL will run until memory is exhausted unless you add a guard.
-- Guard against cycles with a depth limit
WITH RECURSIVE org_tree AS (
SELECT id, name, manager_id, 1 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id, ot.depth + 1
FROM employees e
INNER JOIN org_tree ot ON e.manager_id = ot.id
WHERE ot.depth < 50 -- safety limit
)
SELECT * FROM org_tree;
-- SQL Server: set explicit recursion limit
-- SELECT * FROM org_tree OPTION (MAXRECURSION 200);Interview note: Mentioning cycle detection unprompted shows production experience. Say: “In production I would add a depth limit and optionally track visited IDs in an array to detect cycles explicitly.” PostgreSQL supports array accumulation for this: WHERE NOT e.id = ANY(visited_ids).
CTEs show up in every stage of the interview process. Phone screens test basic WITH syntax. Onsites test recursive traversal and multi-step pipelines.
What they test:
Basic CTE syntax and refactoring skills. The interviewer hands you a two-level nested query and watches you extract it into named steps. They care about your naming choices and whether the result is actually more readable.
Approach:
Start from the innermost subquery. Extract it as the first CTE. Name it after what it computes (e.g., dept_averages, not temp1). Extract the next level. The final SELECT should read like a sentence: “from dept_averages, where avg exceeds threshold.”
What they test:
CTE chaining with window functions. The classic top-N-per-group pattern requires aggregation, ranking, and filtering. A chained CTE handles each step.
Approach:
First CTE: aggregate revenue by product and category. Second CTE: add ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC). Final SELECT: WHERE rn <= 3. Discuss RANK vs ROW_NUMBER if the interviewer asks about ties.
What they test:
Recursive CTE syntax and hierarchy traversal. This is the most common recursive CTE question. They want anchor + recursive member + UNION ALL, and they want you to include a depth counter.
Approach:
Anchor: SELECT the target manager. Recursive: JOIN employees on manager_id = previous level's id. Track depth. Mention the cycle-prevention guard (depth < N). State the termination condition: recursion stops when no new rows are produced.
What they test:
Conceptual clarity about scope, materialization, and use cases. This is a knowledge question, not a coding question. The answer shows whether you understand what the engine actually does with each construct.
Approach:
CTEs: single-statement scope, not indexable, optimizer may inline. Temp tables: session scope, indexable, always materialized. Use CTEs for readability in one query. Use temp tables when you need the result across multiple statements, when you need to index it, or when forcing materialization improves a specific query plan.
What they test:
CTE combined with window functions. The CTE aggregates to the day level; the window function computes the cumulative sum. This tests whether you can separate aggregation from running calculations.
Approach:
CTE: GROUP BY date with SUM(amount) AS daily_revenue. Main SELECT: SUM(daily_revenue) OVER (ORDER BY date) AS running_total. Mention that the window function's default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) gives the correct cumulative sum.
What they test:
Engine-specific CTE behavior. Before PostgreSQL 12, CTEs were optimization fences: the engine always materialized them, preventing predicate pushdown and other optimizations. This is a trick question that separates candidates who have dealt with real performance issues from those who have only read syntax docs.
Approach:
Explain that PostgreSQL 11 and earlier materialized all CTEs. Filters in the main query could not push into the CTE. In PG 12+, the optimizer can inline CTEs (controlled by the MATERIALIZED / NOT MATERIALIZED hints). If stuck on PG 11, rewrite the CTE as a subquery to let the optimizer push predicates down.
These show up in real interview debriefs. Each one is avoidable with a clear mental model of how CTEs work.
A common syntax error: writing WITH for each CTE instead of separating them with commas. Only the first CTE gets WITH. Every subsequent CTE is separated by a comma. This error is immediate feedback that you have not written many CTEs.
-- WRONG: multiple WITH keywords
WITH step1 AS (SELECT ...)
WITH step2 AS (SELECT ... FROM step1) -- syntax error
SELECT * FROM step2;
-- CORRECT: one WITH, comma-separated
WITH step1 AS (
SELECT ...
),
step2 AS (
SELECT ... FROM step1
)
SELECT * FROM step2;This is a fast rejection signal. Practice writing multi-CTE queries until the comma separator is automatic.
The anchor and recursive members must be connected by UNION ALL. Using UNION (without ALL) works in some engines but deduplicates rows between iterations, which breaks hierarchy traversals where the same value appears at different depths. Worse, some engines reject UNION entirely in recursive CTEs.
-- WRONG: UNION deduplicates, may break recursion
WITH RECURSIVE tree AS (
SELECT id, parent_id, 1 AS depth FROM nodes WHERE parent_id IS NULL
UNION -- should be UNION ALL
SELECT n.id, n.parent_id, t.depth + 1
FROM nodes n JOIN tree t ON n.parent_id = t.id
)
SELECT * FROM tree;
-- CORRECT: UNION ALL preserves all rows
WITH RECURSIVE tree AS (
SELECT id, parent_id, 1 AS depth FROM nodes WHERE parent_id IS NULL
UNION ALL
SELECT n.id, n.parent_id, t.depth + 1
FROM nodes n JOIN tree t ON n.parent_id = t.id
)
SELECT * FROM tree;Always use UNION ALL in recursive CTEs. If you need deduplication, apply DISTINCT in the final SELECT, not between iterations.
Not every query benefits from a CTE. A single scalar comparison (WHERE amount > (SELECT AVG(amount) FROM orders)) is clearer as an inline subquery. Wrapping it in a CTE adds syntax overhead with no readability gain. Interviewers notice when you add structure without purpose.
-- Overkill: CTE for a simple scalar
WITH avg_order AS (
SELECT AVG(amount) AS val FROM orders
)
SELECT * FROM orders
WHERE amount > (SELECT val FROM avg_order);
-- Simpler: inline subquery
SELECT * FROM orders
WHERE amount > (SELECT AVG(amount) FROM orders);Use CTEs when they add clarity or enable reuse. For one-off scalar comparisons, keep the subquery inline. Show the interviewer you pick the right tool for each situation.
Every recursive query you see in a Leetcode-style round traces back to the 1999 spec. Learn the decomposition, then practice until a three-stage CTE feels like prose.