SQL Reference

SQL CTE (Common Table Expression)

CTEs didn't exist in standard SQL until SQL:1999. Before that, every complex query was either a nested subquery mess or a temporary table plus three extra statements. The 1999 spec added the WITH clause and, almost as an afterthought, recursive CTEs. That afterthought quietly replaced the ugly self-join loops engineers had been writing since the 1980s.

PostgreSQL shipped WITH support in 8.4 (2009). SQL Server had it since 2005. Oracle waited until 11g Release 2. MySQL held out until version 8.0 in 2018, which is why you'll still find engineers who treat CTEs as modern magic. In our verified corpus CTEs appear in 8% of interview rounds, usually where the problem is deep enough that a nested subquery would stop being readable.

1999

Year CTEs entered ANSI SQL

8%

Interview rounds using CTEs

2018

MySQL finally added WITH

2009

Postgres 8.4 WITH support

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

What Is a CTE?

A CTE is a named temporary result set declared with the WITH keyword, scoped to the single statement that follows. The original 1999 spec called it a "query expression body with a with clause," which is exactly as readable as you'd expect from a committee document. The idea came from research on recursive query languages in the 1980s, and it shipped in DB2 first.

CTEs solve two problems. First, readability: instead of nesting subqueries three levels deep, you write each step as a named CTE and the final SELECT reads like a story. Second, reuse: if you need the same derived table in multiple places (e.g., in both a JOIN and a WHERE EXISTS), a CTE lets you define it once.

Every major SQL engine supports CTEs: PostgreSQL, MySQL 8.0+, SQL Server, Oracle, SQLite, BigQuery, Snowflake, Redshift, Trino, and Databricks. If you are writing SQL in 2024 or later, CTEs are always available.

Interview note: CTEs are the default tool for structuring interview answers. When an interviewer gives you a multi-step problem, start by sketching CTE names on the whiteboard before writing any SQL. This signals structured thinking.

CTE Syntax

The WITH keyword introduces one or more CTEs. Each CTE has a name, an optional column list, and a SELECT statement. The main query follows after the last CTE.

-- Single CTE
WITH active_users AS (
  SELECT user_id, name, email
  FROM users
  WHERE status = 'active'
    AND last_login >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
  au.name,
  au.email,
  COUNT(o.order_id) AS recent_orders
FROM active_users au
LEFT JOIN orders o ON au.user_id = o.user_id
  AND o.order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY au.name, au.email
ORDER BY recent_orders DESC;

Key Syntax Rules

One WITH keyword: Even with multiple CTEs, you write WITH only once. Separate CTEs with commas.

Column aliases (optional): You can specify column names after the CTE name: WITH totals (region, revenue) AS (...). If omitted, column names come from the inner SELECT.

Scope: A CTE is visible only to the statement that follows it. You cannot reference it in a separate query. It is not a view or a temp table.

Multiple references: You can reference the same CTE multiple times in the main query (e.g., in a self-join). The engine decides whether to materialize it or inline it.

-- CTE with explicit column aliases
WITH monthly_revenue (month, region, revenue) AS (
  SELECT
    DATE_TRUNC('month', order_date),
    region,
    SUM(amount)
  FROM orders
  GROUP BY 1, 2
)
SELECT month, region, revenue
FROM monthly_revenue
WHERE revenue > 50000
ORDER BY month, revenue DESC;

5 CTE Patterns for Interviews

These five patterns cover the range of CTE usage you will encounter in production code and interviews. Each one solves a different category of problem.

1. Basic CTE (Named Subquery)

The simplest CTE replaces a subquery with a named step. This does not change what the query computes. It changes how you read and maintain it. For one-off usage, a subquery is fine. The moment you nest two levels or reference the same derived table twice, a CTE wins.

-- Find departments where average salary
-- exceeds the company average
WITH company_avg AS (
  SELECT AVG(salary) AS avg_salary
  FROM employees
)
SELECT
  department,
  ROUND(AVG(salary), 0) AS dept_avg,
  (SELECT avg_salary FROM company_avg) AS company_avg
FROM employees
GROUP BY department
HAVING AVG(salary) > (SELECT avg_salary FROM company_avg);

Interview note: The company_avg CTE is referenced twice: once in SELECT and once in HAVING. Without a CTE, you would write the same subquery in both places. CTEs eliminate that duplication.

2. Recursive CTE

Recursive CTEs reference themselves. They have an anchor member (the base case) and a recursive member (the iteration step), connected by UNION ALL. The engine runs the anchor once, then repeatedly runs the recursive member until it produces zero new rows. This solves hierarchical problems: org charts, category trees, bill-of-materials explosions, and graph paths.

-- Walk an employee hierarchy from the CEO down
WITH RECURSIVE org_tree AS (
  -- Anchor: start at the CEO (no manager)
  SELECT id, name, manager_id, 1 AS depth
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive: find direct reports of current level
  SELECT e.id, e.name, e.manager_id, ot.depth + 1
  FROM employees e
  INNER JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT id, name, depth
FROM org_tree
ORDER BY depth, name;

Interview note: Always include a depth or iteration counter and mention the safety limit. PostgreSQL defaults to no limit. SQL Server defaults to 100 (OPTION (MAXRECURSION N) to change it). In interviews, say: “I would add a WHERE depth < 20 in the recursive member to prevent infinite loops from bad data.”

3. Chained CTEs (Multi-Step Pipeline)

Each CTE can reference any CTE defined above it. This creates a data pipeline inside a single query: filter, then aggregate, then rank, then output. Each step has a name that documents what it does. This is the pattern interviewers use to test whether you can decompose a problem into clean steps.

WITH monthly_orders AS (
  -- Step 1: aggregate orders by user and month
  SELECT
    user_id,
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS order_count,
    SUM(amount) AS total_spent
  FROM orders
  WHERE order_date >= '2024-01-01'
  GROUP BY user_id, DATE_TRUNC('month', order_date)
),
ranked AS (
  -- Step 2: rank users within each month
  SELECT *,
    RANK() OVER (
      PARTITION BY month ORDER BY total_spent DESC
    ) AS spend_rank
  FROM monthly_orders
),
top_spenders AS (
  -- Step 3: keep top 10 per month
  SELECT * FROM ranked WHERE spend_rank <= 10
)
-- Step 4: join back to users for names
SELECT
  u.name,
  ts.month,
  ts.order_count,
  ts.total_spent,
  ts.spend_rank
FROM top_spenders ts
INNER JOIN users u ON ts.user_id = u.id
ORDER BY ts.month, ts.spend_rank;

Interview note: Name your CTEs after what they compute, not after where they sit in the pipeline. Names like “step1” and “temp” make the interviewer do extra work to understand your query. Names like “monthly_orders” and “top_spenders” are self-documenting.

4. CTE vs Subquery

CTEs and subqueries often produce the same result. The choice is about readability, reuse, and (in some engines) performance. Here is the same query written both ways.

Subquery

SELECT department, avg_sal
FROM (
  SELECT department, AVG(salary) AS avg_sal
  FROM employees
  GROUP BY department
) dept_avgs
WHERE avg_sal > (
  SELECT AVG(salary) FROM employees
);

CTE

WITH dept_avgs AS (
  SELECT department, AVG(salary) AS avg_sal
  FROM employees
  GROUP BY department
),
company_avg AS (
  SELECT AVG(salary) AS avg_sal
  FROM employees
)
SELECT d.department, d.avg_sal
FROM dept_avgs d
WHERE d.avg_sal > (SELECT avg_sal FROM company_avg);

Use a subquery when: the derived table is used once, is short (under 5 lines), and nesting does not hurt readability.

Use a CTE when: the derived table is referenced multiple times, the query has more than one level of nesting, or the logic has distinct named steps.

5. CTE vs Temp Table

CTEs exist only for one statement. Temp tables persist for the session (or transaction). They solve different problems, but the choice confuses people. Here is when each one makes sense.

FeatureCTETemp Table
ScopeSingle statementSession or transaction
IndexableNoYes
MaterializedEngine decides (usually inlined)Always materialized to disk/memory
Multi-statement useNoYes
Best forQuery readability, single-query pipelinesMulti-query workflows, large intermediate results

Interview note: In an interview, default to CTEs. They keep everything in one query, which is easier for the interviewer to read. Mention temp tables only if the problem spans multiple queries or the intermediate result is reused across statements (e.g., in a stored procedure).

Recursive CTEs: Deep Dive

Recursive CTEs are the only way to walk hierarchical data in pure SQL without knowing the depth in advance. They replace what would otherwise require a loop in application code or a series of self-joins (one per level). The engine handles the iteration.

How Recursion Executes

Iteration 0 (Anchor): The anchor SELECT runs once. Its output becomes the initial working table.

Iteration 1+: The recursive SELECT runs against the working table from the previous iteration. New rows produced become the next working table. This repeats until the recursive SELECT produces zero rows.

Final result: UNION ALL combines all rows from all iterations into the final output.

Generating a Date Series

Not all recursive CTEs are about trees. A common pattern generates a series of dates to fill gaps in time-series data. This is useful when you need every day in a range, even days with no events.

-- Generate every date in January 2024
WITH RECURSIVE date_series AS (
  SELECT DATE '2024-01-01' AS dt

  UNION ALL

  SELECT dt + INTERVAL '1 day'
  FROM date_series
  WHERE dt < DATE '2024-01-31'
)
SELECT dt FROM date_series;

Path Accumulation

Recursive CTEs can build a path string as they traverse. This shows the full chain from root to leaf, which is useful for debugging hierarchy data and for display in UIs.

WITH RECURSIVE category_path AS (
  -- Anchor: top-level categories (no parent)
  SELECT
    id,
    name,
    name AS full_path,
    1 AS depth
  FROM categories
  WHERE parent_id IS NULL

  UNION ALL

  -- Recursive: append child name to parent path
  SELECT
    c.id,
    c.name,
    cp.full_path || ' > ' || c.name,
    cp.depth + 1
  FROM categories c
  INNER JOIN category_path cp ON c.parent_id = cp.id
)
SELECT full_path, depth
FROM category_path
ORDER BY full_path;

Preventing Infinite Loops

Bad data (circular references) causes recursive CTEs to run forever. Two safeguards exist. The first is a depth limit in the recursive member. The second is engine-level: SQL Server has MAXRECURSION (default 100), and PostgreSQL will run until memory is exhausted unless you add a guard.

-- Guard against cycles with a depth limit
WITH RECURSIVE org_tree AS (
  SELECT id, name, manager_id, 1 AS depth
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  SELECT e.id, e.name, e.manager_id, ot.depth + 1
  FROM employees e
  INNER JOIN org_tree ot ON e.manager_id = ot.id
  WHERE ot.depth < 50  -- safety limit
)
SELECT * FROM org_tree;

-- SQL Server: set explicit recursion limit
-- SELECT * FROM org_tree OPTION (MAXRECURSION 200);

Interview note: Mentioning cycle detection unprompted shows production experience. Say: “In production I would add a depth limit and optionally track visited IDs in an array to detect cycles explicitly.” PostgreSQL supports array accumulation for this: WHERE NOT e.id = ANY(visited_ids).

6 CTE Interview Questions

CTEs show up in every stage of the interview process. Phone screens test basic WITH syntax. Onsites test recursive traversal and multi-step pipelines.

Q1: Rewrite this nested subquery using a CTE.

What they test:

Basic CTE syntax and refactoring skills. The interviewer hands you a two-level nested query and watches you extract it into named steps. They care about your naming choices and whether the result is actually more readable.

Approach:

Start from the innermost subquery. Extract it as the first CTE. Name it after what it computes (e.g., dept_averages, not temp1). Extract the next level. The final SELECT should read like a sentence: “from dept_averages, where avg exceeds threshold.”

Q2: Find the top 3 products by revenue in each category.

What they test:

CTE chaining with window functions. The classic top-N-per-group pattern requires aggregation, ranking, and filtering. A chained CTE handles each step.

Approach:

First CTE: aggregate revenue by product and category. Second CTE: add ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC). Final SELECT: WHERE rn <= 3. Discuss RANK vs ROW_NUMBER if the interviewer asks about ties.

Q3: Write a recursive CTE to find all employees under a given manager, at any depth.

What they test:

Recursive CTE syntax and hierarchy traversal. This is the most common recursive CTE question. They want anchor + recursive member + UNION ALL, and they want you to include a depth counter.

Approach:

Anchor: SELECT the target manager. Recursive: JOIN employees on manager_id = previous level's id. Track depth. Mention the cycle-prevention guard (depth < N). State the termination condition: recursion stops when no new rows are produced.

Q4: What is the difference between a CTE and a temp table? When would you use each?

What they test:

Conceptual clarity about scope, materialization, and use cases. This is a knowledge question, not a coding question. The answer shows whether you understand what the engine actually does with each construct.

Approach:

CTEs: single-statement scope, not indexable, optimizer may inline. Temp tables: session scope, indexable, always materialized. Use CTEs for readability in one query. Use temp tables when you need the result across multiple statements, when you need to index it, or when forcing materialization improves a specific query plan.

Q5: Generate a running total of daily revenue using a CTE.

What they test:

CTE combined with window functions. The CTE aggregates to the day level; the window function computes the cumulative sum. This tests whether you can separate aggregation from running calculations.

Approach:

CTE: GROUP BY date with SUM(amount) AS daily_revenue. Main SELECT: SUM(daily_revenue) OVER (ORDER BY date) AS running_total. Mention that the window function's default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) gives the correct cumulative sum.

Q6: A CTE in PostgreSQL 11 is slower than the equivalent subquery. Why?

What they test:

Engine-specific CTE behavior. Before PostgreSQL 12, CTEs were optimization fences: the engine always materialized them, preventing predicate pushdown and other optimizations. This is a trick question that separates candidates who have dealt with real performance issues from those who have only read syntax docs.

Approach:

Explain that PostgreSQL 11 and earlier materialized all CTEs. Filters in the main query could not push into the CTE. In PG 12+, the optimizer can inline CTEs (controlled by the MATERIALIZED / NOT MATERIALIZED hints). If stuck on PG 11, rewrite the CTE as a subquery to let the optimizer push predicates down.

3 CTE Mistakes That Fail Interviews

These show up in real interview debriefs. Each one is avoidable with a clear mental model of how CTEs work.

Writing WITH Before Every CTE

A common syntax error: writing WITH for each CTE instead of separating them with commas. Only the first CTE gets WITH. Every subsequent CTE is separated by a comma. This error is immediate feedback that you have not written many CTEs.

-- WRONG: multiple WITH keywords
WITH step1 AS (SELECT ...)
WITH step2 AS (SELECT ... FROM step1)  -- syntax error
SELECT * FROM step2;

-- CORRECT: one WITH, comma-separated
WITH step1 AS (
  SELECT ...
),
step2 AS (
  SELECT ... FROM step1
)
SELECT * FROM step2;

This is a fast rejection signal. Practice writing multi-CTE queries until the comma separator is automatic.

Missing UNION ALL in Recursive CTEs

The anchor and recursive members must be connected by UNION ALL. Using UNION (without ALL) works in some engines but deduplicates rows between iterations, which breaks hierarchy traversals where the same value appears at different depths. Worse, some engines reject UNION entirely in recursive CTEs.

-- WRONG: UNION deduplicates, may break recursion
WITH RECURSIVE tree AS (
  SELECT id, parent_id, 1 AS depth FROM nodes WHERE parent_id IS NULL
  UNION  -- should be UNION ALL
  SELECT n.id, n.parent_id, t.depth + 1
  FROM nodes n JOIN tree t ON n.parent_id = t.id
)
SELECT * FROM tree;

-- CORRECT: UNION ALL preserves all rows
WITH RECURSIVE tree AS (
  SELECT id, parent_id, 1 AS depth FROM nodes WHERE parent_id IS NULL
  UNION ALL
  SELECT n.id, n.parent_id, t.depth + 1
  FROM nodes n JOIN tree t ON n.parent_id = t.id
)
SELECT * FROM tree;

Always use UNION ALL in recursive CTEs. If you need deduplication, apply DISTINCT in the final SELECT, not between iterations.

Over-Using CTEs Where a Subquery Is Cleaner

Not every query benefits from a CTE. A single scalar comparison (WHERE amount > (SELECT AVG(amount) FROM orders)) is clearer as an inline subquery. Wrapping it in a CTE adds syntax overhead with no readability gain. Interviewers notice when you add structure without purpose.

-- Overkill: CTE for a simple scalar
WITH avg_order AS (
  SELECT AVG(amount) AS val FROM orders
)
SELECT * FROM orders
WHERE amount > (SELECT val FROM avg_order);

-- Simpler: inline subquery
SELECT * FROM orders
WHERE amount > (SELECT AVG(amount) FROM orders);

Use CTEs when they add clarity or enable reuse. For one-off scalar comparisons, keep the subquery inline. Show the interviewer you pick the right tool for each situation.

SQL CTE FAQ

What is a CTE in SQL?+
A CTE (Common Table Expression) is a named temporary result set defined with the WITH keyword. It exists only for the duration of a single SQL statement. You define it at the top of a query, give it a name, and reference it like a table in the main SELECT, INSERT, UPDATE, or DELETE that follows. CTEs make complex queries readable by breaking them into named steps.
Are CTEs faster than subqueries?+
In most modern engines (PostgreSQL 12+, BigQuery, Snowflake, Trino), the optimizer can inline a CTE, making it identical in performance to the equivalent subquery. Older PostgreSQL versions (before 12) treated CTEs as optimization fences, materializing the result even when inlining would be faster. In those versions, CTEs could be slower. For current engines, choose based on readability. If performance matters, check the query plan with EXPLAIN ANALYZE.
Can you use multiple CTEs in one query?+
Yes. Separate them with commas after a single WITH keyword. Each CTE can reference any CTE defined before it in the list. This is called CTE chaining. Example: WITH step1 AS (...), step2 AS (SELECT ... FROM step1), step3 AS (SELECT ... FROM step2) SELECT * FROM step3. There is no practical limit to how many you can chain.
What is a recursive CTE?+
A recursive CTE references itself in its own definition. It has two parts: an anchor member (the base case, runs once) and a recursive member (references the CTE name, runs repeatedly until it produces no new rows). Recursive CTEs solve hierarchical problems: org charts, bill-of-materials, tree traversals, and graph walks. You must include UNION ALL between the anchor and recursive members.
Do CTEs work in SQL Server, MySQL, and PostgreSQL?+
Yes. CTEs (including recursive CTEs) are supported in PostgreSQL 8.4+, SQL Server 2005+, MySQL 8.0+, SQLite 3.8.3+, Oracle 9i+, BigQuery, Snowflake, Redshift, Trino, and Databricks SQL. The only major engine that does not support them is MySQL 5.7 and earlier. If your production environment runs MySQL 8.0 or newer, CTEs work.

The WITH Clause Is 26 Years Old. Use It Like You Mean It.

Every recursive query you see in a Leetcode-style round traces back to the 1999 spec. Learn the decomposition, then practice until a three-stage CTE feels like prose.