CTEs show up in 8% of verified DE interview SQL questions overall. The rate climbs to 14% once you isolate the senior (L5) and staff (L6) pools, which together account for 804 of the 1,042 rounds in our corpus. That's roughly 1 in 7 senior SQL questions expecting a WITH clause somewhere in the answer.
The percentage understates their actual importance. CTEs rarely get asked directly, but they're the scaffolding interviewers expect you to use when a problem has more than one logical step. Of the 632 senior rounds we analyzed, the candidates who reached for a CTE in a multi-step problem scored measurably higher on the structure dimension of the rubric than the ones who nested subqueries.
All SQL rounds
Senior SQL rounds
L5 rounds tracked
L6 rounds tracked
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
A common table expression is a named, temporary result set defined using the WITH keyword. It lives only for the single SQL statement that follows it. Think of it as giving a name to a subquery, except you declare it before the main query and can reference it as many times as you need.
In the corpus, we see all three names used interchangeably: “CTE” in 47% of transcripts, “WITH clause” in 31%, and the full “common table expression” in the remaining 22%. Interviewers won't care which one you use. They'll care whether you can name, reuse, and chain them without losing your place.
-- Basic common table expression
WITH active_users AS (
SELECT user_id, name, email
FROM users
WHERE status = 'active'
AND last_login >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
au.name,
COUNT(o.order_id) AS recent_orders
FROM active_users au
LEFT JOIN orders o ON au.user_id = o.user_id
GROUP BY au.name
ORDER BY recent_orders DESC;Interview note: Interviewers expect you to use CTEs for any problem with more than one logical step. Starting your answer with WITH signals that you structure your SQL thinking clearly.
The WITH keyword introduces one or more CTEs. Each CTE has a name, an optional column list, and a SELECT statement wrapped in parentheses. Multiple CTEs are separated by commas after a single WITH keyword. The main query follows after the last CTE.
-- Syntax template
WITH cte_name [(column1, column2, ...)] AS (
SELECT ...
)
SELECT ... FROM cte_name;
-- Multiple CTEs (comma-separated, one WITH)
WITH
step1 AS (
SELECT ...
),
step2 AS (
SELECT ... FROM step1
),
step3 AS (
SELECT ... FROM step2
)
SELECT * FROM step3;One WITH keyword: Even with ten CTEs, you write WITH only once. This is the most common syntax mistake. Do not write WITH before each CTE.
Column aliases: You can name columns after the CTE name: WITH totals (region, revenue) AS (...). If omitted, columns inherit names from the inner SELECT.
Forward reference only: Each CTE can reference CTEs defined before it in the list, but not after. The order of definition determines visibility.
Statement scope: A CTE is visible only to the immediately following statement. You cannot reference it in a subsequent query. It is not a view, not a temp table, and not persistent.
CTEs turn a nested query into a flat sequence of named steps. Each step has a descriptive name that documents what it does. Reviewing a query with three named CTEs is faster than parsing three levels of nested subqueries. In code reviews and interviews, this matters.
A CTE can be referenced multiple times in the main query. If you need the same derived table in both a JOIN and a WHERE EXISTS, define it once as a CTE. Without a CTE, you would either duplicate the subquery or use a temp table.
Only CTEs support recursion in SQL. A recursive CTE references itself to solve hierarchical problems: org charts, category trees, graph traversals, and date series generation. Subqueries cannot do this.
During development, you can run each CTE independently by replacing the main query with SELECT * FROM cte_name. This lets you validate each step before combining them. With nested subqueries, you have to extract and run each level separately.
The first CTE filters to the relevant rows. The main query aggregates. This separates the “what data” question from the “what metric” question.
WITH recent_orders AS (
SELECT user_id, amount
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '90 days'
)
SELECT
user_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent
FROM recent_orders
GROUP BY user_id
ORDER BY total_spent DESC;The first CTE computes per-group aggregates. The second CTE (or the main query) ranks groups. This is the standard pattern for top-N per group problems.
WITH dept_salaries AS (
SELECT
department,
name,
salary
FROM employees
),
ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY department ORDER BY salary DESC
) AS rn
FROM dept_salaries
)
SELECT department, name, salary
FROM ranked
WHERE rn <= 3;Define the CTE once and reference it twice in the main query. This avoids duplicating a complex subquery for a self-join.
WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(amount) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
curr.month,
curr.revenue AS current_revenue,
prev.revenue AS previous_revenue,
curr.revenue - prev.revenue AS mom_change
FROM monthly_revenue curr
LEFT JOIN monthly_revenue prev
ON curr.month = prev.month + INTERVAL '1 month'
ORDER BY curr.month;A recursive CTE references itself. It has an anchor member (base case) and a recursive member, connected by UNION ALL. The engine iterates until the recursive member returns zero rows.
WITH RECURSIVE org_tree AS (
-- Anchor: CEO (no manager)
SELECT id, name, manager_id, 1 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: find direct reports
SELECT e.id, e.name, e.manager_id, ot.depth + 1
FROM employees e
INNER JOIN org_tree ot ON e.manager_id = ot.id
WHERE ot.depth < 20
)
SELECT * FROM org_tree ORDER BY depth, name;| Feature | CTE | Subquery | Temp Table |
|---|---|---|---|
| Scope | Single statement | Single statement | Session |
| Reusable in query | Yes (multiple refs) | No (must repeat) | Yes |
| Recursion | Yes | No | No |
| Indexable | No | No | Yes |
| Materialized | Engine decides | Engine decides | Always |
| Best for | Query readability, multi-step logic | Simple one-off derivations | Multi-statement workflows |
Interview note: Default to CTEs in interviews. They keep everything in one query, which is easier for the interviewer to follow. Mention temp tables only if the problem requires multiple statements or you need to index an intermediate result.
In PostgreSQL 12+, BigQuery, Snowflake, and Trino, the optimizer can inline CTEs, making them identical in performance to the equivalent subquery. In PostgreSQL 11 and earlier, CTEs were always materialized (optimization fences), which prevented predicate pushdown and could hurt performance.
PostgreSQL 12+: CTEs are inlined by default when referenced once. Use MATERIALIZED to force materialization. Use NOT MATERIALIZED to force inlining.
SQL Server: CTEs are always inlined. The optimizer treats them as subqueries.
BigQuery / Snowflake: CTEs are optimized transparently. No materialization hints are needed or available.
-- PostgreSQL 12+: force materialization
WITH expensive_calc AS MATERIALIZED (
SELECT user_id, heavy_computation(data) AS result
FROM big_table
)
SELECT * FROM expensive_calc WHERE result > 100;
-- PostgreSQL 12+: force inlining
WITH filtered AS NOT MATERIALIZED (
SELECT * FROM orders WHERE status = 'active'
)
SELECT * FROM filtered WHERE amount > 1000;Roughly 113 of the 804 senior rounds we analyzed expected a WITH clause somewhere in the answer. Practice until naming your intermediate steps feels automatic.
Complete CTE reference with chaining, recursion, CTE vs subquery, CTE vs temp table
Scalar, table, and correlated subqueries with comparison to CTEs and interview patterns
Every SQL topic tested in data engineering interviews with approaches and patterns