CTE Practice Problems for Data Engineers
Subqueries and CTEs appear in ~4.9% of SQL interview questions, but they disproportionately show up in onsite rounds where difficulty is higher. Five problems from basic WITH clauses to recursive traversals.
What this guide covers
Subqueries and CTEs appear in ~4.9% of SQL interview questions, but they disproportionately show up in onsite rounds where difficulty is higher. Five progressive problems cover the pattern range: basic WITH for readability, chained CTEs for multi-step logic, CTE + window combos for dedup, recursive CTEs for hierarchies and date spines.
The CTE problem on the whiteboard, not in the docs
Interviewers don't ask you to define WITH. They ask you to build a three-step pipeline where the intermediate result is referenced twice, and then they ask you to make it recursive. The 5 problems that meet the bar are the patterns onsite panels actually use.
Know CTE the way the interviewer who asks it knows it.
CTE or subquery? The choice that defines your query's readability.
Reach for a CTE when the same intermediate result is referenced more than once, when nesting exceeds two levels, or when the next reviewer has to read the query in real time. Reach for a subquery when one of those isn't true. The 5 problems on this page each force the tradeoff so the choice stops being a guess.
Five progressive CTE problems
Ordered from warm-up to advanced. Each builds on a different CTE skill that interviewers test.
1. Filter before joining
Write a CTE that pre-filters an orders table to the last 90 days, then joins it to a customers table. Tests whether you understand CTEs as readable query steps. The naive approach filters in WHERE after the join. The CTE filters first, making the join smaller and the logic clearer. Skills: basic CTE syntax, readability, filter pushdown.
2. Multi-step aggregation
Chain two CTEs: the first aggregates daily revenue per product, the second ranks products by total revenue. The final SELECT pulls the top 5. The bread-and-butter CTE pattern. Each CTE does one thing, and the result reads top-to-bottom like a recipe. Skills: chained CTEs, aggregation, ranking.
3. CTE for deduplication
Given a table with duplicate user events (same user_id, same event_type, same timestamp), use a CTE with ROW_NUMBER to keep only the first occurrence per user per event type. Then aggregate the deduplicated result. Appears constantly in real pipelines where source data has duplicates. Skills: ROW_NUMBER inside CTE, dedup, combining CTEs with window functions.
4. Recursive org chart traversal
employees table with manager_id. Write a recursive CTE that builds the full reporting chain from CEO to every employee. Include a depth column. The classic recursive CTE problem. Anchor member selects the root (WHERE manager_id IS NULL). Recursive member joins back to the CTE itself. Skills: recursive CTE syntax, anchor/recursive members, tree traversal.
5. Recursive date spine
Generate a continuous date series from 2024-01-01 to 2024-12-31 using a recursive CTE, then LEFT JOIN daily revenue data onto it. Days with no revenue show 0. Solves the 'missing dates' problem that trips up many candidates. Without a date spine, gaps in your data become invisible. Skills: recursive date generation, LEFT JOIN to fill gaps, COALESCE for defaults.
When CTEs show up in interviews
CTEs are a tool interviewers expect you to reach for when a query gets complex. They concentrate in later rounds where the bar is higher.
Refactor a nested subquery into CTEs
Interviewers hand you a 40-line query with 3 levels of nested subqueries and ask you to rewrite it. The test isn't whether you get the same result; it's whether you can decompose logic into named, readable steps.
CTE + window function combo
First CTE aggregates raw data. Second CTE applies a window function (usually ROW_NUMBER or LAG). Final SELECT filters. This three-layer pattern handles ~70% of analytical interview questions.
Recursive hierarchy traversal
Org charts, category trees, bill-of-materials. The interviewer wants to see that you know the anchor/recursive member structure and can add a depth or path column.
CTE for testability
Some interviewers ask you to write a query, then change the requirements. CTEs make modifications easy because each step is isolated. Changing one CTE does not break the others.
Common CTE mistakes
These errors come up in interviews more often than you'd think.
Referencing a CTE before it's defined
CTEs are evaluated in order. If cte_b references cte_a, then cte_a must come first in the WITH clause. Seems obvious, but under interview pressure people get the order wrong.
Forgetting the comma between CTEs
The WITH keyword appears once. Each CTE after the first is separated by a comma. A common syntax error: writing WITH again for the second CTE.
Using a CTE when a subquery is simpler
If you only reference the result once and it's a single aggregation, a subquery in the FROM clause is fine. CTEs add value when you reference the same intermediate result multiple times or need readability across many steps.
Infinite recursion in recursive CTEs
Every recursive CTE needs a termination condition. If your recursive member doesn't converge (no WHERE clause limiting depth, no rows being eliminated), the query runs forever. Most engines have a default recursion limit, but relying on it is not a plan.
Worked example: recursive org chart traversal
WITH RECURSIVE org_tree AS (
-- Anchor: start from the CEO (no manager)
SELECT employee_id, emp_name, manager_id, 0 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: join each employee to their manager in the tree
SELECT e.employee_id, e.emp_name, e.manager_id, t.depth + 1
FROM employees e
JOIN org_tree t ON e.manager_id = t.employee_id
)
SELECT depth, emp_name
FROM org_tree
ORDER BY depth, emp_name;The anchor member seeds the recursion with root nodes (employees whose manager_id IS NULL). Each recursive step joins one level deeper. The depth column increments by 1 at each level, so you can see exactly where each person sits in the hierarchy.
CTE practice FAQ
What is a CTE in SQL?+
When should I use a CTE instead of a subquery?+
Are recursive CTEs commonly asked in interviews?+
Do CTEs affect query performance?+
Build CTE fluency with real SQL
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition