SQL Subquery: Types, Examples, and Interview Patterns
Every non-trivial pipeline query you've ever shipped is a tree. Subqueries are how you build that tree inline. In our corpus they appear in 19% of verified rounds and CTEs in 8%, which tells you most interviewers still reach for the older construct when the problem gets layered.
What Is a SQL Subquery?
A subquery is a nested SELECT that feeds a parent query. In a query plan it shows up as a subplan node, materialized once or executed per row depending on whether it's correlated. Three shapes map to every use case you'll see in a pipeline or an interview whiteboard.
Subquery Types with Examples
Each type solves a different class of problem. Interviewers expect you to pick the right one without prompting.
Know Subqueries the way the interviewer who asks it knows it.
Highest Throughput Pipelines
The pipes that carry the most water.
Pulled from debriefs where SQL was the gate.
The Three Subquery Types
Scalar Subquery
Returns exactly one row and one column. Used wherever a single value is expected: in the SELECT list, in a WHERE comparison, or as a default value. If the subquery returns more than one row, the engine throws an error.
Table Subquery
Returns multiple rows (and optionally multiple columns). Used with IN, EXISTS, ANY, ALL, or in the FROM clause as a derived table. This is the most common category in interviews.
Correlated Subquery
References a column from the outer query, so it re-executes for every row. More powerful than non-correlated subqueries, but potentially slower. Required for row-by-row comparisons like "find employees earning above their department average."
Correlated Subqueries: When They Are the Only Option
A correlated subquery depends on values from the outer query. The database cannot run it once and cache the result. It re-evaluates the inner SELECT for every row the outer query produces. This has real performance consequences, but there are problems that genuinely require it.
For a table with 10,000 rows, a correlated subquery runs the inner SELECT up to 10,000 times. The optimizer may cache results for repeated input values, but you should not count on it. Measure with EXPLAIN ANALYZE.
Scalar Subquery: Compare Against an Aggregate
SELECT name, salary, department_id
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees
);Interview note: This is the most common subquery question at phone-screen level. The interviewer checks whether you know to use a scalar subquery vs. a self-join. Both work, but the subquery is 3 lines shorter.
Table Subquery with IN
SELECT customer_id, name, email
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
);Interview note: If the interviewer asks you to rewrite this without IN, use EXISTS. Both produce identical execution plans in PostgreSQL 14+ and Snowflake. The choice is style, not performance.
Table Subquery with EXISTS
SELECT d.department_name
FROM departments d
WHERE EXISTS (
SELECT 1
FROM employees e
WHERE e.department_id = d.id
AND e.hire_date >= '2024-01-01'
);Interview note: "SELECT 1" inside EXISTS is convention. The database ignores the SELECT list entirely. You could write SELECT * or SELECT 42. It does not affect correctness or performance.
Derived Table (Subquery in FROM)
SELECT dept_name, avg_salary
FROM (
SELECT
d.department_name AS dept_name,
AVG(e.salary) AS avg_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.department_name
) AS dept_averages
WHERE avg_salary > 85000;Interview note: Most interviewers prefer the CTE version of this pattern. But some older codebases and Redshift workloads still use derived tables heavily. Know both.
Correlated Subquery
SELECT e.name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e.department_id
);Interview note: This is a top-5 SQL interview question. The faster alternative uses a CTE or window function (AVG() OVER (PARTITION BY department_id)), but interviewers sometimes specifically ask for the correlated version to test whether you understand row-by-row execution.
Subquery vs CTE vs JOIN
Interviewers frequently ask when to pick one over the other. The answer depends on readability, reuse, and what columns you need in the output.
| Criteria | Subquery | CTE | JOIN |
|---|---|---|---|
| Readability at 1 level | Good | Good | Good |
| Readability at 3+ levels | Poor (deep nesting) | Best (named steps) | Moderate |
| Reuse same result twice | Must duplicate | Reference by name | Must duplicate |
| Columns in output | Outer table only | Any CTE column | Both tables |
| Existence checks | EXISTS / NOT EXISTS | Possible but verbose | LEFT JOIN + IS NULL |
| Recursion | Not supported | WITH RECURSIVE | Not supported |
Same Problem, Three Approaches
-- Find departments where the average salary exceeds 90,000.
-- Subquery (derived table)
SELECT dept_name, avg_sal
FROM (
SELECT department_name AS dept_name, AVG(salary) AS avg_sal
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY department_name
) sub
WHERE avg_sal > 90000;
-- CTE
WITH dept_avg AS (
SELECT department_name, AVG(salary) AS avg_sal
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY department_name
)
SELECT department_name, avg_sal
FROM dept_avg
WHERE avg_sal > 90000;
-- JOIN with HAVING
SELECT d.department_name, AVG(e.salary) AS avg_sal
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.department_name
HAVING AVG(e.salary) > 90000;Interview note: All three produce identical results. The HAVING approach is shortest for this specific case. But the moment you need a second filter on the aggregated data (e.g., rank departments), you will need the CTE or subquery approach. Show the interviewer you know the tradeoffs.
Correlated Subquery: Latest Order Per Customer
-- For each order, find the most recent order from the same customer
SELECT o.order_id, o.customer_id, o.order_date, o.total
FROM orders o
WHERE o.order_date = (
SELECT MAX(o2.order_date)
FROM orders o2
WHERE o2.customer_id = o.customer_id
);Performance: When to Worry
Small tables (under 10K rows)
Correlated subqueries run fine. The optimizer handles them without issues. Do not over-optimize in an interview.
Medium tables (10K to 1M rows)
Performance depends on indexing. If the correlated column (e.g., customer_id) is indexed, the inner query uses an index seek for each iteration. Without an index, you get a full scan per row.
Large tables (1M+ rows)
Rewrite to a window function or a CTE with a join. The correlated approach becomes impractical. In interviews, mentioning this threshold shows production awareness.
The Window Function Alternative
WITH ranked AS (
SELECT
order_id, customer_id, order_date, total,
ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY order_date DESC
) AS rn
FROM orders
)
SELECT order_id, customer_id, order_date, total
FROM ranked
WHERE rn = 1;Interview note: If the interviewer asks for a correlated subquery, write it. Then say: For tables over a few hundred thousand rows, I would use ROW_NUMBER with a CTE instead. This demonstrates you know both patterns and understand the performance boundary.
6 Subquery Interview Questions
These cover the range from phone screen to onsite. Each tests a different aspect of subquery knowledge.
Q1: Find all employees earning above the company average salary.
What they test: Basic scalar subquery syntax. Can you write a WHERE clause that compares against an aggregated value? Approach: WHERE salary > (SELECT AVG(salary) FROM employees). One scalar subquery. Do not overthink it. State that the subquery runs once and returns a single number.
Q2: Find customers who have never placed an order.
What they test: Anti-join pattern. Three valid approaches exist: NOT IN, NOT EXISTS, and LEFT JOIN with IS NULL. The interviewer wants to see if you know the NULL trap with NOT IN. Approach: Use NOT EXISTS (SELECT 1 FROM orders WHERE orders.customer_id = customers.id). Mention that NOT IN breaks if the subquery returns any NULLs. This detail alone sets you apart from most candidates.
Q3: For each department, find employees earning above that department's average.
What they test: Correlated subquery vs. window function. The interviewer is checking whether you can write the correlated version AND whether you know the window function alternative. Approach: Correlated: WHERE e.salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id). Then offer the window function rewrite using AVG() OVER (PARTITION BY department_id).
Q4: Find the second highest salary in each department.
What they test: Nested or correlated subquery for ranking. This is a classic Nth highest problem. Approach: Use DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) in a CTE, then filter WHERE rank = 2. If forced to use subqueries only, count distinct higher salaries: WHERE (SELECT COUNT(DISTINCT salary) FROM employees e2 WHERE e2.department_id = e.department_id AND e2.salary > e.salary) = 1.
Q5: Rewrite this nested subquery as a CTE. Which do you prefer and why?
What they test: Code refactoring judgment. The interviewer hands you a deeply nested query and watches how you decompose it. Approach: Extract each subquery into a named CTE, starting from the innermost. Name each CTE after what it computes (e.g., daily_totals, ranked_products). State that CTEs win for readability and reuse, but for a one-off scalar comparison, a subquery is fine.
Q6: Explain when NOT EXISTS is better than NOT IN.
What they test: NULL awareness and defensive SQL. Approach: If the subquery in NOT IN returns even a single NULL, the entire NOT IN predicate evaluates to UNKNOWN, and zero rows come back. NOT EXISTS handles NULLs correctly because it checks for row existence, not value equality. Always default to NOT EXISTS unless you are certain the inner column has no NULLs.
3 Subquery Mistakes That Fail Interviews
These are not obscure edge cases. They appear in real interview debriefs as reasons candidates get rejected.
NOT IN with NULLs
If the subquery in a NOT IN clause returns any NULL value, the entire predicate yields no rows. This happens because SQL uses three-valued logic: comparing a value to NULL produces UNKNOWN, and NOT IN requires every comparison to be FALSE. One NULL makes the whole thing UNKNOWN. Fix: use NOT EXISTS, or add WHERE ... IS NOT NULL inside the subquery.
Correlated Subquery on a Large Table Without an Index
A correlated subquery re-executes for every row of the outer query. On a 500K-row table, that means 500K inner queries. If the correlated column is not indexed, each inner query does a full table scan. Fix: rewrite to a JOIN and GROUP BY, or use a window function.
Subquery Returning Multiple Rows in a Scalar Context
A scalar subquery must return exactly one row. If it returns two or more, the database throws an error. This often happens when you forget a GROUP BY or when the data has unexpected duplicates. The query works in development (where the test data happens to have one match) and breaks in production. Fix: add LIMIT 1 with an ORDER BY, or use MAX/MIN to force a scalar.
SQL Subquery FAQ
What is a subquery in SQL?+
When should I use a subquery instead of a JOIN?+
Are subqueries slower than JOINs?+
What is a correlated subquery?+
Can I nest subqueries more than one level deep?+
Build the Query Tree on Paper First
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition