SQL Subquery: Types, Examples, and Interview Patterns

Every non-trivial pipeline query you've ever shipped is a tree. Subqueries are how you build that tree inline. In our corpus they appear in 19% of verified rounds and CTEs in 8%, which tells you most interviewers still reach for the older construct when the problem gets layered.

19%

Questions using subqueries

CTE frequency

122

Onsite SQL rounds analyzed

Subquery shapes to know cold

What Is a SQL Subquery?

A subquery is a nested SELECT that feeds a parent query. In a query plan it shows up as a subplan node, materialized once or executed per row depending on whether it's correlated. Three shapes map to every use case you'll see in a pipeline or an interview whiteboard.

The Three Subquery Types

Scalar Subquery

Returns exactly one row and one column. Used wherever a single value is expected: in the SELECT list, in a WHERE comparison, or as a default value. If the subquery returns more than one row, the engine throws an error.

Table Subquery

Returns multiple rows (and optionally multiple columns). Used with IN, EXISTS, ANY, ALL, or in the FROM clause as a derived table. This is the most common category in interviews.

Correlated Subquery

References a column from the outer query, so it re-executes for every row. More powerful than non-correlated subqueries, but potentially slower. Required for row-by-row comparisons like "find employees earning above their department average."

Subquery Types with Examples

Each type solves a different class of problem. Interviewers expect you to pick the right one without prompting.

Scalar Subquery: Compare Against an Aggregate

SELECT emp_name, salary, department
FROM employees
WHERE salary > (
  SELECT AVG(salary)
  FROM employees
);

Interview note: This is the most common subquery question at phone-screen level. The interviewer checks whether you know to use a scalar subquery vs. a self-join. Both work, but the subquery is 3 lines shorter.

Table Subquery with IN

SELECT user_id, username, email
FROM users
WHERE user_id IN (
  SELECT DISTINCT user_id
  FROM transactions
  WHERE transaction_date >= '2024-01-01'
);

Interview note: If the interviewer asks you to rewrite this without IN, use EXISTS. Both produce identical execution plans in PostgreSQL 14+ and Snowflake. The choice is style, not performance.

Table Subquery with EXISTS

SELECT DISTINCT m.department
FROM employee_metrics m
WHERE EXISTS (
  SELECT 1
  FROM employees e
  WHERE e.department = m.department
    AND e.hire_date >= '2024-01-01'
);

Interview note: "SELECT 1" inside EXISTS is convention. The database ignores the SELECT list entirely. You could write SELECT * or SELECT 42. It does not affect correctness or performance.

Derived Table (Subquery in FROM)

SELECT dept_name, avg_salary
FROM (
  SELECT
    e.department AS dept_name,
    AVG(e.salary) AS avg_salary
  FROM employees e
  GROUP BY e.department
) AS dept_averages
WHERE avg_salary > 85000;

Interview note: Most interviewers prefer the CTE version of this pattern. But some older codebases and Redshift workloads still use derived tables heavily. Know both.

Correlated Subquery

SELECT e.emp_name, e.salary, e.department
FROM employees e
WHERE e.salary > (
  SELECT AVG(e2.salary)
  FROM employees e2
  WHERE e2.department = e.department
);

Interview note: This is a top-5 SQL interview question. The faster alternative uses a CTE or window function (AVG() OVER (PARTITION BY department_id)), but interviewers sometimes specifically ask for the correlated version to test whether you understand row-by-row execution.

Subquery vs CTE vs JOIN

Interviewers frequently ask when to pick one over the other. The answer depends on readability, reuse, and what columns you need in the output.

Criteria	Subquery	CTE	JOIN
Readability at 1 level	Good	Good	Good
Readability at 3+ levels	Poor (deep nesting)	Best (named steps)	Moderate
Reuse same result twice	Must duplicate	Reference by name	Must duplicate
Columns in output	Outer table only	Any CTE column	Both tables
Existence checks	EXISTS / NOT EXISTS	Possible but verbose	LEFT JOIN + IS NULL
Recursion	Not supported	WITH RECURSIVE	Not supported

Same Problem, Three Approaches

-- Find departments where the average salary exceeds 90,000.

-- Subquery (derived table)
SELECT dept_name, avg_sal
FROM (
  SELECT department AS dept_name, AVG(salary) AS avg_sal
  FROM employees
  GROUP BY department
) sub
WHERE avg_sal > 90000;

-- CTE
WITH dept_avg AS (
  SELECT department, AVG(salary) AS avg_sal
  FROM employees
  GROUP BY department
)
SELECT department, avg_sal
FROM dept_avg
WHERE avg_sal > 90000;

-- JOIN with HAVING
SELECT e.department, AVG(e.salary) AS avg_sal
FROM employees e
GROUP BY e.department
HAVING AVG(e.salary) > 90000;

Interview note: All three produce identical results. The HAVING approach is shortest for this specific case. But the moment you need a second filter on the aggregated data (e.g., rank departments), you will need the CTE or subquery approach. Show the interviewer you know the tradeoffs.

Correlated Subqueries: When They Are the Only Option

A correlated subquery depends on values from the outer query. The database cannot run it once and cache the result. It re-evaluates the inner SELECT for every row the outer query produces. This has real performance consequences, but there are problems that genuinely require it.

For a table with 10,000 rows, a correlated subquery runs the inner SELECT up to 10,000 times. The optimizer may cache results for repeated input values, but you should not count on it. Measure with EXPLAIN ANALYZE.

Correlated Subquery: Latest Order Per Customer

-- For each transaction, find the most recent transaction from the same user
SELECT t.transaction_id, t.user_id, t.transaction_date, t.total_amount
FROM transactions t
WHERE t.transaction_date = (
  SELECT MAX(t2.transaction_date)
  FROM transactions t2
  WHERE t2.user_id = t.user_id
);

Performance: When to Worry

Small tables (under 10K rows)

Correlated subqueries run fine. The optimizer handles them without issues. Do not over-optimize in an interview.

Medium tables (10K to 1M rows)

Performance depends on indexing. If the correlated column (e.g., customer_id) is indexed, the inner query uses an index seek for each iteration. Without an index, you get a full scan per row.

Large tables (1M+ rows)

Rewrite to a window function or a CTE with a join. The correlated approach becomes impractical. In interviews, mentioning this threshold shows production awareness.

The Window Function Alternative

WITH ranked AS (
  SELECT
    transaction_id, user_id, transaction_date, total_amount,
    ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY transaction_date DESC
    ) AS rn
  FROM transactions
)
SELECT transaction_id, user_id, transaction_date, total_amount
FROM ranked
WHERE rn = 1;

Interview note: If the interviewer asks for a correlated subquery, write it. Then say: For tables over a few hundred thousand rows, I would use ROW_NUMBER with a CTE instead. This demonstrates you know both patterns and understand the performance boundary.

6 Subquery Interview Questions

These cover the range from phone screen to onsite. Each tests a different aspect of subquery knowledge.

Q1: Find all employees earning above the company average salary.

What they test: Basic scalar subquery syntax. Can you write a WHERE clause that compares against an aggregated value? Approach: WHERE salary > (SELECT AVG(salary) FROM employees). One scalar subquery. Do not overthink it. State that the subquery runs once and returns a single number.

Q2: Find customers who have never placed an order.

What they test: Anti-join pattern. Three valid approaches exist: NOT IN, NOT EXISTS, and LEFT JOIN with IS NULL. The interviewer wants to see if you know the NULL trap with NOT IN. Approach: Use NOT EXISTS (SELECT 1 FROM orders WHERE orders.customer_id = customers.id). Mention that NOT IN breaks if the subquery returns any NULLs. This detail alone sets you apart from most candidates.

Q3: For each department, find employees earning above that department's average.

What they test: Correlated subquery vs. window function. The interviewer is checking whether you can write the correlated version AND whether you know the window function alternative. Approach: Correlated: WHERE e.salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id). Then offer the window function rewrite using AVG() OVER (PARTITION BY department_id).

Q4: Find the second highest salary in each department.

What they test: Nested or correlated subquery for ranking. This is a classic Nth highest problem. Approach: Use DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) in a CTE, then filter WHERE rank = 2. If forced to use subqueries only, count distinct higher salaries: WHERE (SELECT COUNT(DISTINCT salary) FROM employees e2 WHERE e2.department_id = e.department_id AND e2.salary > e.salary) = 1.

Q5: Rewrite this nested subquery as a CTE. Which do you prefer and why?

What they test: Code refactoring judgment. The interviewer hands you a deeply nested query and watches how you decompose it. Approach: Extract each subquery into a named CTE, starting from the innermost. Name each CTE after what it computes (e.g., daily_totals, ranked_products). State that CTEs win for readability and reuse, but for a one-off scalar comparison, a subquery is fine.

Q6: Explain when NOT EXISTS is better than NOT IN.

What they test: NULL awareness and defensive SQL. Approach: If the subquery in NOT IN returns even a single NULL, the entire NOT IN predicate evaluates to UNKNOWN, and zero rows come back. NOT EXISTS handles NULLs correctly because it checks for row existence, not value equality. Always default to NOT EXISTS unless you are certain the inner column has no NULLs.

3 Subquery Mistakes That Fail Interviews

These are not obscure edge cases. They appear in real interview debriefs as reasons candidates get rejected.

NOT IN with NULLs

If the subquery in a NOT IN clause returns any NULL value, the entire predicate yields no rows. This happens because SQL uses three-valued logic: comparing a value to NULL produces UNKNOWN, and NOT IN requires every comparison to be FALSE. One NULL makes the whole thing UNKNOWN. Fix: use NOT EXISTS, or add WHERE ... IS NOT NULL inside the subquery.

Correlated Subquery on a Large Table Without an Index

A correlated subquery re-executes for every row of the outer query. On a 500K-row table, that means 500K inner queries. If the correlated column is not indexed, each inner query does a full table scan. Fix: rewrite to a JOIN and GROUP BY, or use a window function.

Subquery Returning Multiple Rows in a Scalar Context

A scalar subquery must return exactly one row. If it returns two or more, the database throws an error. This often happens when you forget a GROUP BY or when the data has unexpected duplicates. The query works in development (where the test data happens to have one match) and breaks in production. Fix: add LIMIT 1 with an ORDER BY, or use MAX/MIN to force a scalar.

SQL Subquery FAQ

What is a subquery in SQL?+

A subquery is a SELECT statement nested inside another SQL statement. It can appear in the WHERE clause, FROM clause, SELECT list, or HAVING clause. The database executes the inner query first (for non-correlated subqueries) and uses its result in the outer query. Subqueries let you break a problem into steps without creating temporary tables.

When should I use a subquery instead of a JOIN?+

Use a subquery when you need a single aggregated value for filtering (e.g., WHERE salary > (SELECT AVG(salary) FROM employees)), when the inner query is logically independent, or when you need EXISTS/NOT EXISTS checks. Use a JOIN when you need columns from both tables in the output. In most engines, the optimizer rewrites simple subqueries into joins anyway, so the choice is often about readability.

Are subqueries slower than JOINs?+

Not necessarily. Modern query optimizers in PostgreSQL, BigQuery, Snowflake, and Trino can flatten subqueries into joins. Correlated subqueries are the exception: they re-execute for each row of the outer query, which can be slow on large tables. Non-correlated subqueries run once and are typically as fast as the equivalent JOIN.

What is a correlated subquery?+

A correlated subquery references a column from the outer query, so it cannot run independently. The database re-evaluates it for every row the outer query processes. Example: SELECT * FROM employees e WHERE salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id). The inner query depends on e.department_id from the outer query.

Can I nest subqueries more than one level deep?+

Yes, SQL allows arbitrary nesting depth. But readability drops fast. Two levels of nesting is the practical limit before you should refactor into CTEs. In interviews, if you find yourself nesting three levels deep, stop and rewrite with WITH clauses. The interviewer is watching how you manage complexity, not how deeply you can nest.

02 / Why practice

Build the Query Tree on Paper First

01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Start Practicing

Related Guides

Subquery Practice Problems→

5 hands-on subquery problems from scalar filters to correlated row comparisons

CTE Practice Problems→

WITH clauses, chained CTEs, and recursive traversals at interview difficulty

SQL Interview Questions→

Complete guide to every SQL topic tested in data engineering interviews