SQL Practice

SQL Subquery Practice with Examples

Subqueries and CTEs account for 4.9% of SQL interview questions. That number sounds small until you realize they cluster in the hardest rounds. Scalar, correlated, EXISTS, and IN subqueries with real execution, plus a clear breakdown of when to use subqueries vs CTEs vs JOINs.

Four Types of SQL Subqueries

Each type solves a different kind of problem. Know when to reach for each one.

Scalar Subquery

+
SELECT name, salary, (SELECT AVG(salary) FROM employees) AS avg_sal FROM employees

Returns a single value. You can use it in SELECT, WHERE, or HAVING. The query engine runs it once and plugs the result in. If it returns more than one row, the query fails. Scalar subqueries are the simplest form and often the right tool for comparing each row to a global aggregate.

When to use: Use when you need a single number: the average, the max, a count, or a specific lookup value.

Correlated Subquery

+
SELECT * FROM orders o WHERE amount > (SELECT AVG(amount) FROM orders WHERE customer_id = o.customer_id)

References a column from the outer query. The database re-evaluates it for every row in the outer query. This makes them slower than uncorrelated subqueries on large tables, but they solve problems that uncorrelated subqueries cannot. The key tell: the inner query mentions an alias from the outer query.

When to use: Use when the subquery result depends on which row you are currently evaluating in the outer query.

EXISTS Subquery

+
SELECT * FROM customers c WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id)

Returns TRUE if the subquery produces at least one row. The database can stop scanning as soon as it finds a match. This makes EXISTS faster than IN for large datasets. It is the standard way to check for the presence (or absence with NOT EXISTS) of related rows.

When to use: Use for existence checks. "Find customers who have at least one order." "Find products that have never been returned."

IN Subquery

+
SELECT * FROM products WHERE category_id IN (SELECT id FROM categories WHERE active = true)

Checks whether a value matches any row in the subquery result. Functionally similar to EXISTS for simple cases, but behaves differently with NULLs. If the subquery returns a NULL, NOT IN produces unexpected results because NULL comparisons are always unknown. This is a classic interview gotcha.

When to use: Use for simple membership tests where the subquery column is NOT NULL. Prefer EXISTS when NULLs are possible.

5 Subquery Practice Problems

Each problem targets a different subquery type. The last one is deliberately painful to show why window functions were invented.

#1

Above-Average Salary

Scalar+

Find all employees whose salary is above the company average. Use a scalar subquery in the WHERE clause. This is the entry-level subquery question that establishes whether you know the basic syntax.

#2

Customers With Recent Orders

EXISTS / IN+

Find customers who placed at least one order in the last 30 days. Solve it two ways: once with EXISTS, once with IN. Compare the results and explain when they differ. Hint: they differ when NULLs are involved.

#3

Per-Group Maximum

Correlated+

Find the highest-paid employee in each department. Use a correlated subquery: for each employee, check if their salary equals the maximum salary in their department. This is the canonical correlated subquery problem.

#4

Products Never Ordered

NOT EXISTS / NOT IN+

Find products that have never appeared in an order. Solve it three ways: LEFT JOIN with IS NULL, NOT EXISTS, and NOT IN. All three return the same result if there are no NULLs. Add a NULL product_id to the orders table and see which approach breaks.

#5

Running Comparison Without Window Functions

Correlated+

For each order, show the order amount and the cumulative average of all previous orders by the same customer. Use a correlated subquery (no window functions allowed). This is hard. The correlated subquery must filter by customer_id AND by order_date less than the current row. It demonstrates why window functions exist.

Subquery vs CTE vs JOIN: When to Use What

Interviewers sometimes ask you to rewrite a subquery as a JOIN, or explain why you chose one over the other. SQL is the single most-tested skill in data engineering interviews. Knowing which tool to reach for, and why, is the difference between passing and failing.

Subquery

+

Strengths

Simple, self-contained, works everywhere. Scalar subqueries in SELECT are easy to read. EXISTS is efficient for existence checks.

Weaknesses

Correlated subqueries re-execute per row. Nested subqueries beyond two levels become unreadable. NOT IN breaks with NULLs.

Best for

Simple lookups, existence checks, one-off comparisons to aggregates

CTE

+

Strengths

Named, readable, top-to-bottom flow. Each step is isolated and testable. Can reference the same CTE multiple times.

Weaknesses

Not all databases optimize CTEs well (some materialize them). Cannot be correlated to an outer query.

Best for

Multi-step transformations, queries that reference intermediate results more than once

JOIN

+

Strengths

Optimizers understand JOINs best. Set-based, efficient for large-scale matching. Clear cardinality semantics.

Weaknesses

Fan-out risk with one-to-many joins. Can be hard to read with 4+ tables. Requires understanding NULL behavior in outer joins.

Best for

Combining tables, matching rows, any problem where you need columns from multiple sources

Worked Example: Employees Who Earn Above Their Department Average

Given an employees table with name, department_id, and salary, find employees whose salary exceeds the average salary in their own department. Return the employee name, their salary, and the department average.

SELECT
  e.name,
  e.department_id,
  e.salary,
  dept_avg.avg_salary
FROM employees e
JOIN (
  SELECT department_id, ROUND(AVG(salary), 2) AS avg_salary
  FROM employees
  GROUP BY department_id
) dept_avg ON dept_avg.department_id = e.department_id
WHERE e.salary > dept_avg.avg_salary
ORDER BY e.salary - dept_avg.avg_salary DESC;

The subquery in the FROM clause computes each department's average once, then the outer query joins it back to filter individual employees. This avoids re-computing the average per row, which is what a correlated subquery in the WHERE clause would do. Ordering by the gap between salary and average surfaces the biggest outliers first.

Expected output

 name    | department_id | salary    | avg_salary
---------+---------------+-----------+-----------
 Diana   |             2 | 145000.00 |   98333.33
 Frank   |             1 | 120000.00 |   95000.00

Subquery FAQ

What is a correlated subquery in SQL?+
A correlated subquery references a column from the outer query. The database evaluates it once for every row in the outer result set. Example: finding employees who earn more than their department average. The subquery filters by the current row's department_id, so it produces a different result for each row.
When should I use EXISTS instead of IN?+
Use EXISTS when checking for the presence of related rows, especially on large tables. EXISTS stops scanning as soon as it finds one match. Also use EXISTS when the subquery column might contain NULLs. NOT IN returns no rows if any NULL is present in the subquery result. NOT EXISTS handles NULLs correctly.
Are subqueries slower than JOINs?+
Not necessarily. Modern query optimizers often rewrite subqueries as joins internally. Scalar and EXISTS subqueries are well-optimized in most engines. Correlated subqueries can be slow on large tables because they re-execute per row. If performance matters, check the execution plan rather than assuming one form is always faster.
Can I use a subquery in the SELECT clause?+
Yes. A scalar subquery in SELECT returns one value per row. Example: SELECT name, (SELECT COUNT(*) FROM orders WHERE orders.user_id = users.id) AS order_count FROM users. This is a correlated scalar subquery. It works but can be slow on large tables. A LEFT JOIN with GROUP BY often performs better.

Write Subqueries That Actually Run

The difference between knowing subquery syntax and writing a correlated subquery in an interview is practice. Write queries, run them, and see the output.