SQL Practice
Subqueries and CTEs account for 4.9% of SQL interview questions. That number sounds small until you realize they cluster in the hardest rounds. Scalar, correlated, EXISTS, and IN subqueries with real execution, plus a clear breakdown of when to use subqueries vs CTEs vs JOINs.
Each type solves a different kind of problem. Know when to reach for each one.
SELECT name, salary, (SELECT AVG(salary) FROM employees) AS avg_sal FROM employeesReturns a single value. You can use it in SELECT, WHERE, or HAVING. The query engine runs it once and plugs the result in. If it returns more than one row, the query fails. Scalar subqueries are the simplest form and often the right tool for comparing each row to a global aggregate.
When to use: Use when you need a single number: the average, the max, a count, or a specific lookup value.
SELECT * FROM orders o WHERE amount > (SELECT AVG(amount) FROM orders WHERE customer_id = o.customer_id)References a column from the outer query. The database re-evaluates it for every row in the outer query. This makes them slower than uncorrelated subqueries on large tables, but they solve problems that uncorrelated subqueries cannot. The key tell: the inner query mentions an alias from the outer query.
When to use: Use when the subquery result depends on which row you are currently evaluating in the outer query.
SELECT * FROM customers c WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id)Returns TRUE if the subquery produces at least one row. The database can stop scanning as soon as it finds a match. This makes EXISTS faster than IN for large datasets. It is the standard way to check for the presence (or absence with NOT EXISTS) of related rows.
When to use: Use for existence checks. "Find customers who have at least one order." "Find products that have never been returned."
SELECT * FROM products WHERE category_id IN (SELECT id FROM categories WHERE active = true)Checks whether a value matches any row in the subquery result. Functionally similar to EXISTS for simple cases, but behaves differently with NULLs. If the subquery returns a NULL, NOT IN produces unexpected results because NULL comparisons are always unknown. This is a classic interview gotcha.
When to use: Use for simple membership tests where the subquery column is NOT NULL. Prefer EXISTS when NULLs are possible.
Each problem targets a different subquery type. The last one is deliberately painful to show why window functions were invented.
Find all employees whose salary is above the company average. Use a scalar subquery in the WHERE clause. This is the entry-level subquery question that establishes whether you know the basic syntax.
Find customers who placed at least one order in the last 30 days. Solve it two ways: once with EXISTS, once with IN. Compare the results and explain when they differ. Hint: they differ when NULLs are involved.
Find the highest-paid employee in each department. Use a correlated subquery: for each employee, check if their salary equals the maximum salary in their department. This is the canonical correlated subquery problem.
Find products that have never appeared in an order. Solve it three ways: LEFT JOIN with IS NULL, NOT EXISTS, and NOT IN. All three return the same result if there are no NULLs. Add a NULL product_id to the orders table and see which approach breaks.
For each order, show the order amount and the cumulative average of all previous orders by the same customer. Use a correlated subquery (no window functions allowed). This is hard. The correlated subquery must filter by customer_id AND by order_date less than the current row. It demonstrates why window functions exist.
Interviewers sometimes ask you to rewrite a subquery as a JOIN, or explain why you chose one over the other. SQL is the single most-tested skill in data engineering interviews. Knowing which tool to reach for, and why, is the difference between passing and failing.
Simple, self-contained, works everywhere. Scalar subqueries in SELECT are easy to read. EXISTS is efficient for existence checks.
Correlated subqueries re-execute per row. Nested subqueries beyond two levels become unreadable. NOT IN breaks with NULLs.
Simple lookups, existence checks, one-off comparisons to aggregates
Named, readable, top-to-bottom flow. Each step is isolated and testable. Can reference the same CTE multiple times.
Not all databases optimize CTEs well (some materialize them). Cannot be correlated to an outer query.
Multi-step transformations, queries that reference intermediate results more than once
Optimizers understand JOINs best. Set-based, efficient for large-scale matching. Clear cardinality semantics.
Fan-out risk with one-to-many joins. Can be hard to read with 4+ tables. Requires understanding NULL behavior in outer joins.
Combining tables, matching rows, any problem where you need columns from multiple sources
Given an employees table with name, department_id, and salary, find employees whose salary exceeds the average salary in their own department. Return the employee name, their salary, and the department average.
SELECT
e.name,
e.department_id,
e.salary,
dept_avg.avg_salary
FROM employees e
JOIN (
SELECT department_id, ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department_id
) dept_avg ON dept_avg.department_id = e.department_id
WHERE e.salary > dept_avg.avg_salary
ORDER BY e.salary - dept_avg.avg_salary DESC;The subquery in the FROM clause computes each department's average once, then the outer query joins it back to filter individual employees. This avoids re-computing the average per row, which is what a correlated subquery in the WHERE clause would do. Ordering by the gap between salary and average surfaces the biggest outliers first.
Expected output
name | department_id | salary | avg_salary
---------+---------------+-----------+-----------
Diana | 2 | 145000.00 | 98333.33
Frank | 1 | 120000.00 | 95000.00The difference between knowing subquery syntax and writing a correlated subquery in an interview is practice. Write queries, run them, and see the output.