Query Structure: Intermediate

Stripe's fraud detection pipeline reads from more than 40 signals per transaction: merchant category, device fingerprint, velocity, and user history. Every signal is computed as a named intermediate result inside a single SQL query using CTEs, then combined in the final SELECT. Without CTEs, that query would be one unreadable nested expression. The WITH clause is the most used construct in data engineering, and this is the lesson where you master it.

CTEs (WITH clause)

Daily Life

Interviews

Break complex queries into readable steps

A CTE is a named result set you define once at the top of a query, then reference by name anywhere below it. This works exactly the way you assign an intermediate variable in code before using it in a calculation. The filter logic lives inside the CTE, the main query reads from the named result, and the two steps stay cleanly separated. If you have ever computed something once to avoid repeating the same expression twice, you already understand what a CTE does.

CTE Usage Examples

The following examples show how to define and use CTEs, from a single named result set to chaining multiple CTEs together.

Basic Example

A CTE starts with WITH, followed by a name and query definition. The main query then references the CTE by name.

1	WITH active_set AS (
2	SELECT
3	id,
4	name
5	FROM users
6	WHERE active = 1
7	)
8
9	SELECT
10	id,
11	name
12	FROM active_set
13	WHERE id > 100

Result

id	name
101	Dana
105	Elena
112	Frank

The CTE active_set filters for active users. The main query then filters for IDs greater than 100. This separates filtering steps, making the logic easier to follow.

Multiple CTEs

You can define several CTEs in a single query, separating them with commas. Each CTE can reference the ones declared before it, allowing you to build complex logic in clear, named stages.

1	WITH users_set AS (
2	SELECT
3	id
4	FROM users
5	WHERE active = 1
6	),
7	orders_set AS (
8	SELECT
9	user_id
10	FROM orders
11	WHERE status = 'pending'
12	)
13
14	SELECT
15	id
16	FROM users_set
17	WHERE id IN (
18	SELECT
19	user_id
20	FROM orders_set
21	)

Result

id
1
5
12

Multiple CTEs let you build complex queries step by step, each CTE producing a temporary dataset for the next.

Practice: CTE Basics

> Complete this query to define a reusable dataset of active users, then retrieve their IDs and names.

___ active_users ___ (
  SELECT
    id,
    name
  FROM users
  WHERE active = 1
)

SELECT
  id,
  name
FROM ___

temp

FROM

WITH

active_users

Complex CTE Scenarios

CTEs shine in complex scenarios where multiple stages of filtering or computation need to be clearly named and independently tested.

Real-World Example

Imagine you have logs from many devices and you want to track devices with very high error counts. You first compute all device error counts, then select only the most critical ones. Using CTEs lets you clearly name each stage and reuse results if needed for additional queries or checks.

CTEs exist only during query execution and cannot be accessed outside it. They can be referenced multiple times without recalculation, and must be declared in order; each CTE can only use datasets defined before it.

•Effective CTE Names

active_users describes the filtered set
monthly_revenue names the computation
error_devices clarifies the scope

•Names to Avoid

temp says nothing about the data
cte1 forces readers to trace the query
data is too generic to be useful

Comparing Approaches

Both subqueries and CTEs achieve similar goals, but they differ in readability, reusability, and maintainability.

Subqueries vs. CTEs

Subqueries and CTEs produce the same results, but CTEs are easier to read and maintain, especially when working in multiple stages. Both allow intermediate datasets, but CTEs provide named, reusable datasets, making multi-step queries more understandable.

Subquery Version

With a subquery, the inner query is embedded directly in the WHERE clause. This works well for simple filters but becomes harder to read as complexity grows.

1	SELECT
2	id
3	FROM orders
4	WHERE user_id IN (
5	SELECT
6	id
7	FROM users
8	WHERE active = 1
9	)

Result

id
1001
1005
1008
1012

CTE Version

The CTE version moves the filter logic to a named block at the top, making the main query simpler to follow.

1	WITH active_users AS (
2	SELECT
3	id
4	FROM users
5	WHERE active = 1
6	)
7
8	SELECT
9	id
10	FROM orders
11	WHERE user_id IN (
12	SELECT
13	id
14	FROM active_users
15	)

Result

id
1001
1005
1008
1012

Both queries return the same results. The CTE version names the intermediate dataset, making it easier to read and maintain.

Real-World Example

When processing device logs, you often filter by valid services or active devices. A subquery can perform the same filtering inline, but a CTE names the intermediate dataset and allows multiple references. This makes pipelines easier to follow, modify, and check for correctness.

TIP

If you reference the same intermediate result more than once, always use a CTE. Duplicating a subquery means duplicating bugs when you change one copy but forget the other.

Which approach is easier to maintain when the same filtered set is needed twice?

The same subquery appears twice. If the filter changes, you must update both copies, which is error-prone.

1	SELECT
2	COUNT(*)
3	FROM (
4	SELECT
5	id
6	FROM users
7	WHERE active = 1
8	) AS a
9
10	UNION ALL
11
12	SELECT
13	AVG(id)
14	FROM (
15	SELECT
16	id
17	FROM users
18	WHERE active = 1
19	) AS b

Query Execution Order

Daily Life

Interviews

Predict exactly how your query runs

SQL does not execute in the order it reads. CTEs and subqueries fire before the main query, WHERE filters rows before SELECT picks columns, and HAVING runs after aggregation that WHERE cannot touch. The clause that appears first on screen is rarely the first to run. For developers used to procedural code where line 1 runs before line 2, this is the single biggest source of cryptic SQL errors. The CTE you just wrote is the clearest example: the CTE block ran before anything in the main SELECT was evaluated.

Understanding Execution Order

Walk through this example to see how the CTE executes before the outer query, not after it.

Basic Example

This example shows how a CTE executes first to filter logs, then the outer query applies an additional filter on the result.

1	WITH recent_logs AS (
2	SELECT
3	id,
4	log_date
5	FROM logs
6	WHERE log_date > '2024-01-01'
7	)
8
9	SELECT
10	id
11	FROM recent_logs
12	WHERE log_date > '2024-06-01'

The CTE executes first to produce the temporary dataset. The outer query then filters from it.

Practical Implications

Understanding execution order helps you avoid common pitfalls like referencing aliases before they exist.

Real-World Example

When working with device logs, intermediate results must exist before applying filters or aggregations. Ignoring execution order can lead to errors or incorrect results. Understanding the sequence of execution ensures each step behaves correctly and can be reused for other tasks.

Which query correctly uses a column alias defined in SELECT?

WHERE executes before SELECT, so the alias "total" does not exist yet. This query fails in most databases.

1	SELECT
2	price * qty AS total
3	FROM orders
4	WHERE total > 100

✓Do

Use CTEs to make execution order explicit and visible
Test each step independently before combining them
Filter early with WHERE to reduce data processed

✗Don't

Don't assume clauses execute in the order they appear in your query
Avoid complex calculations in SELECT if not needed in output

Here is the full execution order to keep in mind when writing queries.

> Complete this query to find categories where the average price exceeds 50.

SELECT
  category,
  AVG(price) AS avg_price
FROM products
GROUP BY category
___ ___(price) > 50

SUM

WHERE

HAVING

AVG

EXECUTION ORDER SUMMARY

CTEs and subqueries execute before the main query
FROM locates and loads the table(s)
WHERE filters rows based on conditions
SELECT retrieves the specified columns
UNION/UNION ALL merges results from multiple queries

Remembering execution order prevents common errors like using aliases in WHERE or aggregate functions in WHERE instead of HAVING.

When debugging a query, mentally step through the execution order to verify that each clause has access to the data it references.

Subqueries for temp results

Daily Life

Interviews

Nest queries to answer layered questions

Before CTEs became standard, all intermediate results had to be written inline as nested queries: a query inside another query's WHERE clause or FROM clause. Subqueries still appear constantly in production SQL and in interview questions, and they produce exactly the same results as an equivalent CTE. Understanding subqueries means you can read any SQL in the wild, write the embedded style when a full CTE block would be overkill, and recognize when a nested query is slowing things down enough to warrant refactoring.

Subquery Types and Patterns

Subqueries appear in three positions: inside WHERE, inside FROM, and inside SELECT. Each position serves a different purpose.

Table Subquery Example

1	SELECT
2	*
3	FROM orders
4	WHERE total > (
5	SELECT
6	AVG(total)
7	FROM orders
8	)

A table subquery replaces a table name in the FROM clause with a complete SELECT statement. The inner query runs first to produce an intermediate result, then the outer query filters from that result.

Scalar Subquery

A scalar subquery returns a single value that can be used in the SELECT list or WHERE clause.

1	SELECT
2	id,
3	(
4	SELECT
5	val
6	FROM config
7	WHERE config_key = 'default'
8	) AS default_val
9	FROM logs

Result

id	default_val
101	standard
102	standard
103	standard

The subquery retrieves a single value. Each row in the outer query shows the same default name.

Multi-row Subquery

When a subquery returns multiple rows, use IN to check if a value matches any result.

1	SELECT
2	id
3	FROM orders
4	WHERE user_id IN (
5	SELECT
6	id
7	FROM users
8	WHERE active = 1
9	)

Result

id
1001
1005
1008
1012

The subquery lists all active user IDs. The outer query selects orders linked to those users.

Practice: Subquery Filter

> Complete this query to find all orders placed by currently active users.

SELECT
  order_id
FROM orders
WHERE user_id ___ (
  ___
    id
  FROM users
  WHERE ___ = 1
)

status

active

SELECT

EXISTS

Real-World Example

Suppose you need to monitor which devices have many error events. First, count the number of errors per device, then select devices exceeding a certain threshold.

Subqueries allow you to separate counting from selection, so each step is easy to check. You can reuse this approach for any dataset where part of the data must be computed before filtering.

✓Do

Use subqueries for one-off filtering or transformations
Test subquery performance separately before nesting
Keep subqueries simple and focused on a single task

✗Don't

Avoid deeply nested subqueries; use CTEs for readability
Don't use subqueries when a simple join would suffice

Subquery Execution

How the database processes a subquery depends on whether the subquery references the outer query or not, and that distinction has a direct impact on performance.

Correlated vs Uncorrelated

A non-correlated subquery runs once and shares its result with every row in the outer query. A correlated subquery references the outer query's current row, so it must re-execute for each row the outer query processes. On a table with one million rows, a correlated subquery runs one million times. This is why correlated subqueries can be slow and are often worth rewriting as a JOIN or CTE.

Inner First

Subqueries execute before the outer query

Performance

Large or repeated subqueries can slow things down

Temporary

Datasets exist only during query execution

Best Use

Small, one-off computations and filtering

UNION and UNION ALL

Daily Life

Interviews

Combine results from multiple queries

UNION operations stack multiple query results vertically, creating a single combined dataset from separate queries.

UNION operations combine results from multiple queries into a single dataset. Use them to merge datasets from different tables or sources. Understanding UNION DISTINCT vs. UNION ALL is important because duplicates and performance differ.

UNION Comparison and Examples

These examples show the key difference between preserving all rows with UNION ALL and deduplicating with UNION DISTINCT.

Basic Example

UNION ALL stacks two result sets on top of each other, keeping every row from both queries including any duplicates.

1	SELECT
2	id,
3	name
4	FROM users
5
6	UNION ALL
7
8	SELECT
9	id,
10	name
11	FROM admins

Result

id	name
1	Alice
2	Bob
3	Charlie
1	Admin Alice
4	Admin Dave

Combines rows from two tables. UNION ALL keeps duplicates. Columns must match in number and type.

UNION (aka UNION DISTINCT)

1	SELECT
2	id,
3	name
4	FROM customers
5
6	UNION
7
8	SELECT
9	id,
10	name
11	FROM suppliers

Result

id	name
1	Alice
2	Bob

Duplicates are removed. Writing UNION DISTINCT explicitly clarifies intent, but UNION without ALL or DISTINCT after it will perform a UNION DISTINCT operation, so clarity is best practice.

UNION ALL

1	SELECT
2	id,
3	name
4	FROM customers
5
6	UNION ALL
7
8	SELECT
9	id,
10	name
11	FROM suppliers

Result

id	name
1	Alice
2	Bob
1	Alice
2	Bob
1	Alice

Duplicates are preserved. Faster because no deduplication is done.

Arrange the UNION Query

Parsons Problem

> Build a query that merges the iOS and Android user lists into one combined result.

Arrange each line of the UNION ALL query in order:

Practical Usage

Knowing when to use UNION ALL versus UNION DISTINCT depends on your data requirements and performance needs.

Real-World Example

Suppose you have separate logs from multiple environments and want to produce a single dataset for processing. UNION ALL preserves repeated device IDs, while UNION DISTINCT ensures only unique IDs are kept. Knowing which variant to use affects both the correctness and speed of your processing.

✓Do

Default to UNION ALL unless you specifically need deduplication
Use explicit column names for clarity
Test individual queries before combining them

✗Don't

Don't use SELECT * in UNION queries
Avoid unnecessary UNION DISTINCT; it's slower than UNION ALL

The performance difference between these two variants is dramatic at scale.

ORDER BY and LIMIT

Daily Life

Interviews

Sort and paginate result sets

Control the order and quantity of rows returned with these essential clauses that shape your final output.

ORDER BY sorts the rows in your result based on one or more columns. By default, sorting is in ascending order (smallest to largest, A to Z). Add DESC after the column name to sort in descending order (largest to smallest, Z to A).

1	SELECT
2	name,
3	signup_date
4	FROM users
5	ORDER BY signup_date DESC

Result

name	signup_date
Charlie	2025-01-15
Alice	2024-11-20
Bob	2024-08-05

This returns users sorted by signup date, with the most recent signups appearing first.

Restricting Row Counts

LIMIT controls how many rows to return, which is essential for pagination and "top N" queries.

LIMIT restricts the number of rows returned. It is often used with ORDER BY to get the "top N" results, such as the top 5 customers or the 10 most recent orders.

1	SELECT
2	name,
3	total_spent
4	FROM customers
5	ORDER BY total_spent DESC
6	LIMIT 3

Result

name	total_spent
Global Corp	125000
Acme Inc	98500
Tech Solutions	87200

This returns only the top 3 customers by total spending. Without LIMIT, you would see all customers.

Check Your Understanding

See if you can combine ORDER BY and LIMIT to answer a common business question.

> Complete this query to find the five most expensive products in the catalog.

SELECT
  product_name,
  price
FROM products
___ price ___
___ ___

LIMIT

ASC

ORDER BY

DESC

Combining ORDER BY with LIMIT is the standard pattern for "top N" queries, such as finding the five most expensive products or the ten most recent sign-ups.

Without ORDER BY, the database can return rows in any order it chooses, so LIMIT alone does not guarantee you get the highest or lowest values. It simply returns whichever rows the database finds first.

In large datasets, ORDER BY before LIMIT is much more efficient than sorting the entire result after the fact, because the database can often use an index to avoid scanning every row.

❯❯❯PUTTING IT ALL TOGETHER

> You are a data engineer at Spotify building a weekly active-artist report that combines streaming data from multiple regional databases and surfaces the top performers to the label partnerships team. The pipeline must be readable, maintainable, and return a clean ranked list.

CTEs with the WITH clause name each processing stage like regional_streams or deduplicated_plays so teammates can read and debug the pipeline step by step.

SQL execution order determines that the CTE runs before the main SELECT. Understanding this prevents alias errors and ensures HAVING is used instead of WHERE for post-aggregation filters.

Subqueries compute intermediate results such as per-artist stream counts before the outer query ranks and filters the final list.

UNION ALL merges artist streaming datasets from separate regional tables into one combined result set without expensive deduplication.

ORDER BY and LIMIT sort artists by total streams descending and cap the output to the top 20 for the partnerships report.

KEY TAKEAWAYS

CTEs (WITH clause) create named temporary datasets that improve readability and allow reuse, functioning as the SQL equivalent of naming an intermediate variable in code

SQL clauses execute in a logical order that differs from how they are written; CTEs and subqueries fire before the main query

Subqueries are queries inside another query that produce temporary results for filtering or computation

Subqueries execute once per use; CTEs can be referenced multiple times without recomputation

UNION combines result sets and removes duplicates; UNION ALL keeps all rows including duplicates

ORDER BY sorts results by column values; add DESC for descending order

LIMIT restricts the number of rows returned, often used with ORDER BY for "top N" queries

CTEs: subqueries are a cry for help

Category: SQL
Difficulty: intermediate
Duration: 28 minutes
Challenges: 0 hands-on challenges

Topics covered: CTEs (WITH clause), Query Execution Order, Subqueries for temp results, UNION and UNION ALL, ORDER BY and LIMIT

Lesson Sections

CTEs (WITH clause) (concepts: sqlCte)
A CTE is a named result set you define once at the top of a query, then reference by name anywhere below it. This works exactly the way you assign an intermediate variable in code before using it in a calculation. The filter logic lives inside the CTE, the main query reads from the named result, and the two steps stay cleanly separated. If you have ever computed something once to avoid repeating the same expression twice, you already understand what a CTE does. CTE Usage Examples The following e
Query Execution Order (concepts: sqlExecutionOrder)
Understanding Execution Order Walk through this example to see how the CTE executes before the outer query, not after it. Basic Example This example shows how a CTE executes first to filter logs, then the outer query applies an additional filter on the result. The CTE executes first to produce the temporary dataset. The outer query then filters from it. Practical Implications Understanding execution order helps you avoid common pitfalls like referencing aliases before they exist. Real-World Exam
Subqueries for temp results (concepts: sqlSubqueryScalar)
Subquery Types and Patterns Table Subquery Example Scalar Subquery The subquery retrieves a single value. Each row in the outer query shows the same default name. Multi-row Subquery The subquery lists all active user IDs. The outer query selects orders linked to those users. Practice: Subquery Filter Real-World Example Suppose you need to monitor which devices have many error events. First, count the number of errors per device, then select devices exceeding a certain threshold. Subqueries allow
UNION and UNION ALL (concepts: sqlUnion)
UNION Comparison and Examples Basic Example UNION (aka UNION DISTINCT) UNION ALL Duplicates are preserved. Faster because no deduplication is done. Arrange the UNION Query Practical Usage Real-World Example The performance difference between these two variants is dramatic at scale.
ORDER BY and LIMIT (concepts: sqlOrderBy, sqlLimit)
Control the order and quantity of rows returned with these essential clauses that shape your final output. This returns users sorted by signup date, with the most recent signups appearing first. Restricting Row Counts Check Your Understanding