Query Structure: Intermediate

Stripe's fraud detection pipeline reads from more than 40 signals per transaction: merchant category, device fingerprint, velocity, and user history. Every signal is computed as a named intermediate result inside a single SQL query using CTEs, then combined in the final SELECT. Without CTEs, that query would be one unreadable nested expression. The WITH clause is the most used construct in data engineering, and this is the lesson where you master it.

CTEs (WITH clause)

Daily Life
Interviews

Break complex queries into readable steps

A CTE is a named result set you define once at the top of a query, then reference by name anywhere below it. This works exactly the way you assign an intermediate variable in code before using it in a calculation. The filter logic lives inside the CTE, the main query reads from the named result, and the two steps stay cleanly separated. If you have ever computed something once to avoid repeating the same expression twice, you already understand what a CTE does.

CTE Usage Examples

The following examples show how to define and use CTEs, from a single named result set to chaining multiple CTEs together.
//

Basic Example

A CTE starts with WITH, followed by a name and query definition. The main query then references the CTE by name.

1WITH active_set AS (
2 SELECT
3 id,
4 name
5 FROM users
6 WHERE active = 1
7)
8
9SELECT
10 id,
11 name
12FROM active_set
13WHERE id > 100
Result
idname
101Dana
105Elena
112Frank

The CTE active_set filters for active users. The main query then filters for IDs greater than 100. This separates filtering steps, making the logic easier to follow.

//

Multiple CTEs

You can define several CTEs in a single query, separating them with commas. Each CTE can reference the ones declared before it, allowing you to build complex logic in clear, named stages.
1WITH users_set AS (
2 SELECT
3 id
4 FROM users
5 WHERE active = 1
6),
7orders_set AS (
8 SELECT
9 user_id
10 FROM orders
11 WHERE status = 'pending'
12)
13
14SELECT
15 id
16FROM users_set
17WHERE id IN (
18 SELECT
19 user_id
20 FROM orders_set
21)
Result
id
1
5
12
Multiple CTEs let you build complex queries step by step, each CTE producing a temporary dataset for the next.

Practice: CTE Basics

> Complete this query to define a reusable dataset of active users, then retrieve their IDs and names.

___ active_users ___ (
  SELECT
    id,
    name
  FROM users
  WHERE active = 1
)

SELECT
  id,
  name
FROM ___
FROM
AS
active_users
WITH
temp

Complex CTE Scenarios

CTEs shine in complex scenarios where multiple stages of filtering or computation need to be clearly named and independently tested.
//

Real-World Example

Imagine you have logs from many devices and you want to track devices with very high error counts. You first compute all device error counts, then select only the most critical ones. Using CTEs lets you clearly name each stage and reuse results if needed for additional queries or checks.
CTEs exist only during query execution and cannot be accessed outside it. They can be referenced multiple times without recalculation, and must be declared in order; each CTE can only use datasets defined before it.
Effective CTE Names
  • active_users describes the filtered set
  • monthly_revenue names the computation
  • error_devices clarifies the scope
Names to Avoid
  • temp says nothing about the data
  • cte1 forces readers to trace the query
  • data is too generic to be useful

Comparing Approaches

Both subqueries and CTEs achieve similar goals, but they differ in readability, reusability, and maintainability.
//

Subqueries vs. CTEs

Subqueries and CTEs produce the same results, but CTEs are easier to read and maintain, especially when working in multiple stages. Both allow intermediate datasets, but CTEs provide named, reusable datasets, making multi-step queries more understandable.
//

Subquery Version

With a subquery, the inner query is embedded directly in the WHERE clause. This works well for simple filters but becomes harder to read as complexity grows.

1SELECT
2 id
3FROM orders
4WHERE user_id IN (
5 SELECT
6 id
7 FROM users
8 WHERE active = 1
9)
Result
id
1001
1005
1008
1012
//

CTE Version

The CTE version moves the filter logic to a named block at the top, making the main query simpler to follow.
1WITH active_users AS (
2 SELECT
3 id
4 FROM users
5 WHERE active = 1
6)
7
8SELECT
9 id
10FROM orders
11WHERE user_id IN (
12 SELECT
13 id
14 FROM active_users
15)
Result
id
1001
1005
1008
1012
Both queries return the same results. The CTE version names the intermediate dataset, making it easier to read and maintain.
//

Real-World Example

When processing device logs, you often filter by valid services or active devices. A subquery can perform the same filtering inline, but a CTE names the intermediate dataset and allows multiple references. This makes pipelines easier to follow, modify, and check for correctness.
TIP
If you reference the same intermediate result more than once, always use a CTE. Duplicating a subquery means duplicating bugs when you change one copy but forget the other.
Which approach is easier to maintain when the same filtered set is needed twice?
The same subquery appears twice. If the filter changes, you must update both copies, which is error-prone.
1SELECT
2 COUNT(*)
3FROM (
4 SELECT
5 id
6 FROM users
7 WHERE active = 1
8) AS a
9
10UNION ALL
11
12SELECT
13 AVG(id)
14FROM (
15 SELECT
16 id
17 FROM users
18 WHERE active = 1
19) AS b

Query Execution Order

Daily Life
Interviews

Predict exactly how your query runs

SQL does not execute in the order it reads. CTEs and subqueries fire before the main query, WHERE filters rows before SELECT picks columns, and HAVING runs after aggregation that WHERE cannot touch. The clause that appears first on screen is rarely the first to run. For developers used to procedural code where line 1 runs before line 2, this is the single biggest source of cryptic SQL errors. The CTE you just wrote is the clearest example: the CTE block ran before anything in the main SELECT was evaluated.

Understanding Execution Order

Walk through this example to see how the CTE executes before the outer query, not after it.
//

Basic Example

This example shows how a CTE executes first to filter logs, then the outer query applies an additional filter on the result.
1WITH recent_logs AS (
2 SELECT
3 id,
4 log_date
5 FROM logs
6 WHERE log_date > '2024-01-01'
7)
8
9SELECT
10 id
11FROM recent_logs
12WHERE log_date > '2024-06-01'
The CTE executes first to produce the temporary dataset. The outer query then filters from it.

Practical Implications

Understanding execution order helps you avoid common pitfalls like referencing aliases before they exist.
//

Real-World Example

When working with device logs, intermediate results must exist before applying filters or aggregations. Ignoring execution order can lead to errors or incorrect results. Understanding the sequence of execution ensures each step behaves correctly and can be reused for other tasks.

Which query correctly uses a column alias defined in SELECT?

WHERE executes before SELECT, so the alias "total" does not exist yet. This query fails in most databases.
1SELECT
2 price * qty AS total
3FROM orders
4WHERE total > 100
Do
  • Use CTEs to make execution order explicit and visible
  • Test each step independently before combining them
  • Filter early with WHERE to reduce data processed
Don't
  • Don't assume clauses execute in the order they appear in your query
  • Avoid complex calculations in SELECT if not needed in output
Here is the full execution order to keep in mind when writing queries.

> Complete this query to find categories where the average price exceeds 50.

SELECT
  category,
  AVG(price) AS avg_price
FROM products
GROUP BY category
___ ___(price) > 50
AVG
SUM
HAVING
WHERE
EXECUTION ORDER SUMMARY
  • CTEs and subqueries execute before the main query
  • FROM locates and loads the table(s)
  • WHERE filters rows based on conditions
  • SELECT retrieves the specified columns
  • UNION/UNION ALL merges results from multiple queries

Remembering execution order prevents common errors like using aliases in WHERE or aggregate functions in WHERE instead of HAVING.

When debugging a query, mentally step through the execution order to verify that each clause has access to the data it references.

Subqueries for temp results

Daily Life
Interviews

Nest queries to answer layered questions

Before CTEs became standard, all intermediate results had to be written inline as nested queries: a query inside another query's WHERE clause or FROM clause. Subqueries still appear constantly in production SQL and in interview questions, and they produce exactly the same results as an equivalent CTE. Understanding subqueries means you can read any SQL in the wild, write the embedded style when a full CTE block would be overkill, and recognize when a nested query is slowing things down enough to warrant refactoring.

Subquery Types and Patterns

Subqueries appear in three positions: inside WHERE, inside FROM, and inside SELECT. Each position serves a different purpose.

//

Table Subquery Example

1SELECT
2 *
3FROM orders
4WHERE total > (
5 SELECT
6 AVG(total)
7 FROM orders
8)

A table subquery replaces a table name in the FROM clause with a complete SELECT statement. The inner query runs first to produce an intermediate result, then the outer query filters from that result.

//

Scalar Subquery

A scalar subquery returns a single value that can be used in the SELECT list or WHERE clause.

1SELECT
2 id,
3 (
4 SELECT
5 val
6 FROM config
7 WHERE config_key = 'default'
8 ) AS default_val
9FROM logs
Result
iddefault_val
101standard
102standard
103standard
The subquery retrieves a single value. Each row in the outer query shows the same default name.
//

Multi-row Subquery

When a subquery returns multiple rows, use IN to check if a value matches any result.

1SELECT
2 id
3FROM orders
4WHERE user_id IN (
5 SELECT
6 id
7 FROM users
8 WHERE active = 1
9)
Result
id
1001
1005
1008
1012
The subquery lists all active user IDs. The outer query selects orders linked to those users.

Practice: Subquery Filter

> Complete this query to find all orders placed by currently active users.

SELECT
  order_id
FROM orders
WHERE user_id ___ (
  ___
    id
  FROM users
  WHERE ___ = 1
)
active
EXISTS
status
IN
SELECT
//

Real-World Example

Suppose you need to monitor which devices have many error events. First, count the number of errors per device, then select devices exceeding a certain threshold.
Subqueries allow you to separate counting from selection, so each step is easy to check. You can reuse this approach for any dataset where part of the data must be computed before filtering.
Do
  • Use subqueries for one-off filtering or transformations
  • Test subquery performance separately before nesting
  • Keep subqueries simple and focused on a single task
Don't
  • Avoid deeply nested subqueries; use CTEs for readability
  • Don't use subqueries when a simple join would suffice

Subquery Execution

How the database processes a subquery depends on whether the subquery references the outer query or not, and that distinction has a direct impact on performance.
//

Correlated vs Uncorrelated

A non-correlated subquery runs once and shares its result with every row in the outer query. A correlated subquery references the outer query's current row, so it must re-execute for each row the outer query processes. On a table with one million rows, a correlated subquery runs one million times. This is why correlated subqueries can be slow and are often worth rewriting as a JOIN or CTE.

Inner First
Inner First
Subqueries execute before the outer query
Performance
Performance
Large or repeated subqueries can slow things down
Temporary
Temporary
Datasets exist only during query execution
Best Use
Best Use
Small, one-off computations and filtering

UNION and UNION ALL

Daily Life
Interviews

Combine results from multiple queries

UNION operations stack multiple query results vertically, creating a single combined dataset from separate queries.

UNION operations combine results from multiple queries into a single dataset. Use them to merge datasets from different tables or sources. Understanding UNION DISTINCT vs. UNION ALL is important because duplicates and performance differ.

UNION Comparison and Examples

These examples show the key difference between preserving all rows with UNION ALL and deduplicating with UNION DISTINCT.

//

Basic Example

UNION ALL stacks two result sets on top of each other, keeping every row from both queries including any duplicates.

1SELECT
2 id,
3 name
4FROM users
5
6UNION ALL
7
8SELECT
9 id,
10 name
11FROM admins
Result
idname
1Alice
2Bob
3Charlie
1Admin Alice
4Admin Dave

Combines rows from two tables. UNION ALL keeps duplicates. Columns must match in number and type.

//

UNION (aka UNION DISTINCT)

1SELECT
2 id,
3 name
4FROM customers
5
6UNION
7
8SELECT
9 id,
10 name
11FROM suppliers
Result
idname
1Alice
2Bob

Duplicates are removed. Writing UNION DISTINCT explicitly clarifies intent, but UNION without ALL or DISTINCT after it will perform a UNION DISTINCT operation, so clarity is best practice.

//

UNION ALL

1SELECT
2 id,
3 name
4FROM customers
5
6UNION ALL
7
8SELECT
9 id,
10 name
11FROM suppliers
Result
idname
1Alice
2Bob
1Alice
2Bob
1Alice
Duplicates are preserved. Faster because no deduplication is done.

Arrange the UNION Query

Parsons Problem

> Build a query that merges the iOS and Android user lists into one combined result.

Arrange each line of the UNION ALL query in order:

1
2
3
4
5

Practical Usage

Knowing when to use UNION ALL versus UNION DISTINCT depends on your data requirements and performance needs.

//

Real-World Example

Suppose you have separate logs from multiple environments and want to produce a single dataset for processing. UNION ALL preserves repeated device IDs, while UNION DISTINCT ensures only unique IDs are kept. Knowing which variant to use affects both the correctness and speed of your processing.

Do
  • Default to UNION ALL unless you specifically need deduplication
  • Use explicit column names for clarity
  • Test individual queries before combining them
Don't
  • Don't use SELECT * in UNION queries
  • Avoid unnecessary UNION DISTINCT; it's slower than UNION ALL
The performance difference between these two variants is dramatic at scale.

ORDER BY and LIMIT

Daily Life
Interviews

Sort and paginate result sets

Control the order and quantity of rows returned with these essential clauses that shape your final output.

ORDER BY sorts the rows in your result based on one or more columns. By default, sorting is in ascending order (smallest to largest, A to Z). Add DESC after the column name to sort in descending order (largest to smallest, Z to A).

1SELECT
2 name,
3 signup_date
4FROM users
5ORDER BY signup_date DESC
Result
namesignup_date
Charlie2025-01-15
Alice2024-11-20
Bob2024-08-05
This returns users sorted by signup date, with the most recent signups appearing first.

Restricting Row Counts

LIMIT controls how many rows to return, which is essential for pagination and "top N" queries.

LIMIT restricts the number of rows returned. It is often used with ORDER BY to get the "top N" results, such as the top 5 customers or the 10 most recent orders.

1SELECT
2 name,
3 total_spent
4FROM customers
5ORDER BY total_spent DESC
6LIMIT 3
Result
nametotal_spent
Global Corp125000
Acme Inc98500
Tech Solutions87200

This returns only the top 3 customers by total spending. Without LIMIT, you would see all customers.

Check Your Understanding

See if you can combine ORDER BY and LIMIT to answer a common business question.

> Complete this query to find the five most expensive products in the catalog.

SELECT
  product_name,
  price
FROM products
___ price ___
___ ___
ASC
DESC
5
ORDER BY
10
LIMIT

Combining ORDER BY with LIMIT is the standard pattern for "top N" queries, such as finding the five most expensive products or the ten most recent sign-ups.

Without ORDER BY, the database can return rows in any order it chooses, so LIMIT alone does not guarantee you get the highest or lowest values. It simply returns whichever rows the database finds first.

In large datasets, ORDER BY before LIMIT is much more efficient than sorting the entire result after the fact, because the database can often use an index to avoid scanning every row.

PUTTING IT ALL TOGETHER

> You are a data engineer at Spotify building a weekly active-artist report that combines streaming data from multiple regional databases and surfaces the top performers to the label partnerships team. The pipeline must be readable, maintainable, and return a clean ranked list.

CTEs with the WITH clause name each processing stage like regional_streams or deduplicated_plays so teammates can read and debug the pipeline step by step.
SQL execution order determines that the CTE runs before the main SELECT. Understanding this prevents alias errors and ensures HAVING is used instead of WHERE for post-aggregation filters.
Subqueries compute intermediate results such as per-artist stream counts before the outer query ranks and filters the final list.
UNION ALL merges artist streaming datasets from separate regional tables into one combined result set without expensive deduplication.
ORDER BY and LIMIT sort artists by total streams descending and cap the output to the top 20 for the partnerships report.
KEY TAKEAWAYS
CTEs (WITH clause) create named temporary datasets that improve readability and allow reuse, functioning as the SQL equivalent of naming an intermediate variable in code
SQL clauses execute in a logical order that differs from how they are written; CTEs and subqueries fire before the main query
Subqueries are queries inside another query that produce temporary results for filtering or computation
Subqueries execute once per use; CTEs can be referenced multiple times without recomputation
UNION combines result sets and removes duplicates; UNION ALL keeps all rows including duplicates
ORDER BY sorts results by column values; add DESC for descending order
LIMIT restricts the number of rows returned, often used with ORDER BY for "top N" queries

CTEs: subqueries are a cry for help

Category
SQL
Difficulty
intermediate
Duration
28 minutes
Challenges
0 hands-on challenges

Topics covered: CTEs (WITH clause), Query Execution Order, Subqueries for temp results, UNION and UNION ALL, ORDER BY and LIMIT

Lesson Sections

  1. CTEs (WITH clause) (concepts: sqlCte)

    A CTE is a named result set you define once at the top of a query, then reference by name anywhere below it. This works exactly the way you assign an intermediate variable in code before using it in a calculation. The filter logic lives inside the CTE, the main query reads from the named result, and the two steps stay cleanly separated. If you have ever computed something once to avoid repeating the same expression twice, you already understand what a CTE does. CTE Usage Examples The following e

  2. Query Execution Order (concepts: sqlExecutionOrder)

    Understanding Execution Order Walk through this example to see how the CTE executes before the outer query, not after it. Basic Example This example shows how a CTE executes first to filter logs, then the outer query applies an additional filter on the result. The CTE executes first to produce the temporary dataset. The outer query then filters from it. Practical Implications Understanding execution order helps you avoid common pitfalls like referencing aliases before they exist. Real-World Exam

  3. Subqueries for temp results (concepts: sqlSubqueryScalar)

    Subquery Types and Patterns Table Subquery Example Scalar Subquery The subquery retrieves a single value. Each row in the outer query shows the same default name. Multi-row Subquery The subquery lists all active user IDs. The outer query selects orders linked to those users. Practice: Subquery Filter Real-World Example Suppose you need to monitor which devices have many error events. First, count the number of errors per device, then select devices exceeding a certain threshold. Subqueries allow

  4. UNION and UNION ALL (concepts: sqlUnion)

    UNION Comparison and Examples Basic Example UNION (aka UNION DISTINCT) UNION ALL Duplicates are preserved. Faster because no deduplication is done. Arrange the UNION Query Practical Usage Real-World Example The performance difference between these two variants is dramatic at scale.

  5. ORDER BY and LIMIT (concepts: sqlOrderBy, sqlLimit)

    Control the order and quantity of rows returned with these essential clauses that shape your final output. This returns users sorted by signup date, with the most recent signups appearing first. Restricting Row Counts Check Your Understanding