Query Structure: Intermediate
Stripe's fraud detection pipeline reads from more than 40 signals per transaction: merchant category, device fingerprint, velocity, and user history. Every signal is computed as a named intermediate result inside a single SQL query using CTEs, then combined in the final SELECT. Without CTEs, that query would be one unreadable nested expression. The WITH clause is the most used construct in data engineering, and this is the lesson where you master it.
CTEs (WITH clause)
Break complex queries into readable steps
CTE Usage Examples
Basic Example
A CTE starts with WITH, followed by a name and query definition. The main query then references the CTE by name.
The CTE active_set filters for active users. The main query then filters for IDs greater than 100. This separates filtering steps, making the logic easier to follow.
Multiple CTEs
Practice: CTE Basics
> Complete this query to define a reusable dataset of active users, then retrieve their IDs and names.
active_users ( SELECT id, name FROM users WHERE active = 1 ) SELECT id, name FROM
Complex CTE Scenarios
Real-World Example
- active_users describes the filtered set
- monthly_revenue names the computation
- error_devices clarifies the scope
- temp says nothing about the data
- cte1 forces readers to trace the query
- data is too generic to be useful
Comparing Approaches
Subqueries vs. CTEs
Subquery Version
With a subquery, the inner query is embedded directly in the WHERE clause. This works well for simple filters but becomes harder to read as complexity grows.
CTE Version
Real-World Example
CTE. Duplicating a subquery means duplicating bugs when you change one copy but forget the other.Query Execution Order
Predict exactly how your query runs
SQL does not execute in the order it reads. CTEs and subqueries fire before the main query, WHERE filters rows before SELECT picks columns, and HAVING runs after aggregation that WHERE cannot touch. The clause that appears first on screen is rarely the first to run. For developers used to procedural code where line 1 runs before line 2, this is the single biggest source of cryptic SQL errors. The CTE you just wrote is the clearest example: the CTE block ran before anything in the main SELECT was evaluated.
Understanding Execution Order
Basic Example
Practical Implications
Real-World Example
Which query correctly uses a column alias defined in SELECT?
- Use CTEs to make execution order explicit and visible
- Test each step independently before combining them
- Filter early with WHERE to reduce data processed
- Don't assume clauses execute in the order they appear in your query
- Avoid complex calculations in SELECT if not needed in output
> Complete this query to find categories where the average price exceeds 50.
SELECT category, AVG(price) AS avg_price FROM products GROUP BY category (price) > 50
- CTEs and subqueries execute before the main query
FROMlocates and loads the table(s)WHEREfilters rows based on conditionsSELECTretrieves the specified columnsUNION/UNION ALLmerges results from multiple queries
Remembering execution order prevents common errors like using aliases in WHERE or aggregate functions in WHERE instead of HAVING.
Subqueries for temp results
Nest queries to answer layered questions
Before CTEs became standard, all intermediate results had to be written inline as nested queries: a query inside another query's WHERE clause or FROM clause. Subqueries still appear constantly in production SQL and in interview questions, and they produce exactly the same results as an equivalent CTE. Understanding subqueries means you can read any SQL in the wild, write the embedded style when a full CTE block would be overkill, and recognize when a nested query is slowing things down enough to warrant refactoring.
Subquery Types and Patterns
Subqueries appear in three positions: inside WHERE, inside FROM, and inside SELECT. Each position serves a different purpose.
Table Subquery Example
A table subquery replaces a table name in the FROM clause with a complete SELECT statement. The inner query runs first to produce an intermediate result, then the outer query filters from that result.
Scalar Subquery
A scalar subquery returns a single value that can be used in the SELECT list or WHERE clause.
Multi-row Subquery
When a subquery returns multiple rows, use IN to check if a value matches any result.
Practice: Subquery Filter
> Complete this query to find all orders placed by currently active users.
SELECT order_id FROM orders WHERE user_id ( id FROM users WHERE = 1 )
Real-World Example
- Use subqueries for one-off filtering or transformations
- Test subquery performance separately before nesting
- Keep subqueries simple and focused on a single task
- Avoid deeply nested subqueries; use CTEs for readability
- Don't use subqueries when a simple join would suffice
Subquery Execution
Correlated vs Uncorrelated
A non-correlated subquery runs once and shares its result with every row in the outer query. A correlated subquery references the outer query's current row, so it must re-execute for each row the outer query processes. On a table with one million rows, a correlated subquery runs one million times. This is why correlated subqueries can be slow and are often worth rewriting as a JOIN or CTE.
UNION and UNION ALL
Combine results from multiple queries
UNION operations stack multiple query results vertically, creating a single combined dataset from separate queries.
UNION operations combine results from multiple queries into a single dataset. Use them to merge datasets from different tables or sources. Understanding UNION DISTINCT vs. UNION ALL is important because duplicates and performance differ.
UNION Comparison and Examples
These examples show the key difference between preserving all rows with UNION ALL and deduplicating with UNION DISTINCT.
Basic Example
UNION ALL stacks two result sets on top of each other, keeping every row from both queries including any duplicates.
Combines rows from two tables. UNION ALL keeps duplicates. Columns must match in number and type.
UNION (aka UNION DISTINCT)
Duplicates are removed. Writing UNION DISTINCT explicitly clarifies intent, but UNION without ALL or DISTINCT after it will perform a UNION DISTINCT operation, so clarity is best practice.
UNION ALL
Arrange the UNION Query
> Build a query that merges the iOS and Android user lists into one combined result.
Arrange each line of the UNION ALL query in order:
Practical Usage
Knowing when to use UNION ALL versus UNION DISTINCT depends on your data requirements and performance needs.
Real-World Example
Suppose you have separate logs from multiple environments and want to produce a single dataset for processing. UNION ALL preserves repeated device IDs, while UNION DISTINCT ensures only unique IDs are kept. Knowing which variant to use affects both the correctness and speed of your processing.
- Default to UNION ALL unless you specifically need deduplication
- Use explicit column names for clarity
- Test individual queries before combining them
- Don't use SELECT * in UNION queries
- Avoid unnecessary UNION DISTINCT; it's slower than UNION ALL
ORDER BY and LIMIT
Sort and paginate result sets
ORDER BY sorts the rows in your result based on one or more columns. By default, sorting is in ascending order (smallest to largest, A to Z). Add DESC after the column name to sort in descending order (largest to smallest, Z to A).
Restricting Row Counts
LIMIT controls how many rows to return, which is essential for pagination and "top N" queries.
LIMIT restricts the number of rows returned. It is often used with ORDER BY to get the "top N" results, such as the top 5 customers or the 10 most recent orders.
This returns only the top 3 customers by total spending. Without LIMIT, you would see all customers.
Check Your Understanding
See if you can combine ORDER BY and LIMIT to answer a common business question.
> Complete this query to find the five most expensive products in the catalog.
SELECT product_name, price FROM products price
Combining ORDER BY with LIMIT is the standard pattern for "top N" queries, such as finding the five most expensive products or the ten most recent sign-ups.
Without ORDER BY, the database can return rows in any order it chooses, so LIMIT alone does not guarantee you get the highest or lowest values. It simply returns whichever rows the database finds first.
In large datasets, ORDER BY before LIMIT is much more efficient than sorting the entire result after the fact, because the database can often use an index to avoid scanning every row.
> You are a data engineer at Spotify building a weekly active-artist report that combines streaming data from multiple regional databases and surfaces the top performers to the label partnerships team. The pipeline must be readable, maintainable, and return a clean ranked list.
WITH clause name each processing stage like regional_streams or deduplicated_plays so teammates can read and debug the pipeline step by step.SELECT. Understanding this prevents alias errors and ensures HAVING is used instead of WHERE for post-aggregation filters.UNION ALL merges artist streaming datasets from separate regional tables into one combined result set without expensive deduplication.ORDER BY and LIMIT sort artists by total streams descending and cap the output to the top 20 for the partnerships report.WITH clause) create named temporary datasets that improve readability and allow reuse, functioning as the SQL equivalent of naming an intermediate variable in codeUNION combines result sets and removes duplicates; UNION ALL keeps all rows including duplicatesORDER BY sorts results by column values; add DESC for descending orderLIMIT restricts the number of rows returned, often used with ORDER BY for "top N" queriesCTEs: subqueries are a cry for help
- Category
- SQL
- Difficulty
- intermediate
- Duration
- 28 minutes
- Challenges
- 0 hands-on challenges
Topics covered: CTEs (WITH clause), Query Execution Order, Subqueries for temp results, UNION and UNION ALL, ORDER BY and LIMIT
Lesson Sections
- CTEs (WITH clause) (concepts: sqlCte)
A CTE is a named result set you define once at the top of a query, then reference by name anywhere below it. This works exactly the way you assign an intermediate variable in code before using it in a calculation. The filter logic lives inside the CTE, the main query reads from the named result, and the two steps stay cleanly separated. If you have ever computed something once to avoid repeating the same expression twice, you already understand what a CTE does. CTE Usage Examples The following e
- Query Execution Order (concepts: sqlExecutionOrder)
Understanding Execution Order Walk through this example to see how the CTE executes before the outer query, not after it. Basic Example This example shows how a CTE executes first to filter logs, then the outer query applies an additional filter on the result. The CTE executes first to produce the temporary dataset. The outer query then filters from it. Practical Implications Understanding execution order helps you avoid common pitfalls like referencing aliases before they exist. Real-World Exam
- Subqueries for temp results (concepts: sqlSubqueryScalar)
Subquery Types and Patterns Table Subquery Example Scalar Subquery The subquery retrieves a single value. Each row in the outer query shows the same default name. Multi-row Subquery The subquery lists all active user IDs. The outer query selects orders linked to those users. Practice: Subquery Filter Real-World Example Suppose you need to monitor which devices have many error events. First, count the number of errors per device, then select devices exceeding a certain threshold. Subqueries allow
- UNION and UNION ALL (concepts: sqlUnion)
UNION Comparison and Examples Basic Example UNION (aka UNION DISTINCT) UNION ALL Duplicates are preserved. Faster because no deduplication is done. Arrange the UNION Query Practical Usage Real-World Example The performance difference between these two variants is dramatic at scale.
- ORDER BY and LIMIT (concepts: sqlOrderBy, sqlLimit)
Control the order and quantity of rows returned with these essential clauses that shape your final output. This returns users sorted by signup date, with the most recent signups appearing first. Restricting Row Counts Check Your Understanding