Filtering: Advanced
Visa processes over 200 billion card transactions per year, and their fraud detection system flags suspicious activity in under 100 milliseconds per swipe. That real-time detection works by running correlated subqueries and EXISTS checks against a cardholder's transaction history to identify patterns that do not match normal behavior, like a card being used in two countries within minutes or a sudden spike in high-value purchases. The advanced filtering techniques in this lesson, including correlated subqueries, EXISTS, and complex filter combinations, are the same class of tools powering fraud systems at Visa, Mastercard, and every major payment network.
Correlated subqueries (EXISTS)
Check if related records exist for each row
EXISTS answers relational questions by checking whether related rows exist in another table.
EXISTS evaluates whether a subquery returns at least one row. Because the subquery can reference the outer query, it is perfect for asking relational questions such as "does this order have a recent payment?" or "has this user submitted any tickets?"
The correlated WHERE clause reuses the outer order_id, so each order performs its own lookup. As soon as the subquery finds a qualifying payment row it stops scanning and returns true, making EXISTS efficient even on large datasets.
EXISTS Efficiency
EXISTS is optimized for performance because it stops searching as soon as it finds a single matching row.
- Runs once per row of the outer query and short-circuits on the first match
- Only needs
SELECT1 inside because the columns list does not matter - Pair with
LIMIT/OFFSETsparingly;EXISTSonly cares about presence, not counts
Now try writing an EXISTS subquery to check for related payment records.
> Complete this query to find users that have at least one delivered push notification.
SELECT u.user_id FROM users AS u WHERE ( SELECT 1 FROM push_notifs AS p WHERE = u.user_id AND p.status = )
EXISTS is one of the most powerful filtering tools in SQL because it checks relationships without retrieving any column data.
When you only need to know whether related rows exist rather than what those rows contain, EXISTS gives you both correctness and performance.
In production systems, EXISTS-based queries often outperform equivalent JOIN-based approaches because the subquery can short-circuit as soon as the first matching row is found.
NOT EXISTS for missing rows
Find records with no matching partner
NOT EXISTS is the safest way to find rows that lack related data in another table.
NOT EXISTS flips the logic to keep outer rows that do not find a match inside the subquery. This is the safest pattern for enforcing "no related rows" rules without being tripped up by NULL handling.
Here each customer keeps its spot only when no open ticket rows exist for the same customer_id. Because EXISTS ignores column values, you can add more predicates (date ranges, severities) without changing the correlation pattern.
Common Use Cases
NOT EXISTS excels at detecting missing data, gaps in processes, and orphaned records.
NOT EXISTS is the safest "anti-join" pattern. Unlike NOT IN, it handles NULLs correctly and unlike LEFT JOIN + IS NULL, it clearly communicates intent.Practice using NOT IN to exclude specific categories from a product listing.
> Complete this query to show only products that are not archived or discontinued.
SELECT name FROM products WHERE category ( 'archived', 'discontinued' )
NOT EXISTS is the safest anti-join pattern because it handles NULL values correctly by design, unlike NOT IN which can silently return zero rows when the subquery contains a NULL.
Choosing between NOT IN and NOT EXISTS is one of the most common correctness decisions in SQL. When in doubt, prefer NOT EXISTS for its predictable NULL behavior.
The pattern of correlated subqueries with EXISTS and NOT EXISTS scales well because modern query optimizers convert them into efficient hash or merge anti-join operations.
NOT IN vs NULL gotchas
Avoid the NULL trap in exclusion filters
NOT IN has a dangerous quirk: a single NULL in the subquery can silently eliminate all results.
NOT IN is concise for enumerating disallowed values, but remember that a single NULL inside the list makes the entire comparison UNKNOWN and filters out every row. The fix is to filter NULLs inside the subquery (or use NOT EXISTS).
NOT IN vs NOT EXISTS
Both exclude rows, but they handle NULLs very differently. Choosing the right one prevents silent bugs.
- Concise for small value lists
- Breaks silently if subquery returns NULL
- Must guard with IS NOT NULL filter
- NULL-safe by default
- Handles composite keys cleanly
- Self-documenting with correlated aliases
Which NOT IN query is safe when the subquery might contain NULLs?
Flagged transactions feed directly into regulatory reports. Every false positive triggers a manual review costing $40, and every missed flag risks a $10K fine.
| txn_id | merchant_id | amount | txn_date | status |
|---|---|---|---|---|
| TXN-001 | M-204 | 149.99 | 2024-06-01 | settled |
| TXN-002 | M-999 | 75.00 | 2024-06-01 | pending |
| TXN-003 | M-204 | 220.50 | 2024-06-02 | settled |
| merchant_id | name | category | onboarded_at |
|---|---|---|---|
| M-204 | Orbit Bikes | retail | 2023-11-15 |
| M-317 | Pioneer Labs | tech | 2024-01-20 |
| M-410 | Nova Freight | logistics | 2024-03-05 |
| merchant_id | added_by | reason | retired_at |
|---|---|---|---|
| M-204 | compliance | verified partner | NULL |
| M-317 | compliance | audit cleared | NULL |
Your first task is to flag transactions whose merchant_id has no matching merchant record. Marketing just onboarded 200 new merchants, but some transactions arrived before merchant profiles were created. Which filtering approach do you use?
-- Goal: find transactions with no merchant record SELECT txn_id, merchant_id, amount FROM transactions t WHERE ??? -- your approach here
NOT EXISTS is the safest anti-join pattern because it handles NULL values correctly by design, unlike NOT IN which can silently return zero rows when the subquery contains a NULL.
Choosing between NOT IN and NOT EXISTS is one of the most common correctness decisions in SQL. When in doubt, prefer NOT EXISTS for its predictable NULL behavior.
The pattern of correlated subqueries with EXISTS and NOT EXISTS scales well because modern query optimizers convert them into efficient hash or merge anti-join operations.
REGEXP_LIKE patterns
Match text with regular expressions
When LIKE wildcards are not expressive enough, regular expressions provide precise pattern matching.
When LIKE patterns become brittle, regex functions provide far more control. Most analytic warehouses expose REGEXP or REGEXP_LIKE, which evaluate a full regular expression against each value. Use them sparingly because they are powerful but harder to read.
Regex Fundamentals
What is Regex?
In SQL analytics, regex fills the gap between simple LIKE wildcards and procedural code. When you need to validate email formats, extract substrings, enforce ID conventions, or detect patterns across millions of rows, regex is the right tool.
Read vs Write Patterns
Reading Regex Patterns
^[A-Z]{2}$Exactly two uppercase letters. Matches state codes like "CA" or "NY" but rejects "California" or "ca".
^\d{3}-\d{4}$Three digits, a hyphen, four digits. Matches "555-1234" but rejects "5551234" or "55-12345".
^[A-Za-z]+@[A-Za-z]+\.[A-Za-z]{2,}$Simplified email: letters, @, letters, dot, 2+ letters. Matches "user@company.com" but rejects "user@company".
^(sku|SKU)-[0-9]{4,6}(-[A-Z]{2})?$SKU format: "sku" or "SKU", hyphen, 4-6 digits, optional region suffix. Matches "sku-1234", "SKU-123456", or "sku-5678-US".
Common Regex Mistakes
$^[A-Z]{2}Anchors are reversed. The $ must come at the end and ^ at the start.
^[A-Z]+{2}$Quantifier collision. + already means "one or more", so adding {2} creates invalid syntax.
^user@company.com$Unescaped dot. The period matches any character. Use \. to match a literal period.
[0-9]{4}Missing anchors. Without ^ and $, this matches any string containing four consecutive digits anywhere.
Use ^ and $ to insist on an exact format such as sku-####.
Alternation (orbit|apollo) keeps multi-domain filters expressive.
Optional tokens (colou?r) support regional spelling variations.
Quantifiers such as {2} or {3} demand precise lengths.
Character classes ([A-F0-9]) validate constrained alphabets.
Negated classes (\S) block whitespace or forbidden characters.
Word-boundary checks ensure you match standalone terms.
Lookarounds assert future requirements, ideal for password gates.
Backreferences ((\w{2})\1) enforce repeated structure.
Inline flags ((?i)) toggle case sensitivity per expression.
> Complete this query to find products whose SKU follows the exact format "sku-" followed by four digits.
SELECT sku, name FROM products WHERE (sku, )
When you find yourself adding more OR clauses to a LIKE condition to cover additional cases, that is a reliable signal that REGEXP_LIKE will produce cleaner, more maintainable code.
Regex operators and patterns
Build reusable text validation rules
You combine them into a pattern string that REGEXP_LIKE evaluates against each row. If the pattern matches the cell value, the function returns true and the row passes the WHERE clause; otherwise it returns false and the row is filtered out.
> Complete this query to find email addresses that match a valid format using regex.
SELECT FROM users WHERE ( email, '^[a-zA-Z0-9.]+@[a-zA-Z0-9]+\\.[a-z]{2,}$' )
- Use EXISTS for correlated existence checks since it short-circuits
- Prefer NOT EXISTS over NOT IN when NULLs may be present
- Guard NOT IN subqueries with IS NOT NULL filters
- Anchor regex patterns with ^ and $ for exact format matching
- Document complex regex patterns for maintainability
- Don't use NOT IN when the subquery can return NULL values
- Don't use REGEXP_LIKE for simple pattern matching where LIKE suffices
- Don't forget that correlated subqueries execute once per outer row
- Don't assume regex syntax is identical across databases
When a LIKE pattern with multiple OR conditions becomes hard to read, switching to REGEXP_LIKE usually produces cleaner code.
> You are a fraud detection engineer at Stripe building a daily pipeline that flags suspicious accounts before they can initiate payouts. The detection logic must catch accounts exhibiting known bad patterns across transaction history, missing verification records, and malformed identifier formats.
EXISTS check each account row against the transactions table to confirm it has made high-velocity charges within a short time window.NOT EXISTS safely identifies accounts with no verified identity record in the verifications table without the NULL-matching pitfalls of NOT IN.NOT IN vs NULL gotchas matter when filtering against a subquery of blocked merchant IDs, requiring IS NOT NULL guards to prevent silent result set collapse.REGEXP_LIKE validates that payout routing numbers and account identifiers match the exact required format, catching structurally malformed values before they reach the payment processor.EXISTS runs once per outer row and short-circuits on the first match for efficiencyNOT EXISTS safely detects missing relationships without NULL pitfallsNOT IN returns no rows if the subquery contains any NULL valuesNOT IN with IS NOT NULL filters, or prefer NOT EXISTS for NULL safetyREGEXP_LIKE evaluates full regular expressions for complex pattern matchingSubqueries, regex, and other crimes
- Category
- SQL
- Difficulty
- advanced
- Duration
- 19 minutes
- Challenges
- 0 hands-on challenges
Topics covered: Correlated subqueries (EXISTS), NOT EXISTS for missing rows, NOT IN vs NULL gotchas, REGEXP_LIKE patterns, Regex operators and patterns
Lesson Sections
- Correlated subqueries (EXISTS) (concepts: sqlExists)
Watch how each order probes the subquery: the outer row expands, the payments table responds, and only rows with a qualifying payment move into the result. Each outer row probes the subquery independently. As soon as a qualifying payment is found, the subquery stops scanning and the order passes through. Orders with no matching payment are excluded entirely. EXISTS Efficiency Which subquery approach checks for related rows more efficiently?
- NOT EXISTS for missing rows
Common Use Cases Which query correctly finds customers with no open tickets?
- NOT IN vs NULL gotchas
NOT IN vs NOT EXISTS
- REGEXP_LIKE patterns (concepts: sqlRegexMatch)
This expression keeps webinar signups whose email domains are orbit.ai or apollo.ai. Regex anchors (^ and $) guarantee full-string matches so you do not pull in typos like orbit.ai.fake. Regex Fundamentals Understanding regex concepts helps you write precise patterns that match exactly what you intend. What is Regex? The core idea is pattern matching: instead of searching for a literal string, you describe the shape of what you want. A regex pattern is a mini-program that the engine executes cha
- Regex operators and patterns
Each regex operator has a specific purpose in constructing patterns that match text. Regex operators are building blocks that describe what characters to match and how many times. They do not filter or return text on their own. Start with simple regex patterns using anchors and character classes before adding quantifiers and groups.