Filtering: Advanced

Visa processes over 200 billion card transactions per year, and their fraud detection system flags suspicious activity in under 100 milliseconds per swipe. That real-time detection works by running correlated subqueries and EXISTS checks against a cardholder's transaction history to identify patterns that do not match normal behavior, like a card being used in two countries within minutes or a sudden spike in high-value purchases. The advanced filtering techniques in this lesson, including correlated subqueries, EXISTS, and complex filter combinations, are the same class of tools powering fraud systems at Visa, Mastercard, and every major payment network.

Correlated subqueries (EXISTS)

Daily Life
Interviews

Check if related records exist for each row

EXISTS answers relational questions by checking whether related rows exist in another table.

EXISTS evaluates whether a subquery returns at least one row. Because the subquery can reference the outer query, it is perfect for asking relational questions such as "does this order have a recent payment?" or "has this user submitted any tickets?"

1SELECT
2 o.order_id,
3 o.customer_id
4FROM orders AS o
5WHERE EXISTS (
6 SELECT
7 1
8 FROM transactions AS p
9 WHERE p.order_id = o.order_id
10 AND p.settled_at >= '2024-05-01'
11)
Result
order_idcustomer_id
ORD-204cust_104
ORD-207cust_317
ORD-219cust_317

The correlated WHERE clause reuses the outer order_id, so each order performs its own lookup. As soon as the subquery finds a qualifying payment row it stops scanning and returns true, making EXISTS efficient even on large datasets.

Watch how each order probes the subquery: the outer row expands, the payments table responds, and only rows with a qualifying payment move into the result.
1SELECT
2 o.order_id,
3 o.customer_id
4FROM orders AS o
5WHERE EXISTS (
6 SELECT
7 1
8 FROM transactions AS p
9 WHERE p.order_id = o.order_id
10 AND p.settled_at >= '2024-05-01'
11)
Each outer row probes the subquery independently. As soon as a qualifying payment is found, the subquery stops scanning and the order passes through. Orders with no matching payment are excluded entirely.

EXISTS Efficiency

EXISTS is optimized for performance because it stops searching as soon as it finds a single matching row.

How EXISTS evaluates
  • Runs once per row of the outer query and short-circuits on the first match
  • Only needs SELECT 1 inside because the columns list does not matter
  • Pair with LIMIT/OFFSET sparingly; EXISTS only cares about presence, not counts
Which subquery approach checks for related rows more efficiently?
JOIN + DISTINCT works but builds a full intermediate result set before deduplicating, which is wasteful when you only need to know if a match exists.
1SELECT DISTINCT
2 o.order_id
3FROM orders AS o
4INNER JOIN transactions AS p
5 ON p.order_id = o.order_id
6WHERE p.settled_at >= '2024-05-01'

Now try writing an EXISTS subquery to check for related payment records.

> Complete this query to find users that have at least one delivered push notification.

SELECT
  u.user_id
FROM users AS u
WHERE ___ (
  SELECT
    1
  FROM push_notifs AS p
  WHERE ___ = u.user_id
  AND p.status = ___
)
'failed'
EXISTS
'delivered'
NOT EXISTS
p.user_id

EXISTS is one of the most powerful filtering tools in SQL because it checks relationships without retrieving any column data.

When you only need to know whether related rows exist rather than what those rows contain, EXISTS gives you both correctness and performance.

In production systems, EXISTS-based queries often outperform equivalent JOIN-based approaches because the subquery can short-circuit as soon as the first matching row is found.

NOT EXISTS for missing rows

Daily Life
Interviews

Find records with no matching partner

NOT EXISTS is the safest way to find rows that lack related data in another table.

NOT EXISTS flips the logic to keep outer rows that do not find a match inside the subquery. This is the safest pattern for enforcing "no related rows" rules without being tripped up by NULL handling.

1SELECT
2 c.customer_id,
3 c.company
4FROM customers AS c
5WHERE NOT EXISTS (
6 SELECT
7 1
8 FROM support_tickets AS t
9 WHERE t.customer_id = c.customer_id
10 AND t.status = 'open'
11)
Result
customer_idcompany
cust_041Orbit Bikes
cust_107Pioneer Labs
cust_134Nova Freight

Here each customer keeps its spot only when no open ticket rows exist for the same customer_id. Because EXISTS ignores column values, you can add more predicates (date ranges, severities) without changing the correlation pattern.

Common Use Cases

NOT EXISTS excels at detecting missing data, gaps in processes, and orphaned records.

Missing child rows
Detect unpaid invoices, onboarding gaps, or stale enrollments
NULL-safe by design
Guarantees correct behavior even when the subquery returns NULL values
Self-documenting intent
Outer alias references make the relationship logic obvious to readers
TIP
NOT EXISTS is the safest "anti-join" pattern. Unlike NOT IN, it handles NULLs correctly and unlike LEFT JOIN + IS NULL, it clearly communicates intent.
Which query correctly finds customers with no open tickets?
This LEFT JOIN has a bug: filtering on t.status = 'open' in WHERE converts the outer join to an inner join, removing customers without any tickets. The IS NULL check never triggers.
1SELECT
2 c.customer_id
3FROM customers AS c
4LEFT JOIN support_tickets AS t
5 ON t.customer_id = c.customer_id
6WHERE t.status = 'open'
7AND t.ticket_id IS NULL

Practice using NOT IN to exclude specific categories from a product listing.

> Complete this query to show only products that are not archived or discontinued.

SELECT
  name
FROM products
WHERE category ___ ___ (
  'archived',
  'discontinued'
  )
LIKE
IN
NOT
EXISTS

NOT EXISTS is the safest anti-join pattern because it handles NULL values correctly by design, unlike NOT IN which can silently return zero rows when the subquery contains a NULL.

Choosing between NOT IN and NOT EXISTS is one of the most common correctness decisions in SQL. When in doubt, prefer NOT EXISTS for its predictable NULL behavior.

The pattern of correlated subqueries with EXISTS and NOT EXISTS scales well because modern query optimizers convert them into efficient hash or merge anti-join operations.

NOT IN vs NULL gotchas

Daily Life
Interviews

Avoid the NULL trap in exclusion filters

NOT IN has a dangerous quirk: a single NULL in the subquery can silently eliminate all results.

NOT IN is concise for enumerating disallowed values, but remember that a single NULL inside the list makes the entire comparison UNKNOWN and filters out every row. The fix is to filter NULLs inside the subquery (or use NOT EXISTS).

1SELECT
2 user_id
3FROM logins
4WHERE user_id NOT IN (
5 SELECT
6 user_id
7 FROM risk_allowlist
8 WHERE retired_at IS NULL
9 AND user_id IS NOT NULL
10)
Result
user_id
U003
U007
U012

NOT IN vs NOT EXISTS

Both exclude rows, but they handle NULLs very differently. Choosing the right one prevents silent bugs.

NOT IN
  • Concise for small value lists
  • Breaks silently if subquery returns NULL
  • Must guard with IS NOT NULL filter
NOT EXISTS
  • NULL-safe by default
  • Handles composite keys cleanly
  • Self-documenting with correlated aliases

Which NOT IN query is safe when the subquery might contain NULLs?

If risk_allowlist contains even one NULL user_id, every comparison returns UNKNOWN and the entire query silently returns zero rows.
1SELECT
2 user_id
3FROM logins
4WHERE user_id NOT IN (
5 SELECT
6 user_id
7 FROM risk_allowlist
8)
The Compliance FilterStep 1
>

Flagged transactions feed directly into regulatory reports. Every false positive triggers a manual review costing $40, and every missed flag risks a $10K fine.

transactions
txn_idmerchant_idamounttxn_datestatus
TXN-001M-204149.992024-06-01settled
TXN-002M-99975.002024-06-01pending
TXN-003M-204220.502024-06-02settled
merchants
merchant_idnamecategoryonboarded_at
M-204Orbit Bikesretail2023-11-15
M-317Pioneer Labstech2024-01-20
M-410Nova Freightlogistics2024-03-05
risk_allowlist
merchant_idadded_byreasonretired_at
M-204complianceverified partnerNULL
M-317complianceaudit clearedNULL
May 2026
Missing-Relationship Detection

Your first task is to flag transactions whose merchant_id has no matching merchant record. Marketing just onboarded 200 new merchants, but some transactions arrived before merchant profiles were created. Which filtering approach do you use?

SQL
-- Goal: find transactions with no merchant record
SELECT txn_id, merchant_id, amount
FROM transactions t
WHERE ??? -- your approach here

NOT EXISTS is the safest anti-join pattern because it handles NULL values correctly by design, unlike NOT IN which can silently return zero rows when the subquery contains a NULL.

Choosing between NOT IN and NOT EXISTS is one of the most common correctness decisions in SQL. When in doubt, prefer NOT EXISTS for its predictable NULL behavior.

The pattern of correlated subqueries with EXISTS and NOT EXISTS scales well because modern query optimizers convert them into efficient hash or merge anti-join operations.

REGEXP_LIKE patterns

Daily Life
Interviews

Match text with regular expressions

When LIKE wildcards are not expressive enough, regular expressions provide precise pattern matching.

When LIKE patterns become brittle, regex functions provide far more control. Most analytic warehouses expose REGEXP or REGEXP_LIKE, which evaluate a full regular expression against each value. Use them sparingly because they are powerful but harder to read.

1SELECT
2 email,
3 signup_channel
4FROM marketing_signups
5WHERE REGEXP_LIKE(email, '^.+@(orbit|apollo)\.ai$')
6AND signup_channel = 'webinar'
Result
emailsignup_channel
maria@orbit.aiwebinar
logan@apollo.aiwebinar
This expression keeps webinar signups whose email domains are orbit.ai or apollo.ai. Regex anchors (^ and $) guarantee full-string matches so you do not pull in typos like orbit.ai.fake.

Regex Fundamentals

Understanding regex concepts helps you write precise patterns that match exactly what you intend.
//

What is Regex?

The core idea is pattern matching: instead of searching for a literal string, you describe the shape of what you want. A regex pattern is a mini-program that the engine executes character by character against each input value, checking whether it matches the specified structure.

In SQL analytics, regex fills the gap between simple LIKE wildcards and procedural code. When you need to validate email formats, extract substrings, enforce ID conventions, or detect patterns across millions of rows, regex is the right tool.

Which pattern correctly matches emails from orbit.ai or apollo.ai?
LIKE works here but the unescaped dot matches any character, so it would also accept "@orbitXai". Adding more domains requires more OR clauses.
1SELECT
2 email
3FROM marketing_signups
4WHERE email LIKE '%@orbit.ai'
5OR email LIKE '%@apollo.ai'
The examples below introduce these regex operators. Each operator has a specific purpose in describing pattern structure:
Anchors
^: Matches the start of the string. Use ^sku to ensure the value begins with "sku".
$: Matches the end of the string. Use \.ai$ to require values ending in ".ai".
Character Matchers
.: Matches any single character except newline.
\d: Matches any digit (0-9). Equivalent to [0-9].
\w: Matches any word character (letters, digits, underscore).
\s: Matches any whitespace character.
\S: Matches any non-whitespace character.
Character Classes
[abc]: Matches any single character in the set.
[A-Z]: Matches any character in the range.
[^abc]: Negated class - matches any character NOT in the set.
Quantifiers
*: Matches zero or more of the preceding element.
+: Matches one or more of the preceding element.
?: Matches zero or one of the preceding element.
{n}: Matches exactly n occurrences.
{n,m}: Matches between n and m occurrences.
Grouping and Alternation
(...): Groups characters together.
a|b: Alternation - matches either a or b.
\1, \2: Backreferences - matches the same text captured by group 1 or 2.
Escaping
\.: Matches a literal period character.
\(, \), \[, \]: Escape special characters to match them literally.
Advanced Features
(?=...): Positive lookahead - asserts that what follows matches the pattern.
(?i): Inline case-insensitive flag.

Read vs Write Patterns

Breaking down patterns character by character helps you understand and debug regex expressions.
//

Reading Regex Patterns

Regex patterns are read left to right. Each operator constrains what the next character (or group of characters) must be. Here are four progressively complex patterns with plain English translations:
^[A-Z]{2}$

Exactly two uppercase letters. Matches state codes like "CA" or "NY" but rejects "California" or "ca".

^\d{3}-\d{4}$

Three digits, a hyphen, four digits. Matches "555-1234" but rejects "5551234" or "55-12345".

^[A-Za-z]+@[A-Za-z]+\.[A-Za-z]{2,}$

Simplified email: letters, @, letters, dot, 2+ letters. Matches "user@company.com" but rejects "user@company".

^(sku|SKU)-[0-9]{4,6}(-[A-Z]{2})?$

SKU format: "sku" or "SKU", hyphen, 4-6 digits, optional region suffix. Matches "sku-1234", "SKU-123456", or "sku-5678-US".

//

Common Regex Mistakes

Regex syntax is strict and order matters. Here are patterns that look plausible but fail:
$^[A-Z]{2}

Anchors are reversed. The $ must come at the end and ^ at the start.

^[A-Z]+{2}$

Quantifier collision. + already means "one or more", so adding {2} creates invalid syntax.

^user@company.com$

Unescaped dot. The period matches any character. Use \. to match a literal period.

[0-9]{4}

Missing anchors. Without ^ and $, this matches any string containing four consecutive digits anywhere.

1. Anchor the entire value

Use ^ and $ to insist on an exact format such as sku-####.

2. Alternation groups

Alternation (orbit|apollo) keeps multi-domain filters expressive.

3. Optional segments

Optional tokens (colou?r) support regional spelling variations.

4. Quantifiers for exact counts

Quantifiers such as {2} or {3} demand precise lengths.

5. Character classes

Character classes ([A-F0-9]) validate constrained alphabets.

6. Negated classes

Negated classes (\S) block whitespace or forbidden characters.

7. Word boundaries

Word-boundary checks ensure you match standalone terms.

8. Lookarounds

Lookarounds assert future requirements, ideal for password gates.

9. Backreferences

Backreferences ((\w{2})\1) enforce repeated structure.

10. Inline flags

Inline flags ((?i)) toggle case sensitivity per expression.

> Complete this query to find products whose SKU follows the exact format "sku-" followed by four digits.

SELECT
  sku,
  name
FROM products
WHERE ___(sku, ___)
LIKE
REGEXP_LIKE
'^sku-[0-9]+$'
'^sku-[0-9]{4}$'
Anchoring regex patterns with the start and end markers is the most important habit to build because unanchored patterns can match substrings anywhere in a value, producing false positives.
Regular expressions are evaluated row by row, so keeping them as specific as possible reduces the risk of unintended matches across large datasets.

When you find yourself adding more OR clauses to a LIKE condition to cover additional cases, that is a reliable signal that REGEXP_LIKE will produce cleaner, more maintainable code.

Regex operators and patterns

Daily Life
Interviews

Build reusable text validation rules

Each regex operator has a specific purpose in constructing patterns that match text.
Regex operators are building blocks that describe what characters to match and how many times. They do not filter or return text on their own.

You combine them into a pattern string that REGEXP_LIKE evaluates against each row. If the pattern matches the cell value, the function returns true and the row passes the WHERE clause; otherwise it returns false and the row is filtered out.

TIP
Start with simple patterns using anchors and character classes. Add quantifiers and groups only when the simpler version does not capture what you need.

> Complete this query to find email addresses that match a valid format using regex.

SELECT
  ___
FROM users
WHERE ___(
  email,
  '^[a-zA-Z0-9.]+@[a-zA-Z0-9]+\\.[a-z]{2,}$'
  )
REGEXP_LIKE
email
username
LIKE
Do
  • Use EXISTS for correlated existence checks since it short-circuits
  • Prefer NOT EXISTS over NOT IN when NULLs may be present
  • Guard NOT IN subqueries with IS NOT NULL filters
  • Anchor regex patterns with ^ and $ for exact format matching
  • Document complex regex patterns for maintainability
Don't
  • Don't use NOT IN when the subquery can return NULL values
  • Don't use REGEXP_LIKE for simple pattern matching where LIKE suffices
  • Don't forget that correlated subqueries execute once per outer row
  • Don't assume regex syntax is identical across databases
Start with simple regex patterns using anchors and character classes before adding quantifiers and groups.

When a LIKE pattern with multiple OR conditions becomes hard to read, switching to REGEXP_LIKE usually produces cleaner code.

PUTTING IT ALL TOGETHER

> You are a fraud detection engineer at Stripe building a daily pipeline that flags suspicious accounts before they can initiate payouts. The detection logic must catch accounts exhibiting known bad patterns across transaction history, missing verification records, and malformed identifier formats.

Correlated subqueries with EXISTS check each account row against the transactions table to confirm it has made high-velocity charges within a short time window.
NOT EXISTS safely identifies accounts with no verified identity record in the verifications table without the NULL-matching pitfalls of NOT IN.
NOT IN vs NULL gotchas matter when filtering against a subquery of blocked merchant IDs, requiring IS NOT NULL guards to prevent silent result set collapse.
REGEXP_LIKE validates that payout routing numbers and account identifiers match the exact required format, catching structurally malformed values before they reach the payment processor.
KEY TAKEAWAYS
EXISTS runs once per outer row and short-circuits on the first match for efficiency
Correlated subqueries reference outer query values, enabling row-by-row relationship checks
NOT EXISTS safely detects missing relationships without NULL pitfalls
NOT IN returns no rows if the subquery contains any NULL values
Guard NOT IN with IS NOT NULL filters, or prefer NOT EXISTS for NULL safety
REGEXP_LIKE evaluates full regular expressions for complex pattern matching
Anchor regex patterns with ^ and $ to ensure exact format matches
Document complex regex patterns so teammates can maintain them

Filtering: Advanced

Subqueries, regex, and other crimes

Category
SQL
Difficulty
advanced
Duration
19 minutes
Challenges
0 hands-on challenges

Topics covered: Correlated subqueries (EXISTS), NOT EXISTS for missing rows, NOT IN vs NULL gotchas, REGEXP_LIKE patterns, Regex operators and patterns

Lesson Sections

  1. Correlated subqueries (EXISTS) (concepts: sqlExists)

    Watch how each order probes the subquery: the outer row expands, the payments table responds, and only rows with a qualifying payment move into the result. Each outer row probes the subquery independently. As soon as a qualifying payment is found, the subquery stops scanning and the order passes through. Orders with no matching payment are excluded entirely. EXISTS Efficiency Which subquery approach checks for related rows more efficiently?

  2. NOT EXISTS for missing rows

    Common Use Cases Which query correctly finds customers with no open tickets?

  3. NOT IN vs NULL gotchas

    NOT IN vs NOT EXISTS

  4. REGEXP_LIKE patterns (concepts: sqlRegexMatch)

    This expression keeps webinar signups whose email domains are orbit.ai or apollo.ai. Regex anchors (^ and $) guarantee full-string matches so you do not pull in typos like orbit.ai.fake. Regex Fundamentals Understanding regex concepts helps you write precise patterns that match exactly what you intend. What is Regex? The core idea is pattern matching: instead of searching for a literal string, you describe the shape of what you want. A regex pattern is a mini-program that the engine executes cha

  5. Regex operators and patterns

    Each regex operator has a specific purpose in constructing patterns that match text. Regex operators are building blocks that describe what characters to match and how many times. They do not filter or return text on their own. Start with simple regex patterns using anchors and character classes before adding quantifiers and groups.