Complex Data: Advanced

DoorDash ingests millions of raw JSON events from their mobile apps every minute, and their data team uses Snowflake's VARIANT type to land those payloads directly into the warehouse without upfront schema definition. From there, engineers query deeply nested fields like restaurant geofence coordinates buried inside delivery event objects using the same JSON path expressions as structured columns, with no performance penalty for the nesting. When a new event field appears in the mobile SDK, it becomes queryable immediately without a migration. The document-relational techniques in this lesson are the foundation of that kind of flexible, high-performance data architecture.

TRANSFORM for Array Mapping

Daily Life
Interviews

Apply functions to every array element

TRANSFORM(array, lambda) applies a function to every element in an array, returning a new array with the transformed values. This is conceptually similar to map() in functional programming.

Lambda Syntax

Lambda functions in SQL use arrow notation: element -> expression. The element name is arbitrary and scoped to the lambda body.
1SELECT
2 TRANSFORM(
3 ARRAY[1, 2, 3, 4, 5],
4 x -> x * 2
5 ) AS doubled
Result
doubled
[2, 4, 6, 8, 10]

The lambda x -> x * 2 takes each element x and returns x multiplied by 2. The result is a new array with each original element doubled.

String Transformations

Lambda expressions can apply string functions to every element:
1SELECT
2 TRANSFORM(
3 tags,
4 tag -> UPPER(tag)
5 ) AS uppercase_tags
6FROM articles
Result
uppercase_tags
[SQL, DATA, TUTORIAL]
[PYTHON, MACHINE-LEARNING]
Each tag is converted to uppercase. This pattern is useful for normalizing user-generated content before analysis.
Apply mathematical operations like percentages, rounding, or unit conversions.
1SELECT
2 TRANSFORM(prices, x -> x * 0.9) AS discounted
3FROM products

Complex Expressions

Lambda bodies can contain complex expressions including CASE statements, nested functions, and conditionals:

1SELECT
2 TRANSFORM(
3 prices,
4 p -> CASE
5 WHEN p > 100 THEN p * 0.9
6 ELSE p
7 END
8 ) AS discounted_prices
9FROM products
Result
discounted_prices
[45.0, 90.0, 67.5]
[120.0, 150.0]

This applies a 10% discount to prices over 100. The CASE expression evaluates for each array element independently.

TIP
TRANSFORM does not modify the original array. It creates a new array with the transformed values. Use it when you need to apply the same operation to every element without changing the array length.
Lambda expressions like this are the foundation of in-array transformations. The original array remains unchanged, and the result is a new array ready for downstream use.

You can chain TRANSFORM with other array functions. For example, applying TRANSFORM first to normalize values, then ARRAY_JOIN to produce a display string.

Remember that every element in the result array corresponds to exactly one element in the input array. TRANSFORM never changes the length of the array.

FILTER and REDUCE

Daily Life
Interviews

Select and combine array values with logic

Array manipulation goes beyond reading elements. You can filter, reduce, and aggregate array contents directly in SQL.

FILTER for Conditional Rows

FILTER(array, lambda) returns a new array containing only elements for which the lambda returns true. This is equivalent to a WHERE clause but for array elements.

1SELECT
2 FILTER(prices, p -> p > 50) AS expensive_prices
3FROM products
Result
expensive_prices
[75, 120, 99]
[60]

The lambda p -> p > 50 evaluates to true for prices greater than 50. Only those elements are included in the result array.

//

FILTER with CARDINALITY

A common pattern counts how many elements match a condition:
1SELECT
2 product_id,
3 CARDINALITY(
4 FILTER(reviews, r -> r >= 4)
5 ) AS positive_reviews
6FROM products
This filters the reviews array to include only ratings of 4 or higher, then counts the result. The original reviews array is unchanged.
Filter by numeric conditions like positive values, ranges, or thresholds.
1SELECT
2 FILTER(amounts, a -> a > 0) AS positive
3FROM transactions

REDUCE for Aggregation

REDUCE(array, initial, input_function, output_function) aggregates array elements into a single value using an accumulator pattern.

REDUCE has four components: the array, an initial accumulator value, a function that combines each element with the accumulator, and a final output function.
1SELECT
2 REDUCE(
3 ARRAY[1, 2, 3, 4, 5],
4 0,
5 (s, x) -> s + x,
6 s -> s
7 ) AS total

Starting with 0, the lambda (s, x) -> s + x adds each element to the running sum. The output function s -> s returns the final accumulator unchanged.

//

Calculating Weighted Avg

REDUCE enables complex aggregations like weighted averages:
1SELECT
2 REDUCE(
3 scores,
4 CAST(
5 ROW(0, 0)
6 AS ROW(sum DOUBLE, count INTEGER)
7 ),
8 (s, x) -> CAST(
9 ROW(s.sum + x, s.count + 1)
10 AS ROW(sum DOUBLE, count INTEGER)
11 ),
12 s -> s.sum / s.count
13 ) AS average
The accumulator is a struct with sum and count fields. Each iteration updates both fields. The output function divides sum by count to produce the average.
Classic reduction: accumulate by adding each element to the running total.
1SELECT
2 REDUCE(
3 nums,
4 0,
5 (s, x) -> s + x,
6 s -> s
7 ) AS total
8FROM data
TIP
Use FILTER when you need a subset of array elements. Use REDUCE when you need to aggregate array values into a single scalar result.

FILTER is often the first step in a processing pipeline. By removing unwanted elements early, subsequent TRANSFORM or REDUCE operations work on a smaller, cleaner dataset.

Unlike WHERE clauses that remove entire rows, FILTER operates within a single cell. The row stays in the result; only the array contents are reduced.

Combine FILTER with CARDINALITY to count matching elements without materializing the filtered array in a separate column.

UNNEST for Array Expansion

Daily Life
Interviews

Turn one row with an array into many rows

UNNEST expands an array into individual rows, enabling you to apply standard SQL operations to array elements. This is the opposite of array aggregation.

Basic UNNEST

1SELECT
2 order_id,
3 item
4FROM orders
5CROSS JOIN UNNEST(items) AS t(item)

The CROSS JOIN UNNEST pattern creates one row per array element. The original order_id repeats for each item, expanding 3 input rows into 6 output rows.

Filtering Unnested Data

Once unnested, you can apply WHERE clauses to filter individual elements:

1SELECT
2 order_id,
3 item
4FROM orders
5CROSS JOIN UNNEST(items) AS t(item)
6WHERE item LIKE '%premium%'
Result
order_iditem
1003premium widget
1005premium sensor
This filters the unnested items to include only those containing "premium". The result shows only matching items with their order IDs.

Aggregating Unnested Arrays

UNNEST enables aggregations across array elements from multiple rows:

1SELECT
2 COUNT(*) AS total_items,
3 COUNT(DISTINCT item) AS unique_items
4FROM orders
5CROSS JOIN UNNEST(items) AS t(item)
Result
total_itemsunique_items
14723
After unnesting, standard aggregation functions work on the expanded rows. This counts total item occurrences and unique items across all orders.

UNNEST vs Lambda Functions

Choosing between UNNEST and lambda functions depends on your use case:

OperationUNNESTLambda Functions
Row countExpands rowsPreserves rows
JoinsEasy to joinRequires subqueries
AggregationStandard SQL aggsNeed REDUCE
PerformanceMemory intensiveMore efficient
ReadabilityFamiliar SQLFunctional style
TIP
Use UNNEST when you need to join array elements with other tables or apply complex aggregations. Use lambda functions when you want to keep data in array form for downstream processing.

UNNEST multiplies rows: an array with five elements produces five output rows. Keep this expansion in mind when estimating the size of the result set.

After unnesting, you can group by the original row identifier to recollect elements, or apply aggregations across all elements from all rows together.

UNNEST is especially useful when you need to join array elements against another table, something that lambda functions cannot do without subqueries.

The Feature PipelineStep 1
>

An e-commerce company stores user clickstream data as arrays in their data warehouse. Each of 10 million user profiles contains arrays of clicked product IDs (avg 2,000 elements), session durations, and search queries. The ML team needs to extract 50+ features from these arrays for their recommendation model. Features include "count of premium product clicks," "average session duration for sessions over 30 seconds," and "most frequent search category." Each feature requires some combination of FILTER, TRANSFORM, and REDUCE.

user_behavior
user_idclicked_productsview_durationssearch_terms
u_42[p_101, p_205, p_101, ...][12, 45, 8, 120, ...][shoes, sale shoes, ...]
u_17[p_88, p_12, ...][90, 35, ...][laptop bag, ...]
May 2026
Strategy

The pipeline runs nightly and must produce 50+ numeric features per user. Each feature requires filtering, transforming, or reducing array elements. How do you structure the feature engineering SQL?

SQL
-- 10M users x 2,000 avg elements
-- 50+ features needed per user
-- Nightly batch, 4-hour SLA

Real pipelines require balancing lambda-based in-array processing against UNNEST-based row expansion. The right choice depends on downstream usage and data volume.

When arrays are small and transformations simple, lambdas keep the row structure intact and avoid the row explosion associated with UNNEST.

For complex aggregations or joins against external tables, UNNEST followed by GROUP BY is often the cleaner and more maintainable approach.

Advanced Map Operations

Daily Life
Interviews

Reshape complex nested data structures

Maps and arrays become most useful when you can convert them to display-friendly formats and build them from query results.

ARRAY_JOIN: String Concat

ARRAY_JOIN(array, delimiter) concatenates array elements into a single string with the specified delimiter. This is essential for displaying or exporting array data.

1SELECT
2 article_id,
3 ARRAY_JOIN(tags, ', ') AS tag_list
4FROM articles
Result
article_idtag_list
A001sql, data, tutorial
A002python, machine-learning
The tags array is converted to a comma-separated string. This is particularly useful for reports or CSV exports where array types are not supported.

MAP_FROM_ENTRIES

MAP_FROM_ENTRIES(array_of_pairs) constructs a map from an array of key-value pair structures. This enables dynamic map creation from computed values.

1SELECT
2 MAP_FROM_ENTRIES(
3 ARRAY[ROW('color', 'red'), ROW(
4 'size',
5 'large'
6 )]
7 ) AS attributes
Result
attributes
{color=red, size=large}
Each ROW represents a key-value pair. The resulting map can be queried using bracket notation like any other map.

Combining Operations

Real-world queries often chain multiple array and map operations:
1SELECT
2 user_id,
3 ARRAY_JOIN(
4 TRANSFORM(
5 FILTER(tags, t -> t != 'spam'),
6 t -> UPPER(t)
7 ),
8 ', '
9 ) AS clean_tags
10FROM users
Result
user_idclean_tags
U001SQL, PYTHON, DATA
U002ANALYTICS, VISUALIZATION

This chain: 1) FILTER removes spam tags, 2) TRANSFORM uppercases remaining tags, 3) ARRAY_JOIN formats the result as a string.

When chaining array functions, read from inside out. The innermost function executes first, and each outer function operates on the result of the previous one.

ARRAY_JOIN is particularly useful for report generation and CSV exports where array types are not supported by the consumer.

MAP_FROM_ENTRIES is the inverse of MAP_ENTRIES: one builds a map from key-value pairs, while the other deconstructs a map back into an array of pairs.

Chaining FILTER, TRANSFORM, and ARRAY_JOIN in a single expression keeps the data in array form until the final step, reducing intermediate row expansion.

Real-World Patterns

Daily Life
Interviews

Solve production problems with nested data

These patterns combine multiple array operations to solve common analytics problems you will encounter in production.

Sessionization with Arrays

Event streams often arrive as arrays that need to be filtered and transformed:
1WITH filtered_events AS(SELECT user_id, FILTER(events, e -> e.TIMESTAMP > CURRENT_DATE - INTERVAL '7' DAY) AS recent FROM user_activity)
2SELECT
3 user_id,
4 CARDINALITY(recent) AS event_count,
5 ELEMENT_AT(recent, 1).event_type AS first_event
6FROM filtered_events

This pattern filters events to the last 7 days, counts them, and extracts the first event type. The WITH clause makes the logic readable and reusable.

Recommendation Arrays

Recommendation systems often store results as arrays that need scoring and filtering:
1SELECT
2 user_id,
3 FILTER(
4 TRANSFORM(
5 recommendations,
6 r -> ROW(
7 r.product_id,
8 r.score * user_affinity
9 )
10 ),
11 r -> r.col2 > 0.7
12 ) AS high_confidence_recs
13FROM ml_predictions
This multiplies recommendation scores by user affinity, then filters to keep only high-confidence recommendations above 0.7.

Performance Considerations

Lambda functions have performance implications that differ from traditional SQL:
Lambda Performance Best Practices
  • Lambda evaluation happens per element, not per row
  • FILTER before TRANSFORM to reduce processing volume
  • Avoid nested lambdas when UNNEST + JOIN is clearer
  • Use CARDINALITY checks before accessing elements
  • Consider materialized views for frequently used transformations
Do
  • Use TRANSFORM for element-wise operations
  • Chain FILTER before expensive transformations
  • Combine array functions in WITH clauses for readability
  • Document lambda parameter names clearly
  • Use UNNEST when joining with other tables
Don't
  • Nest more than 2 levels of lambda functions
  • UNNEST arrays that already have millions of elements per row
  • Ignore NULL handling in lambda expressions
  • Use REDUCE when simple aggregation functions suffice
  • Chain transforms without intermediate CTEs for complex logic
Lambda functions and array operations are powerful tools for processing semi-structured data at scale. Understanding when to use each technique is key to building efficient analytics pipelines.

The FILTER-then-TRANSFORM pattern reduces processing volume by removing unwanted elements before applying expensive transformations.

When chaining array functions, read from inside out: the innermost function executes first, and each outer function operates on the previous result.

This in-array processing pattern avoids UNNEST row expansion while keeping the data in its original row structure.

PUTTING IT ALL TOGETHER

> You are a data engineer at Robinhood building a pipeline that processes arrays of trade events per user session to compute real-time risk metrics and feed a recommendation engine with personalized trade signals.

TRANSFORM applies a discount factor lambda to every trade amount in the session array, rescoring each event without expanding the array into separate rows.
FILTER removes cancelled and zero-value trades from the session array before REDUCE aggregates the remaining amounts into a single net position scalar.
UNNEST expands the trade event arrays into individual rows so they can be joined against the securities reference table to attach instrument metadata.
ARRAY_JOIN and MAP_FROM_ENTRIES format the final risk dimensions as a delimited label string and a structured attribute map for the downstream serving layer.
KEY TAKEAWAYS
TRANSFORM applies a lambda to every array element, returning a transformed array
FILTER keeps only elements where the lambda returns true
REDUCE aggregates array elements into a scalar using an accumulator pattern
UNNEST expands arrays into rows for traditional SQL operations
ARRAY_JOIN concatenates array elements into a delimited string
Lambda functions operate within array context without expanding rows
Use FILTER before TRANSFORM to reduce processing volume
Choose UNNEST for joins and complex aggregations, lambdas for in-array processing

TRANSFORM, FILTER, REDUCE: lambdas meet arrays

Category
SQL
Difficulty
advanced
Duration
27 minutes
Challenges
0 hands-on challenges

Topics covered: TRANSFORM for Array Mapping, FILTER and REDUCE, UNNEST for Array Expansion, Advanced Map Operations, Real-World Patterns

Lesson Sections

  1. TRANSFORM for Array Mapping (concepts: sqlArrayTransform)

    Lambda Syntax Lambda functions in SQL use arrow notation: element -> expression. The element name is arbitrary and scoped to the lambda body. String Transformations Lambda expressions can apply string functions to every element: Each tag is converted to uppercase. This pattern is useful for normalizing user-generated content before analysis. Complex Expressions Lambda expressions like this are the foundation of in-array transformations. The original array remains unchanged, and the result is a n

  2. FILTER and REDUCE

    Array manipulation goes beyond reading elements. You can filter, reduce, and aggregate array contents directly in SQL. FILTER for Conditional Rows FILTER with CARDINALITY A common pattern counts how many elements match a condition: This filters the reviews array to include only ratings of 4 or higher, then counts the result. The original reviews array is unchanged. REDUCE for Aggregation REDUCE has four components: the array, an initial accumulator value, a function that combines each element wi

  3. UNNEST for Array Expansion

    Basic UNNEST Filtering Unnested Data This filters the unnested items to include only those containing "premium". The result shows only matching items with their order IDs. Aggregating Unnested Arrays After unnesting, standard aggregation functions work on the expanded rows. This counts total item occurrences and unique items across all orders. UNNEST vs Lambda Functions After unnesting, you can group by the original row identifier to recollect elements, or apply aggregations across all elements

  4. Advanced Map Operations

    Maps and arrays become most useful when you can convert them to display-friendly formats and build them from query results. ARRAY_JOIN: String Concat The tags array is converted to a comma-separated string. This is particularly useful for reports or CSV exports where array types are not supported. MAP_FROM_ENTRIES Each ROW represents a key-value pair. The resulting map can be queried using bracket notation like any other map. Combining Operations Real-world queries often chain multiple array and

  5. Real-World Patterns

    These patterns combine multiple array operations to solve common analytics problems you will encounter in production. Sessionization with Arrays Event streams often arrive as arrays that need to be filtered and transformed: Recommendation Arrays Recommendation systems often store results as arrays that need scoring and filtering: This multiplies recommendation scores by user affinity, then filters to keep only high-confidence recommendations above 0.7. Performance Considerations Lambda functions