Complex Data: Advanced

DoorDash ingests millions of raw JSON events from their mobile apps every minute, and their data team uses Snowflake's VARIANT type to land those payloads directly into the warehouse without upfront schema definition. From there, engineers query deeply nested fields like restaurant geofence coordinates buried inside delivery event objects using the same JSON path expressions as structured columns, with no performance penalty for the nesting. When a new event field appears in the mobile SDK, it becomes queryable immediately without a migration. The document-relational techniques in this lesson are the foundation of that kind of flexible, high-performance data architecture.

TRANSFORM for Array Mapping

Daily Life

Interviews

Apply functions to every array element

TRANSFORM(array, lambda) applies a function to every element in an array, returning a new array with the transformed values. This is conceptually similar to map() in functional programming.

Lambda Syntax

Lambda functions in SQL use arrow notation: element -> expression. The element name is arbitrary and scoped to the lambda body.

	SELECT
	TRANSFORM(
	ARRAY[1, 2, 3, 4, 5],
	x -> x * 2
	) AS doubled

Result

doubled
[2, 4, 6, 8, 10]

The lambda x -> x * 2 takes each element x and returns x multiplied by 2. The result is a new array with each original element doubled.

String Transformations

Lambda expressions can apply string functions to every element:

	SELECT
	TRANSFORM(
	tags,
	tag -> UPPER(tag)
	) AS uppercase_tags
	FROM articles

Result

uppercase_tags
[SQL, DATA, TUTORIAL]
[PYTHON, MACHINE-LEARNING]

Each tag is converted to uppercase. This pattern is useful for normalizing user-generated content before analysis.

Apply mathematical operations like percentages, rounding, or unit conversions.

	SELECT
	TRANSFORM(prices, x -> x * 0.9) AS discounted
	FROM products

Complex Expressions

Lambda bodies can contain complex expressions including CASE statements, nested functions, and conditionals:

	SELECT
	TRANSFORM(
	prices,
	p -> CASE
	WHEN p > 100 THEN p * 0.9
	ELSE p
	END
	) AS discounted_prices
	FROM products

Result

discounted_prices
[45.0, 90.0, 67.5]
[120.0, 150.0]

This applies a 10% discount to prices over 100. The CASE expression evaluates for each array element independently.

TIP

TRANSFORM does not modify the original array. It creates a new array with the transformed values. Use it when you need to apply the same operation to every element without changing the array length.

Lambda expressions like this are the foundation of in-array transformations. The original array remains unchanged, and the result is a new array ready for downstream use.

You can chain TRANSFORM with other array functions. For example, applying TRANSFORM first to normalize values, then ARRAY_JOIN to produce a display string.

Remember that every element in the result array corresponds to exactly one element in the input array. TRANSFORM never changes the length of the array.

FILTER and REDUCE

Daily Life

Interviews

Select and combine array values with logic

Array manipulation goes beyond reading elements. You can filter, reduce, and aggregate array contents directly in SQL.

FILTER for Conditional Rows

FILTER(array, lambda) returns a new array containing only elements for which the lambda returns true. This is equivalent to a WHERE clause but for array elements.

	SELECT
	FILTER(prices, p -> p > 50) AS expensive_prices
	FROM products

Result

expensive_prices
[75, 120, 99]
[60]

The lambda p -> p > 50 evaluates to true for prices greater than 50. Only those elements are included in the result array.

FILTER with CARDINALITY

A common pattern counts how many elements match a condition:

	SELECT
	product_id,
	CARDINALITY(
	FILTER(reviews, r -> r >= 4)
	) AS positive_reviews
	FROM products

This filters the reviews array to include only ratings of 4 or higher, then counts the result. The original reviews array is unchanged.

Filter by numeric conditions like positive values, ranges, or thresholds.

	SELECT
	FILTER(amounts, a -> a > 0) AS positive
	FROM transactions

REDUCE for Aggregation

REDUCE(array, initial, input_function, output_function) aggregates array elements into a single value using an accumulator pattern.

REDUCE has four components: the array, an initial accumulator value, a function that combines each element with the accumulator, and a final output function.

	SELECT
	REDUCE(
	ARRAY[1, 2, 3, 4, 5],
	0,
	(s, x) -> s + x,
	s -> s
	) AS total

Starting with 0, the lambda (s, x) -> s + x adds each element to the running sum. The output function s -> s returns the final accumulator unchanged.

Calculating Weighted Avg

REDUCE enables complex aggregations like weighted averages:

	SELECT
	REDUCE(
	scores,
	CAST(
	ROW(0, 0)
	AS ROW(sum DOUBLE, count INTEGER)
	),
	(s, x) -> CAST(
	ROW(s.sum + x, s.count + 1)
	AS ROW(sum DOUBLE, count INTEGER)
	),
	s -> s.sum / s.count
	) AS average

The accumulator is a struct with sum and count fields. Each iteration updates both fields. The output function divides sum by count to produce the average.

Classic reduction: accumulate by adding each element to the running total.

	SELECT
	REDUCE(
	nums,
	0,
	(s, x) -> s + x,
	s -> s
	) AS total
	FROM data

TIP

Use FILTER when you need a subset of array elements. Use REDUCE when you need to aggregate array values into a single scalar result.

FILTER is often the first step in a processing pipeline. By removing unwanted elements early, subsequent TRANSFORM or REDUCE operations work on a smaller, cleaner dataset.

Unlike WHERE clauses that remove entire rows, FILTER operates within a single cell. The row stays in the result; only the array contents are reduced.

Combine FILTER with CARDINALITY to count matching elements without materializing the filtered array in a separate column.

UNNEST for Array Expansion

Daily Life

Interviews

Turn one row with an array into many rows

UNNEST expands an array into individual rows, enabling you to apply standard SQL operations to array elements. This is the opposite of array aggregation.

Basic UNNEST

	SELECT
	order_id,
	item
	FROM orders
	CROSS JOIN UNNEST(items) AS t(item)

The CROSS JOIN UNNEST pattern creates one row per array element. The original order_id repeats for each item, expanding 3 input rows into 6 output rows.

Filtering Unnested Data

Once unnested, you can apply WHERE clauses to filter individual elements:

	SELECT
	order_id,
	item
	FROM orders
	CROSS JOIN UNNEST(items) AS t(item)
	WHERE item LIKE '%premium%'

Result

order_id	item
1003	premium widget
1005	premium sensor

This filters the unnested items to include only those containing "premium". The result shows only matching items with their order IDs.

Aggregating Unnested Arrays

UNNEST enables aggregations across array elements from multiple rows:

	SELECT
	COUNT(*) AS total_items,
	COUNT(DISTINCT item) AS unique_items
	FROM orders
	CROSS JOIN UNNEST(items) AS t(item)

Result

total_items	unique_items
147	23

After unnesting, standard aggregation functions work on the expanded rows. This counts total item occurrences and unique items across all orders.

UNNEST vs Lambda Functions

Choosing between UNNEST and lambda functions depends on your use case:

Operation	UNNEST	Lambda Functions
Row count	Expands rows	Preserves rows
Joins	Easy to join	Requires subqueries
Aggregation	Standard SQL aggs	Need REDUCE
Performance	Memory intensive	More efficient
Readability	Familiar SQL	Functional style

TIP

Use UNNEST when you need to join array elements with other tables or apply complex aggregations. Use lambda functions when you want to keep data in array form for downstream processing.

UNNEST multiplies rows: an array with five elements produces five output rows. Keep this expansion in mind when estimating the size of the result set.

After unnesting, you can group by the original row identifier to recollect elements, or apply aggregations across all elements from all rows together.

UNNEST is especially useful when you need to join array elements against another table, something that lambda functions cannot do without subqueries.

The Feature PipelineStep 1

An e-commerce company stores user clickstream data as arrays in their data warehouse. Each of 10 million user profiles contains arrays of clicked product IDs (avg 2,000 elements), session durations, and search queries. The ML team needs to extract 50+ features from these arrays for their recommendation model. Features include "count of premium product clicks," "average session duration for sessions over 30 seconds," and "most frequent search category." Each feature requires some combination of FILTER, TRANSFORM, and REDUCE.

user_behavior

user_id	clicked_products	view_durations	search_terms
u_42	[p_101, p_205, p_101, ...]	[12, 45, 8, 120, ...]	[shoes, sale shoes, ...]
u_17	[p_88, p_12, ...]	[90, 35, ...]	[laptop bag, ...]

Jul 2026

Strategy

The pipeline runs nightly and must produce 50+ numeric features per user. Each feature requires filtering, transforming, or reducing array elements. How do you structure the feature engineering SQL?

SQL

-- 10M users x 2,000 avg elements
-- 50+ features needed per user
-- Nightly batch, 4-hour SLA

Real pipelines require balancing lambda-based in-array processing against UNNEST-based row expansion. The right choice depends on downstream usage and data volume.

When arrays are small and transformations simple, lambdas keep the row structure intact and avoid the row explosion associated with UNNEST.

For complex aggregations or joins against external tables, UNNEST followed by GROUP BY is often the cleaner and more maintainable approach.

Advanced Map Operations

Daily Life

Interviews

Reshape complex nested data structures

Maps and arrays become most useful when you can convert them to display-friendly formats and build them from query results.

ARRAY_JOIN: String Concat

ARRAY_JOIN(array, delimiter) concatenates array elements into a single string with the specified delimiter. This is essential for displaying or exporting array data.

	SELECT
	article_id,
	ARRAY_JOIN(tags, ', ') AS tag_list
	FROM articles

Result

article_id	tag_list
A001	sql, data, tutorial
A002	python, machine-learning

The tags array is converted to a comma-separated string. This is particularly useful for reports or CSV exports where array types are not supported.

MAP_FROM_ENTRIES

MAP_FROM_ENTRIES(array_of_pairs) constructs a map from an array of key-value pair structures. This enables dynamic map creation from computed values.

	SELECT
	MAP_FROM_ENTRIES(
	ARRAY[ROW('color', 'red'), ROW(
	'size',
	'large'
	)]
	) AS attributes

Result

attributes
{color=red, size=large}

Each ROW represents a key-value pair. The resulting map can be queried using bracket notation like any other map.

Combining Operations

Real-world queries often chain multiple array and map operations:

	SELECT
	user_id,
	ARRAY_JOIN(
	TRANSFORM(
	FILTER(tags, t -> t != 'spam'),
	t -> UPPER(t)
	),
	', '
	) AS clean_tags
	FROM users

Result

user_id	clean_tags
U001	SQL, PYTHON, DATA
U002	ANALYTICS, VISUALIZATION

This chain: 1) FILTER removes spam tags, 2) TRANSFORM uppercases remaining tags, 3) ARRAY_JOIN formats the result as a string.

When chaining array functions, read from inside out. The innermost function executes first, and each outer function operates on the result of the previous one.

ARRAY_JOIN is particularly useful for report generation and CSV exports where array types are not supported by the consumer.

MAP_FROM_ENTRIES is the inverse of MAP_ENTRIES: one builds a map from key-value pairs, while the other deconstructs a map back into an array of pairs.

Chaining FILTER, TRANSFORM, and ARRAY_JOIN in a single expression keeps the data in array form until the final step, reducing intermediate row expansion.

Real-World Patterns

Daily Life

Interviews

Solve production problems with nested data

These patterns combine multiple array operations to solve common analytics problems you will encounter in production.

Sessionization with Arrays

Event streams often arrive as arrays that need to be filtered and transformed:

	WITH filtered_events AS(SELECT user_id, FILTER(events, e -> e.TIMESTAMP > CURRENT_DATE - INTERVAL '7' DAY) AS recent FROM user_activity)
	SELECT
	user_id,
	CARDINALITY(recent) AS event_count,
	ELEMENT_AT(recent, 1).event_type AS first_event
	FROM filtered_events

This pattern filters events to the last 7 days, counts them, and extracts the first event type. The WITH clause makes the logic readable and reusable.

Recommendation Arrays

Recommendation systems often store results as arrays that need scoring and filtering:

	SELECT
	user_id,
	FILTER(
	TRANSFORM(
	recommendations,
	r -> ROW(
	r.product_id,
	r.score * user_affinity
	)
	),
	r -> r.col2 > 0.7
	) AS high_confidence_recs
	FROM ml_predictions

This multiplies recommendation scores by user affinity, then filters to keep only high-confidence recommendations above 0.7.

Performance Considerations

Lambda functions have performance implications that differ from traditional SQL:

Lambda Performance Best Practices

Lambda evaluation happens per element, not per row
FILTER before TRANSFORM to reduce processing volume
Avoid nested lambdas when UNNEST + JOIN is clearer
Use CARDINALITY checks before accessing elements
Consider materialized views for frequently used transformations

✓Do

Use TRANSFORM for element-wise operations
Chain FILTER before expensive transformations
Combine array functions in WITH clauses for readability
Document lambda parameter names clearly
Use UNNEST when joining with other tables

✗Don't

Nest more than 2 levels of lambda functions
UNNEST arrays that already have millions of elements per row
Ignore NULL handling in lambda expressions
Use REDUCE when simple aggregation functions suffice
Chain transforms without intermediate CTEs for complex logic

Lambda functions and array operations are powerful tools for processing semi-structured data at scale. Understanding when to use each technique is key to building efficient analytics pipelines.

The FILTER-then-TRANSFORM pattern reduces processing volume by removing unwanted elements before applying expensive transformations.

When chaining array functions, read from inside out: the innermost function executes first, and each outer function operates on the previous result.

This in-array processing pattern avoids UNNEST row expansion while keeping the data in its original row structure.

❯❯❯PUTTING IT ALL TOGETHER

> You are a data engineer at Robinhood building a pipeline that processes arrays of trade events per user session to compute real-time risk metrics and feed a recommendation engine with personalized trade signals.

TRANSFORM applies a discount factor lambda to every trade amount in the session array, rescoring each event without expanding the array into separate rows.

FILTER removes cancelled and zero-value trades from the session array before REDUCE aggregates the remaining amounts into a single net position scalar.

UNNEST expands the trade event arrays into individual rows so they can be joined against the securities reference table to attach instrument metadata.

ARRAY_JOIN and MAP_FROM_ENTRIES format the final risk dimensions as a delimited label string and a structured attribute map for the downstream serving layer.

KEY TAKEAWAYS

TRANSFORM applies a lambda to every array element, returning a transformed array

FILTER keeps only elements where the lambda returns true

REDUCE aggregates array elements into a scalar using an accumulator pattern

UNNEST expands arrays into rows for traditional SQL operations

ARRAY_JOIN concatenates array elements into a delimited string

Lambda functions operate within array context without expanding rows

Use FILTER before TRANSFORM to reduce processing volume

Choose UNNEST for joins and complex aggregations, lambdas for in-array processing

TRANSFORM, FILTER, REDUCE: lambdas meet arrays

Category: SQL
Difficulty: advanced
Duration: 27 minutes
Challenges: 0 hands-on challenges

Topics covered: TRANSFORM for Array Mapping, FILTER and REDUCE, UNNEST for Array Expansion, Advanced Map Operations, Real-World Patterns

Lesson Sections

TRANSFORM for Array Mapping (concepts: sqlArrayTransform)
Lambda Syntax Lambda functions in SQL use arrow notation: element -> expression. The element name is arbitrary and scoped to the lambda body. String Transformations Lambda expressions can apply string functions to every element: Each tag is converted to uppercase. This pattern is useful for normalizing user-generated content before analysis. Complex Expressions Lambda expressions like this are the foundation of in-array transformations. The original array remains unchanged, and the result is a n
FILTER and REDUCE (concepts: sqlArrayOps)
Array manipulation goes beyond reading elements. You can filter, reduce, and aggregate array contents directly in SQL. FILTER for Conditional Rows FILTER with CARDINALITY A common pattern counts how many elements match a condition: This filters the reviews array to include only ratings of 4 or higher, then counts the result. The original reviews array is unchanged. REDUCE for Aggregation REDUCE has four components: the array, an initial accumulator value, a function that combines each element wi
UNNEST for Array Expansion (concepts: sqlUnnest)
Basic UNNEST Filtering Unnested Data This filters the unnested items to include only those containing "premium". The result shows only matching items with their order IDs. Aggregating Unnested Arrays After unnesting, standard aggregation functions work on the expanded rows. This counts total item occurrences and unique items across all orders. UNNEST vs Lambda Functions After unnesting, you can group by the original row identifier to recollect elements, or apply aggregations across all elements
Advanced Map Operations (concepts: sqlMapBasic)
Maps and arrays become most useful when you can convert them to display-friendly formats and build them from query results. ARRAY_JOIN: String Concat The tags array is converted to a comma-separated string. This is particularly useful for reports or CSV exports where array types are not supported. MAP_FROM_ENTRIES Each ROW represents a key-value pair. The resulting map can be queried using bracket notation like any other map. Combining Operations Real-world queries often chain multiple array and
Real-World Patterns (concepts: sqlArrayOps)
These patterns combine multiple array operations to solve common analytics problems you will encounter in production. Sessionization with Arrays Event streams often arrive as arrays that need to be filtered and transformed: Recommendation Arrays Recommendation systems often store results as arrays that need scoring and filtering: This multiplies recommendation scores by user affinity, then filters to keep only high-confidence recommendations above 0.7. Performance Considerations Lambda functions