Complex Data: Advanced
DoorDash ingests millions of raw JSON events from their mobile apps every minute, and their data team uses Snowflake's VARIANT type to land those payloads directly into the warehouse without upfront schema definition. From there, engineers query deeply nested fields like restaurant geofence coordinates buried inside delivery event objects using the same JSON path expressions as structured columns, with no performance penalty for the nesting. When a new event field appears in the mobile SDK, it becomes queryable immediately without a migration. The document-relational techniques in this lesson are the foundation of that kind of flexible, high-performance data architecture.
TRANSFORM for Array Mapping
Apply functions to every array element
TRANSFORM(array, lambda) applies a function to every element in an array, returning a new array with the transformed values. This is conceptually similar to map() in functional programming.
Lambda Syntax
The lambda x -> x * 2 takes each element x and returns x multiplied by 2. The result is a new array with each original element doubled.
String Transformations
Complex Expressions
Lambda bodies can contain complex expressions including CASE statements, nested functions, and conditionals:
This applies a 10% discount to prices over 100. The CASE expression evaluates for each array element independently.
TRANSFORM does not modify the original array. It creates a new array with the transformed values. Use it when you need to apply the same operation to every element without changing the array length.You can chain TRANSFORM with other array functions. For example, applying TRANSFORM first to normalize values, then ARRAY_JOIN to produce a display string.
FILTER and REDUCE
Select and combine array values with logic
FILTER for Conditional Rows
FILTER(array, lambda) returns a new array containing only elements for which the lambda returns true. This is equivalent to a WHERE clause but for array elements.
The lambda p -> p > 50 evaluates to true for prices greater than 50. Only those elements are included in the result array.
FILTER with CARDINALITY
REDUCE for Aggregation
REDUCE(array, initial, input_function, output_function) aggregates array elements into a single value using an accumulator pattern.
Starting with 0, the lambda (s, x) -> s + x adds each element to the running sum. The output function s -> s returns the final accumulator unchanged.
Calculating Weighted Avg
FILTER when you need a subset of array elements. Use REDUCE when you need to aggregate array values into a single scalar result.FILTER is often the first step in a processing pipeline. By removing unwanted elements early, subsequent TRANSFORM or REDUCE operations work on a smaller, cleaner dataset.
Unlike WHERE clauses that remove entire rows, FILTER operates within a single cell. The row stays in the result; only the array contents are reduced.
Combine FILTER with CARDINALITY to count matching elements without materializing the filtered array in a separate column.
UNNEST for Array Expansion
Turn one row with an array into many rows
UNNEST expands an array into individual rows, enabling you to apply standard SQL operations to array elements. This is the opposite of array aggregation.
Basic UNNEST
The CROSS JOIN UNNEST pattern creates one row per array element. The original order_id repeats for each item, expanding 3 input rows into 6 output rows.
Filtering Unnested Data
Once unnested, you can apply WHERE clauses to filter individual elements:
Aggregating Unnested Arrays
UNNEST enables aggregations across array elements from multiple rows:
UNNEST vs Lambda Functions
Choosing between UNNEST and lambda functions depends on your use case:
| Operation | UNNEST | Lambda Functions |
|---|---|---|
| Row count | Expands rows | Preserves rows |
| Joins | Easy to join | Requires subqueries |
| Aggregation | Standard SQL aggs | Need REDUCE |
| Performance | Memory intensive | More efficient |
| Readability | Familiar SQL | Functional style |
UNNEST when you need to join array elements with other tables or apply complex aggregations. Use lambda functions when you want to keep data in array form for downstream processing.UNNEST multiplies rows: an array with five elements produces five output rows. Keep this expansion in mind when estimating the size of the result set.
UNNEST is especially useful when you need to join array elements against another table, something that lambda functions cannot do without subqueries.
An e-commerce company stores user clickstream data as arrays in their data warehouse. Each of 10 million user profiles contains arrays of clicked product IDs (avg 2,000 elements), session durations, and search queries. The ML team needs to extract 50+ features from these arrays for their recommendation model. Features include "count of premium product clicks," "average session duration for sessions over 30 seconds," and "most frequent search category." Each feature requires some combination of FILTER, TRANSFORM, and REDUCE.
| user_id | clicked_products | view_durations | search_terms |
|---|---|---|---|
| u_42 | [p_101, p_205, p_101, ...] | [12, 45, 8, 120, ...] | [shoes, sale shoes, ...] |
| u_17 | [p_88, p_12, ...] | [90, 35, ...] | [laptop bag, ...] |
The pipeline runs nightly and must produce 50+ numeric features per user. Each feature requires filtering, transforming, or reducing array elements. How do you structure the feature engineering SQL?
-- 10M users x 2,000 avg elements -- 50+ features needed per user -- Nightly batch, 4-hour SLA
Real pipelines require balancing lambda-based in-array processing against UNNEST-based row expansion. The right choice depends on downstream usage and data volume.
When arrays are small and transformations simple, lambdas keep the row structure intact and avoid the row explosion associated with UNNEST.
For complex aggregations or joins against external tables, UNNEST followed by GROUP BY is often the cleaner and more maintainable approach.
Advanced Map Operations
Reshape complex nested data structures
ARRAY_JOIN: String Concat
ARRAY_JOIN(array, delimiter) concatenates array elements into a single string with the specified delimiter. This is essential for displaying or exporting array data.
MAP_FROM_ENTRIES
MAP_FROM_ENTRIES(array_of_pairs) constructs a map from an array of key-value pair structures. This enables dynamic map creation from computed values.
Combining Operations
This chain: 1) FILTER removes spam tags, 2) TRANSFORM uppercases remaining tags, 3) ARRAY_JOIN formats the result as a string.
ARRAY_JOIN is particularly useful for report generation and CSV exports where array types are not supported by the consumer.
MAP_FROM_ENTRIES is the inverse of MAP_ENTRIES: one builds a map from key-value pairs, while the other deconstructs a map back into an array of pairs.
Chaining FILTER, TRANSFORM, and ARRAY_JOIN in a single expression keeps the data in array form until the final step, reducing intermediate row expansion.
Real-World Patterns
Solve production problems with nested data
Sessionization with Arrays
This pattern filters events to the last 7 days, counts them, and extracts the first event type. The WITH clause makes the logic readable and reusable.
Recommendation Arrays
Performance Considerations
- Lambda evaluation happens per element, not per row
FILTERbeforeTRANSFORMto reduce processing volume- Avoid nested lambdas when
UNNEST+JOINis clearer - Use
CARDINALITYchecks before accessing elements - Consider materialized views for frequently used transformations
- Use TRANSFORM for element-wise operations
- Chain FILTER before expensive transformations
- Combine array functions in WITH clauses for readability
- Document lambda parameter names clearly
- Use UNNEST when joining with other tables
- Nest more than 2 levels of lambda functions
- UNNEST arrays that already have millions of elements per row
- Ignore NULL handling in lambda expressions
- Use REDUCE when simple aggregation functions suffice
- Chain transforms without intermediate CTEs for complex logic
The FILTER-then-TRANSFORM pattern reduces processing volume by removing unwanted elements before applying expensive transformations.
This in-array processing pattern avoids UNNEST row expansion while keeping the data in its original row structure.
> You are a data engineer at Robinhood building a pipeline that processes arrays of trade events per user session to compute real-time risk metrics and feed a recommendation engine with personalized trade signals.
TRANSFORM applies a discount factor lambda to every trade amount in the session array, rescoring each event without expanding the array into separate rows.FILTER removes cancelled and zero-value trades from the session array before REDUCE aggregates the remaining amounts into a single net position scalar.UNNEST expands the trade event arrays into individual rows so they can be joined against the securities reference table to attach instrument metadata.ARRAY_JOIN and MAP_FROM_ENTRIES format the final risk dimensions as a delimited label string and a structured attribute map for the downstream serving layer.TRANSFORM applies a lambda to every array element, returning a transformed arrayFILTER keeps only elements where the lambda returns trueREDUCE aggregates array elements into a scalar using an accumulator patternUNNEST expands arrays into rows for traditional SQL operationsARRAY_JOIN concatenates array elements into a delimited stringTRANSFORM, FILTER, REDUCE: lambdas meet arrays
- Category
- SQL
- Difficulty
- advanced
- Duration
- 27 minutes
- Challenges
- 0 hands-on challenges
Topics covered: TRANSFORM for Array Mapping, FILTER and REDUCE, UNNEST for Array Expansion, Advanced Map Operations, Real-World Patterns
Lesson Sections
- TRANSFORM for Array Mapping (concepts: sqlArrayTransform)
Lambda Syntax Lambda functions in SQL use arrow notation: element -> expression. The element name is arbitrary and scoped to the lambda body. String Transformations Lambda expressions can apply string functions to every element: Each tag is converted to uppercase. This pattern is useful for normalizing user-generated content before analysis. Complex Expressions Lambda expressions like this are the foundation of in-array transformations. The original array remains unchanged, and the result is a n
- FILTER and REDUCE
Array manipulation goes beyond reading elements. You can filter, reduce, and aggregate array contents directly in SQL. FILTER for Conditional Rows FILTER with CARDINALITY A common pattern counts how many elements match a condition: This filters the reviews array to include only ratings of 4 or higher, then counts the result. The original reviews array is unchanged. REDUCE for Aggregation REDUCE has four components: the array, an initial accumulator value, a function that combines each element wi
- UNNEST for Array Expansion
Basic UNNEST Filtering Unnested Data This filters the unnested items to include only those containing "premium". The result shows only matching items with their order IDs. Aggregating Unnested Arrays After unnesting, standard aggregation functions work on the expanded rows. This counts total item occurrences and unique items across all orders. UNNEST vs Lambda Functions After unnesting, you can group by the original row identifier to recollect elements, or apply aggregations across all elements
- Advanced Map Operations
Maps and arrays become most useful when you can convert them to display-friendly formats and build them from query results. ARRAY_JOIN: String Concat The tags array is converted to a comma-separated string. This is particularly useful for reports or CSV exports where array types are not supported. MAP_FROM_ENTRIES Each ROW represents a key-value pair. The resulting map can be queried using bracket notation like any other map. Combining Operations Real-world queries often chain multiple array and
- Real-World Patterns
These patterns combine multiple array operations to solve common analytics problems you will encounter in production. Sessionization with Arrays Event streams often arrive as arrays that need to be filtered and transformed: Recommendation Arrays Recommendation systems often store results as arrays that need scoring and filtering: This multiplies recommendation scores by user affinity, then filters to keep only high-confidence recommendations above 0.7. Performance Considerations Lambda functions