Complex Data: Intermediate
Now that you understand how to access values in arrays, maps, and JSON, it is time to learn the functions that make these structures truly powerful. This lesson covers operations for measuring, searching, sorting, and transforming complex data types.
Data engineers frequently need to answer questions like: How many items are in this array? Does this map contain a specific key? What are all the unique values? These operations are fundamental to building analytics pipelines that process semi-structured data at scale.
Array Operations
Search, sort, and modify array contents
Inspection Functions
CARDINALITY(array) returns the number of elements in an array. This is essential for understanding data distribution and filtering based on collection size.
| order_id | items |
|---|---|
| 1001 | [widget, gadget, tool] |
| 1002 | [widget] |
| 1003 | [gadget, tool, device, sensor] |
Use CARDINALITY to identify orders with many items, filter out empty arrays, or calculate averages across variable-length collections.
CONTAINS()
CONTAINS(array, element) returns true if the specified element exists in the array. This enables filtering rows based on array contents without expanding the array.
The CONTAINS function performs an efficient membership test. For large arrays or frequent lookups, consider whether the data model should use a separate table with proper indexing instead.
Element Access
ELEMENT_AT(array, index) returns the element at the specified position. Unlike bracket notation, ELEMENT_AT handles out-of-bounds indices gracefully by returning NULL.
Negative indices count from the end: ELEMENT_AT(array, -1) returns the last element. This is useful when you need the most recent item in a chronologically ordered array.
Performance Characteristics
Database Execution
> Complete this query to count the number of tags for each event.
SELECT event_id, () AS tag_count FROM event_data
With CARDINALITY you can quickly measure any array. This pairs naturally with WHERE clauses to find rows where array size meets specific criteria.
Remember: CARDINALITY returns 0 for empty arrays and NULL for NULL arrays, so handle both cases in production queries.
Array Cleanup
Remove duplicates and nulls from arrays
Deduplication
ARRAY_DISTINCT(array) removes duplicate elements, returning an array with only unique values. Order is not guaranteed in the result.
| user_id | page_views |
|---|---|
| u001 | [home, products, home, cart, home] |
| u002 | [login, home, products, products, checkout] |
Combining ARRAY_DISTINCT with CARDINALITY gives you the count of unique values, useful for metrics like unique page views per session or distinct products viewed.
Sorting & Set Operations
ARRAY_SORT(array) returns the array with elements in ascending order. For strings, this is lexicographic order. For numbers, it is numeric order.
ARRAY_SORT(ARRAY_DISTINCT(tags)) first removes duplicates, then sorts the unique values. This is a common pattern for normalizing user-generated tags or categories.ARRAY_UNION/INTERSECT
ARRAY_UNION merges two arrays and removes duplicates. ARRAY_INTERSECT returns only elements present in both arrays. These are the array equivalents of SQL UNION and INTERSECT operations.
> Complete this query to retrieve the second tag from each event by position.
SELECT event_id, (, ) AS second_tag FROM event_data
ELEMENT_AT provides safe positional access. Unlike bracket notation in some systems, out-of-bounds indices return NULL rather than errors.
Combine ELEMENT_AT with COALESCE to provide default values when an index might not exist.
Map Operations
Merge, filter, and transform map data
Key & Value Extraction
MAP_KEYS(map) returns an array containing all keys in the map. This is essential for discovering what properties exist in a flexible schema.
| product_id | attributes |
|---|---|
| p001 | {color=red, size=large, material=cotton} |
| p002 | {color=blue, weight=2kg} |
| p003 | {size=medium, origin=usa, organic=true} |
MAP_VALUES()
MAP_VALUES(map) returns an array of all values in the map, in the same order as their corresponding keys from MAP_KEYS.
Combining MAP_KEYS and MAP_VALUES with array functions like CONTAINS lets you search maps for specific keys or values without knowing the complete schema.
Entry Processing
MAP_ENTRIES(map) returns an array of key-value pair structures, enabling iteration over the entire map. Each entry contains a key and value field.
Performance & Best Practices
Database Execution
- map_keys and map_values iterate over map entries
- Key lookup with bracket notation is O(1) hash access
- Maps with many entries consume more memory than arrays
- Missing keys return NULL, not errors
> Complete this query to find events that contain a specific tag value.
SELECT event_id FROM event_data WHERE (, )
The CONTAINS function works on both arrays and the arrays returned by MAP_KEYS, making it versatile for membership checks across data structures.
Pair CONTAINS with MAP_KEYS to check for property existence, or with raw arrays for value membership tests.
Next you will see how JSON_EXTRACT_SCALAR simplifies filtering and joining on values embedded inside JSON documents.
JSON_EXTRACT_SCALAR()
Get clean scalar values from JSON fields
While JSON_EXTRACT returns JSON-typed values, JSON_EXTRACT_SCALAR returns plain SQL strings. This makes it perfect for filtering, joining, and comparing JSON values without type conversion.
Understanding Scalar Extraction
Recall that JSON_EXTRACT preserves JSON formatting, including quotes around strings:
- Returns: "alice" (with quotes)
- Type: JSON value
- Comparison: Complex
- Filtering: Needs escaping
- Returns: alice (plain)
- Type: VARCHAR
- Comparison: Simple
- Filtering: Direct equals
Basic Usage
This example shows how to extract scalar values from JSON and filter rows based on those values.
| event_id | payload |
|---|---|
| 1 | {"user": "alice", "action": "login"} |
| 2 | {"user": "bob", "action": "purchase"} |
| 3 | {"user": "alice", "action": "logout"} |
The WHERE clause uses a simple string comparison without escaped quotes. This is much cleaner than working with JSON_EXTRACT.
Practical Applications
JSON_EXTRACT_SCALAR enables efficient filtering on JSON fields. Some query engines can even push these filters down to storage for better performance.
Type Considerations
JSON_EXTRACT_SCALAR only works with scalar JSON values: strings, numbers, and booleans. Attempting to extract an array or object returns NULL.
Numbers are returned as strings. If you need numeric operations, cast the result: CAST(json_extract_scalar(...) AS DOUBLE).
JSON_EXTRACT_SCALAR for scalar values you will filter, join, or compare. Use JSON_EXTRACT when you need to preserve JSON structure or extract complex types like arrays and objects.> Complete this query to list all the property names stored in a map column.
SELECT event_id, () AS prop_keys FROM event_data
MAP_KEYS reveals which properties each row contains, making it essential for auditing schema consistency across flexible data.
Use MAP_KEYS with CARDINALITY to count properties, or with CONTAINS to check for specific keys.
The final section covers CAST, which bridges JSON values to native SQL types so you can use all the functions learned in this lesson.
CAST JSON to Array/Map
Convert raw JSON into queryable types
Converting JSON to native array or map types unlocks the full power of array and map functions. Use CAST to transform extracted JSON into structured SQL types.
Array Conversion
Converting JSON arrays to native SQL arrays unlocks the full set of array functions like CARDINALITY and CONTAINS.
JSON Array to SQL Array
Use CAST with ARRAY(type) to convert a JSON array into a native SQL array.
After casting, you can use CARDINALITY, CONTAINS, ELEMENT_AT, and all other array functions on the result.
Map Conversion
Converting JSON objects to native SQL maps enables efficient key-based lookups and map functions.
JSON Object to SQL Map
Use CAST with MAP(key_type, value_type) to convert a JSON object into a native SQL map.
The target type MAP(VARCHAR, VARCHAR) specifies both key and value types. All keys must convert to the key type, and all values must convert to the value type.
Best Practices
Type Specifications
This reference shows common JSON structures and their corresponding SQL type specifications.
| JSON Structure | Target Type |
|---|---|
| ["a", "b", "c"] | ARRAY(VARCHAR) |
| [1, 2, 3] | ARRAY(INTEGER) |
| [1.5, 2.5] | ARRAY(DOUBLE) |
| {"k": "v"} | MAP(VARCHAR, VARCHAR) |
| {"k": 123} | MAP(VARCHAR, INTEGER) |
WITH clause to cast once and reference the result by name. This reduces parsing overhead and improves readability.- Use json_extract_scalar for filtering and joins
- Cast to native types when using array/map functions
- Use WITH clauses to avoid repeated parsing
- Check cardinality before accessing specific indices
- Combine array_distinct with cardinality for unique counts
- Compare JSON_EXTRACT results without handling quotes
- Cast arrays without knowing the element type
- Assume all rows have identical JSON structure
- Use json_extract_scalar on arrays or objects
- Ignore NULL handling for missing keys or paths
> Complete this query to cast a JSON tags field to a native SQL array and count its elements.
SELECT event_id, ( ( JSON_EXTRACT(payload, '$.tags') AS ) ) AS tag_count FROM event_data
Casting JSON to native types is the bridge between semi-structured and structured data. Once cast, all array and map functions become available.
The WITH clause pattern of casting once and referencing many times is the recommended approach for production queries involving JSON conversion.
You now have a complete toolkit for working with arrays, maps, and JSON in SQL. These functions are fundamental to processing semi-structured data at scale.
> You are building a user behavior report from a sessions table. Each row has a tags array of event categories, a properties map of session attributes, and a JSON payload column. You need to identify users who visited more than three unique pages, check whether they have a premium flag in their properties map, and filter sessions where the JSON action field equals "purchase".
The challenge combines all five techniques across a single events table with three semi-structured columns.
CARDINALITY(array) returns the number of elementsCONTAINS(array, element) checks for membershipELEMENT_AT(array, index) safely accesses elements with NULL for out-of-boundsARRAY_DISTINCT removes duplicates; ARRAY_SORT orders elementsMAP_KEYS and MAP_VALUES extract arrays of keys and values from mapsJSON_EXTRACT_SCALAR returns plain strings for filtering and joiningCAST JSON to ARRAY or MAP to use native array/map functionsWITH clauses to avoid parsing the same JSON multiple timesArrays, maps, and finding needles
- Category
- SQL
- Difficulty
- intermediate
- Duration
- 26 minutes
- Challenges
- 0 hands-on challenges
Topics covered: Array Operations, Array Cleanup, Map Operations, JSON_EXTRACT_SCALAR(), CAST JSON to Array/Map
Lesson Sections
- Array Operations (concepts: sqlArrayOps)
Arrays in SQL come with a rich set of functions for inspection, access, and manipulation. These functions treat arrays as first-class data structures, enabling operations that would otherwise require complex joins or procedural code. Inspection Functions CONTAINS() Element Access Performance Characteristics Array operations have different performance profiles depending on whether they need to scan the entire array or can access elements directly. Database Execution Understanding time complexity
- Array Cleanup
Real-world arrays often contain duplicates or arrive in unpredictable order. SQL provides functions to normalize arrays for consistent processing and analysis. Deduplication Sorting & Set Operations Sorted arrays enable consistent display and comparison. Note that string sorting is alphabetical, so sizes like S, M, L, XL will not sort in the expected order without custom logic. ARRAY_UNION/INTERSECT Combine arrays from multiple sources using set operations: With array operations and cleanup tech
- Map Operations
Maps store key-value pairs and require specialized functions to inspect and extract their contents. Unlike arrays where position matters, maps are accessed by key. Key & Value Extraction Use map_keys to audit schema consistency across rows, identify which optional attributes are populated, or build dynamic queries based on available properties. MAP_VALUES() Entry Processing This function is particularly useful when you need to process each key-value pair individually, such as pivoting map conten
- JSON_EXTRACT_SCALAR()
Understanding Scalar Extraction Basic Usage Practical Applications Type Considerations
- CAST JSON to Array/Map
Array Conversion JSON Array to SQL Array Map Conversion JSON Object to SQL Map Best Practices Choosing the correct target type ensures your conversions succeed and produce the expected results. Type Specifications When working with these type mappings in real queries, keep performance in mind.