Complex Data: Intermediate

Now that you understand how to access values in arrays, maps, and JSON, it is time to learn the functions that make these structures truly powerful. This lesson covers operations for measuring, searching, sorting, and transforming complex data types.

Data engineers frequently need to answer questions like: How many items are in this array? Does this map contain a specific key? What are all the unique values? These operations are fundamental to building analytics pipelines that process semi-structured data at scale.

Array Operations

Daily Life
Interviews

Search, sort, and modify array contents

Arrays in SQL come with a rich set of functions for inspection, access, and manipulation. These functions treat arrays as first-class data structures, enabling operations that would otherwise require complex joins or procedural code.

Inspection Functions

CARDINALITY(array) returns the number of elements in an array. This is essential for understanding data distribution and filtering based on collection size.

order_iditems
1001[widget, gadget, tool]
1002[widget]
1003[gadget, tool, device, sensor]
1SELECT
2 order_id,
3 CARDINALITY(items) AS item_count
4FROM orders
Result
order_iditem_count
10013
10021
10034

Use CARDINALITY to identify orders with many items, filter out empty arrays, or calculate averages across variable-length collections.

//

CONTAINS()

CONTAINS(array, element) returns true if the specified element exists in the array. This enables filtering rows based on array contents without expanding the array.

1SELECT
2 article_id,
3 title
4FROM articles
5WHERE CONTAINS(tags, 'featured')
Result
article_idtitle
101Getting Started with SQL
105Advanced Query Patterns

The CONTAINS function performs an efficient membership test. For large arrays or frequent lookups, consider whether the data model should use a separate table with proper indexing instead.

Element Access

ELEMENT_AT(array, index) returns the element at the specified position. Unlike bracket notation, ELEMENT_AT handles out-of-bounds indices gracefully by returning NULL.

1SELECT
2 order_id,
3 ELEMENT_AT(items, 1) AS first_item,
4 ELEMENT_AT(items, 2) AS second_item,
5 ELEMENT_AT(items, 10) AS tenth_item
6FROM orders
Result
order_idfirst_itemsecond_itemtenth_item
1001widgetgadgetNULL
1002widgetNULLNULL

Negative indices count from the end: ELEMENT_AT(array, -1) returns the last element. This is useful when you need the most recent item in a chronologically ordered array.

Access the first element at position 1 (SQL uses 1-based indexing).
1SELECT
2 ELEMENT_AT(tags, 1)
3FROM articles

Performance Characteristics

Array operations have different performance profiles depending on whether they need to scan the entire array or can access elements directly.
//

Database Execution

Understanding time complexity helps you choose the right function and avoid slow queries on large arrays.
CARDINALITYCONTAINSELEMENT_ATNULL
CARDINALITY
O(1) lookup
Constant time access
CONTAINS
O(n) scan
Checks every element
ELEMENT_AT
O(1) lookup
Direct index access
NULL
Safe returns
No errors on missing

> Complete this query to count the number of tags for each event.

SELECT
  event_id,
  ___(___) AS tag_count
FROM event_data
LENGTH
SIZE
tags
CARDINALITY

With CARDINALITY you can quickly measure any array. This pairs naturally with WHERE clauses to find rows where array size meets specific criteria.

Remember: CARDINALITY returns 0 for empty arrays and NULL for NULL arrays, so handle both cases in production queries.

Next you will learn how to clean up arrays by removing duplicates and sorting elements into a consistent order.

Array Cleanup

Daily Life
Interviews

Remove duplicates and nulls from arrays

Real-world arrays often contain duplicates or arrive in unpredictable order. SQL provides functions to normalize arrays for consistent processing and analysis.

Deduplication

ARRAY_DISTINCT(array) removes duplicate elements, returning an array with only unique values. Order is not guaranteed in the result.

user_idpage_views
u001[home, products, home, cart, home]
u002[login, home, products, products, checkout]
1SELECT
2 user_id,
3 ARRAY_DISTINCT(page_views) AS unique_pages,
4 CARDINALITY(
5 ARRAY_DISTINCT(page_views)
6 ) AS unique_count
7FROM sessions
Result
user_idunique_pagesunique_count
u001[home, products, cart]3
u002[login, home, products, checkout]4

Combining ARRAY_DISTINCT with CARDINALITY gives you the count of unique values, useful for metrics like unique page views per session or distinct products viewed.

Sorting & Set Operations

ARRAY_SORT(array) returns the array with elements in ascending order. For strings, this is lexicographic order. For numbers, it is numeric order.

1SELECT
2 product_id,
3 ARRAY_SORT(sizes) AS sorted_sizes
4FROM products
5WHERE product_id = 'shirt-001'
Result
product_idsorted_sizes
shirt-001[L, M, S, XL, XS]
Sorted arrays enable consistent display and comparison. Note that string sorting is alphabetical, so sizes like S, M, L, XL will not sort in the expected order without custom logic.
TIP
Chain array functions together: ARRAY_SORT(ARRAY_DISTINCT(tags)) first removes duplicates, then sorts the unique values. This is a common pattern for normalizing user-generated tags or categories.
//

ARRAY_UNION/INTERSECT

Combine arrays from multiple sources using set operations:
1SELECT
2 user_id,
3 ARRAY_UNION(
4 morning_tags,
5 evening_tags
6 ) AS all_tags,
7 ARRAY_INTERSECT(
8 morning_tags,
9 evening_tags
10 ) AS common_tags
11FROM user_activity
Result
user_idall_tagscommon_tags
U001[work, email, news, social][email]
U002[sports, music, video][music]

ARRAY_UNION merges two arrays and removes duplicates. ARRAY_INTERSECT returns only elements present in both arrays. These are the array equivalents of SQL UNION and INTERSECT operations.

> Complete this query to retrieve the second tag from each event by position.

SELECT
  event_id,
  ___(___, ___) AS second_tag
FROM event_data
tags
1
ELEMENT_AT
2
GET

ELEMENT_AT provides safe positional access. Unlike bracket notation in some systems, out-of-bounds indices return NULL rather than errors.

Combine ELEMENT_AT with COALESCE to provide default values when an index might not exist.

With array operations and cleanup techniques covered, you are ready to explore the map data structure and its specialized functions.

Map Operations

Daily Life
Interviews

Merge, filter, and transform map data

Maps store key-value pairs and require specialized functions to inspect and extract their contents. Unlike arrays where position matters, maps are accessed by key.

Key & Value Extraction

MAP_KEYS(map) returns an array containing all keys in the map. This is essential for discovering what properties exist in a flexible schema.

product_idattributes
p001{color=red, size=large, material=cotton}
p002{color=blue, weight=2kg}
p003{size=medium, origin=usa, organic=true}
1SELECT
2 product_id,
3 MAP_KEYS(attributes) AS attribute_names
4FROM products
Result
product_idattribute_names
p001[color, size, material]
p002[color, weight]
p003[size, origin, organic]
Use map_keys to audit schema consistency across rows, identify which optional attributes are populated, or build dynamic queries based on available properties.
//

MAP_VALUES()

MAP_VALUES(map) returns an array of all values in the map, in the same order as their corresponding keys from MAP_KEYS.

1SELECT
2 product_id,
3 MAP_VALUES(attributes) AS attribute_values
4FROM products
Result
product_idattribute_values
p001[red, large, cotton]
p002[blue, 2kg]
p003[medium, usa, true]

Combining MAP_KEYS and MAP_VALUES with array functions like CONTAINS lets you search maps for specific keys or values without knowing the complete schema.

MAP_KEYS returns all keys as an array for inspection or further processing.
1SELECT
2 MAP_KEYS(properties)
3FROM sessions

Entry Processing

MAP_ENTRIES(map) returns an array of key-value pair structures, enabling iteration over the entire map. Each entry contains a key and value field.

1SELECT
2 product_id,
3 MAP_ENTRIES(attributes) AS entries
4FROM products
Result
product_identries
P001[{key=color, value=red}, {key=size, value=L}]
P002[{key=color, value=blue}]
This function is particularly useful when you need to process each key-value pair individually, such as pivoting map contents into separate rows.

Performance & Best Practices

Maps provide efficient key-based lookups but have different performance characteristics than arrays.
//

Database Execution

The database uses hash-based access for map lookups, making them fast regardless of map size.
Map Performance Characteristics
  • map_keys and map_values iterate over map entries
  • Key lookup with bracket notation is O(1) hash access
  • Maps with many entries consume more memory than arrays
  • Missing keys return NULL, not errors
These characteristics make maps the go-to structure for user properties and configuration data at scale.

> Complete this query to find events that contain a specific tag value.

SELECT
  event_id
FROM event_data
WHERE ___(___, ___)
HAS
CONTAINS
'premium'
IN
tags

The CONTAINS function works on both arrays and the arrays returned by MAP_KEYS, making it versatile for membership checks across data structures.

Pair CONTAINS with MAP_KEYS to check for property existence, or with raw arrays for value membership tests.

Next you will see how JSON_EXTRACT_SCALAR simplifies filtering and joining on values embedded inside JSON documents.

JSON_EXTRACT_SCALAR()

Daily Life
Interviews

Get clean scalar values from JSON fields

While JSON_EXTRACT returns JSON-typed values, JSON_EXTRACT_SCALAR returns plain SQL strings. This makes it perfect for filtering, joining, and comparing JSON values without type conversion.

Understanding Scalar Extraction

Recall that JSON_EXTRACT preserves JSON formatting, including quotes around strings:

JSON_EXTRACT
  • Returns: "alice" (with quotes)
  • Type: JSON value
  • Comparison: Complex
  • Filtering: Needs escaping
JSON_EXTRACT_SCALAR
  • Returns: alice (plain)
  • Type: VARCHAR
  • Comparison: Simple
  • Filtering: Direct equals
//

Basic Usage

This example shows how to extract scalar values from JSON and filter rows based on those values.

event_idpayload
1{"user": "alice", "action": "login"}
2{"user": "bob", "action": "purchase"}
3{"user": "alice", "action": "logout"}
1SELECT
2 event_id,
3 JSON_EXTRACT_SCALAR(
4 payload,
5 '$.user'
6 ) AS user_name,
7 JSON_EXTRACT_SCALAR(
8 payload,
9 '$.action'
10 ) AS action
11FROM events
12WHERE JSON_EXTRACT_SCALAR(payload, '$.user') = 'alice'
Result
event_iduser_nameaction
1alicelogin
3alicelogout

The WHERE clause uses a simple string comparison without escaped quotes. This is much cleaner than working with JSON_EXTRACT.

Practical Applications

JSON_EXTRACT_SCALAR enables efficient filtering on JSON fields. Some query engines can even push these filters down to storage for better performance.

Filter rows by a JSON field value using direct string comparison.
1SELECT
2 *
3FROM events
4WHERE JSON_EXTRACT_SCALAR(data, '$.status') = 'active'

Type Considerations

JSON_EXTRACT_SCALAR only works with scalar JSON values: strings, numbers, and booleans. Attempting to extract an array or object returns NULL.

1SELECT
2 JSON_EXTRACT_SCALAR(
3 '{"a": "hello"}',
4 '$.a'
5 ) AS string_val,
6 JSON_EXTRACT_SCALAR(
7 '{"b": 42}',
8 '$.b'
9 ) AS number_val,
10 JSON_EXTRACT_SCALAR(
11 '{"c": [1,2,3]}',
12 '$.c'
13 ) AS array_val
Result
string_valnumber_valarray_val
hello42NULL

Numbers are returned as strings. If you need numeric operations, cast the result: CAST(json_extract_scalar(...) AS DOUBLE).

TIP
Use JSON_EXTRACT_SCALAR for scalar values you will filter, join, or compare. Use JSON_EXTRACT when you need to preserve JSON structure or extract complex types like arrays and objects.

> Complete this query to list all the property names stored in a map column.

SELECT
  event_id,
  ___(___) AS prop_keys
FROM event_data
MAP_VALUES
properties
MAP_KEYS
KEYS

MAP_KEYS reveals which properties each row contains, making it essential for auditing schema consistency across flexible data.

Use MAP_KEYS with CARDINALITY to count properties, or with CONTAINS to check for specific keys.

The final section covers CAST, which bridges JSON values to native SQL types so you can use all the functions learned in this lesson.

CAST JSON to Array/Map

Daily Life
Interviews

Convert raw JSON into queryable types

Converting JSON to native array or map types unlocks the full power of array and map functions. Use CAST to transform extracted JSON into structured SQL types.

Array Conversion

Converting JSON arrays to native SQL arrays unlocks the full set of array functions like CARDINALITY and CONTAINS.

//

JSON Array to SQL Array

Use CAST with ARRAY(type) to convert a JSON array into a native SQL array.

1SELECT
2 event_id,
3 CAST(
4 JSON_EXTRACT(payload, '$.tags')
5 AS ARRAY(VARCHAR)
6 ) AS tags_array,
7 CARDINALITY(
8 CAST(
9 JSON_EXTRACT(payload, '$.tags')
10 AS ARRAY(VARCHAR)
11 )
12 ) AS tag_count
13FROM events
Result
event_idtags_arraytag_count
1[mobile, ios]2
2[web, desktop, chrome]3

After casting, you can use CARDINALITY, CONTAINS, ELEMENT_AT, and all other array functions on the result.

Map Conversion

Converting JSON objects to native SQL maps enables efficient key-based lookups and map functions.

//

JSON Object to SQL Map

Use CAST with MAP(key_type, value_type) to convert a JSON object into a native SQL map.

1SELECT
2 session_id,
3 CAST(
4 JSON_EXTRACT(data, '$.properties')
5 AS MAP(VARCHAR, VARCHAR)
6 ) AS props,
7 MAP_KEYS(
8 CAST(
9 JSON_EXTRACT(data, '$.properties')
10 AS MAP(VARCHAR, VARCHAR)
11 )
12 ) AS prop_keys
13FROM sessions
Result
session_idpropsprop_keys
s001{browser=Chrome, os=Windows}[browser, os]

The target type MAP(VARCHAR, VARCHAR) specifies both key and value types. All keys must convert to the key type, and all values must convert to the value type.

Best Practices

Choosing the correct target type ensures your conversions succeed and produce the expected results.
//

Type Specifications

This reference shows common JSON structures and their corresponding SQL type specifications.

JSON StructureTarget Type
["a", "b", "c"]ARRAY(VARCHAR)
[1, 2, 3]ARRAY(INTEGER)
[1.5, 2.5]ARRAY(DOUBLE)
{"k": "v"}MAP(VARCHAR, VARCHAR)
{"k": 123}MAP(VARCHAR, INTEGER)
When working with these type mappings in real queries, keep performance in mind.
TIP
Avoid casting the same JSON value multiple times in a query. Use a WITH clause to cast once and reference the result by name. This reduces parsing overhead and improves readability.
1WITH parsed AS (
2 SELECT
3 event_id,
4 CAST(
5 JSON_EXTRACT(payload, '$.tags')
6 AS ARRAY(VARCHAR)
7 ) AS tags
8 FROM events
9)
10
11SELECT
12 event_id,
13 CARDINALITY(tags) AS count,
14 ELEMENT_AT(tags, 1) AS first_tag
15FROM parsed
Result
event_idcountfirst_tag
E0013purchase
E0022login
E0031signup
Do
  • Use json_extract_scalar for filtering and joins
  • Cast to native types when using array/map functions
  • Use WITH clauses to avoid repeated parsing
  • Check cardinality before accessing specific indices
  • Combine array_distinct with cardinality for unique counts
Don't
  • Compare JSON_EXTRACT results without handling quotes
  • Cast arrays without knowing the element type
  • Assume all rows have identical JSON structure
  • Use json_extract_scalar on arrays or objects
  • Ignore NULL handling for missing keys or paths

> Complete this query to cast a JSON tags field to a native SQL array and count its elements.

SELECT
  event_id,
  ___(
    ___(
      JSON_EXTRACT(payload, '$.tags')
      AS ___
      )
    ) AS tag_count
FROM event_data
CARDINALITY
LENGTH
ARRAY(VARCHAR)
ARRAY(INTEGER)
CAST

Casting JSON to native types is the bridge between semi-structured and structured data. Once cast, all array and map functions become available.

The WITH clause pattern of casting once and referencing many times is the recommended approach for production queries involving JSON conversion.

You now have a complete toolkit for working with arrays, maps, and JSON in SQL. These functions are fundamental to processing semi-structured data at scale.

PUTTING IT ALL TOGETHER

> You are building a user behavior report from a sessions table. Each row has a tags array of event categories, a properties map of session attributes, and a JSON payload column. You need to identify users who visited more than three unique pages, check whether they have a premium flag in their properties map, and filter sessions where the JSON action field equals "purchase".

ARRAY_DISTINCT + CARDINALITY together count unique page views per session without expanding rows
CONTAINS(map_keys(properties), 'premium') checks whether each session has a premium property defined
JSON_EXTRACT_SCALAR extracts the action field as a plain SQL string for direct comparison in WHERE
CAST converts the JSON-formatted tags value to ARRAY(VARCHAR) so CARDINALITY and CONTAINS work on it

The challenge combines all five techniques across a single events table with three semi-structured columns.

KEY TAKEAWAYS
CARDINALITY(array) returns the number of elements
CONTAINS(array, element) checks for membership
ELEMENT_AT(array, index) safely accesses elements with NULL for out-of-bounds
ARRAY_DISTINCT removes duplicates; ARRAY_SORT orders elements
MAP_KEYS and MAP_VALUES extract arrays of keys and values from maps
JSON_EXTRACT_SCALAR returns plain strings for filtering and joining
CAST JSON to ARRAY or MAP to use native array/map functions
Use WITH clauses to avoid parsing the same JSON multiple times

Arrays, maps, and finding needles

Category
SQL
Difficulty
intermediate
Duration
26 minutes
Challenges
0 hands-on challenges

Topics covered: Array Operations, Array Cleanup, Map Operations, JSON_EXTRACT_SCALAR(), CAST JSON to Array/Map

Lesson Sections

  1. Array Operations (concepts: sqlArrayOps)

    Arrays in SQL come with a rich set of functions for inspection, access, and manipulation. These functions treat arrays as first-class data structures, enabling operations that would otherwise require complex joins or procedural code. Inspection Functions CONTAINS() Element Access Performance Characteristics Array operations have different performance profiles depending on whether they need to scan the entire array or can access elements directly. Database Execution Understanding time complexity

  2. Array Cleanup

    Real-world arrays often contain duplicates or arrive in unpredictable order. SQL provides functions to normalize arrays for consistent processing and analysis. Deduplication Sorting & Set Operations Sorted arrays enable consistent display and comparison. Note that string sorting is alphabetical, so sizes like S, M, L, XL will not sort in the expected order without custom logic. ARRAY_UNION/INTERSECT Combine arrays from multiple sources using set operations: With array operations and cleanup tech

  3. Map Operations

    Maps store key-value pairs and require specialized functions to inspect and extract their contents. Unlike arrays where position matters, maps are accessed by key. Key & Value Extraction Use map_keys to audit schema consistency across rows, identify which optional attributes are populated, or build dynamic queries based on available properties. MAP_VALUES() Entry Processing This function is particularly useful when you need to process each key-value pair individually, such as pivoting map conten

  4. JSON_EXTRACT_SCALAR()

    Understanding Scalar Extraction Basic Usage Practical Applications Type Considerations

  5. CAST JSON to Array/Map

    Array Conversion JSON Array to SQL Array Map Conversion JSON Object to SQL Map Best Practices Choosing the correct target type ensures your conversions succeed and produce the expected results. Type Specifications When working with these type mappings in real queries, keep performance in mind.