Complex Data: Beginner

Modern analytics platforms routinely store data that does not fit neatly into traditional rows and columns. Event logs arrive as JSON documents. User preferences live in key-value maps. Product tags exist as arrays of strings. Understanding how to work with these complex data structures is essential for building robust data pipelines, which are automated workflows that move and transform data from source systems to analytical destinations.

This lesson introduces the foundational concepts for working with semi-structured data in SQL. You will learn to recognize JSON, arrays, and maps, understand when each structure is appropriate, and extract values from these nested formats using path-based operators.

JSON_EXTRACT()

Daily Life
Interviews

Pull values from JSON columns

JSON_EXTRACT retrieves a value from a JSON document using a path expression. JSON, which stands for JavaScript Object Notation, is a text-based format for representing structured data. It has become the dominant format for API responses, event logs, and configuration files across the industry.

A JSON document consists of key-value pairs enclosed in curly braces. Keys are strings in double quotes, followed by a colon and a value. Values can be strings, numbers, booleans, arrays, nested objects, or the special value null.

Understanding JSON Structure

Consider this JSON document stored in a single database column:

1{ 'user_id' : 12345, 'event_type' : 'purchase', 'details' : { 'product' : 'Premium Plan', 'amount' : 99.99 }, 'tags' : [ 'mobile', 'ios', 'returning' ] }

This single JSON value contains a user ID (number), an event type (string), nested details (object with its own keys), and tags (array of strings). All of this lives in one column of one row.

Path Syntax & Navigation

The syntax is JSON_EXTRACT(json_column, path). The path uses dot notation starting with $ to represent the root. For example, '$.user_id' extracts the user_id field from the root level.

event_idpayload
1{"user": "alice", "action": "login"}
2{"user": "bob", "action": "purchase"}
3{"user": "carol", "action": "logout"}
1SELECT
2 event_id,
3 JSON_EXTRACT(payload, '$.user') AS user_name,
4 JSON_EXTRACT(payload, '$.action') AS action_type
5FROM events
Result
event_iduser_nameaction_type
1"alice""login"
2"bob""purchase"
3"carol""logout"

Notice that JSON_EXTRACT returns JSON-formatted values. String results include the surrounding double quotes. This preserves JSON semantics but requires additional processing if you need plain strings.

//

Nested Path Access

JSON documents often contain nested objects. Use additional dots to traverse deeper into the structure:

1SELECT
2 JSON_EXTRACT(
3 payload,
4 '$.details.product'
5 ) AS product,
6 JSON_EXTRACT(
7 payload,
8 '$.details.amount'
9 ) AS amount
10FROM transactions
Result
productamount
"Premium Plan"99.99
"Basic Plan"29.99

The path $.details.product first navigates to the details object, then extracts the product field within it. You can nest as deeply as your JSON structure requires.

Which path expression reaches the nested amount field?
This only checks the root level. If amount is nested inside details, this path returns null.
1SELECT
2 JSON_EXTRACT(payload, '$.amount')
3FROM transactions
//

Array Index Access

JSON_EXTRACT can also access array elements by index. Use bracket notation with the zero-based index:

1SELECT
2 JSON_EXTRACT(payload, '$.tags[0]') AS first_tag,
3 JSON_EXTRACT(payload, '$.tags[1]') AS second_tag
4FROM events
Result
first_tagsecond_tag
"mobile""ios"
"web""chrome"

Array indices in JSON paths are zero-based, meaning the first element is at index 0. Accessing an index that does not exist returns a JSON null value.

Extracting a root-level field: $.user_id accesses user_id directly from the JSON root.
1SELECT
2 JSON_EXTRACT(data, '$.user_id')
3FROM events

Best Practices

Knowing when to use JSON functions and how they execute helps you write efficient queries and structure your data appropriately.

//

When to Use JSON_EXTRACT

JSON_EXTRACT is most valuable when dealing with semi-structured data that varies in shape or comes from external systems.

Event LogsAPI ResponsesConfig DataTelemetryWebhooks
Event Logs
Streaming data
Ingest raw event JSON
API Responses
External data
Parse vendor payloads
Config Data
App settings
Flexible key-value pairs
Telemetry
Device metrics
Sensor and app signals
Webhooks
Inbound hooks
Third-party callbacks

As your pipeline matures, consider whether raw JSON flexibility is still worth the query cost.

TIP
Storing data as JSON provides flexibility during rapid development, but frequently accessed fields should eventually be promoted to dedicated columns. This improves query performance and enables proper indexing.
//

Database Execution

Understanding how the database processes JSON helps explain why certain patterns perform better than others.

How JSON_EXTRACT Works Under the Hood
  • Parses the JSON string for each row at query time
  • Path navigation traverses the parsed structure
  • Returns JSON-typed values (strings include quotes)
  • Missing paths return JSON null, not SQL NULL

JSON parsing has computational cost. For large tables with complex JSON documents, consider materializing frequently queried paths into dedicated columns during ETL (Extract, Transform, Load) processing.

> Complete this query to extract the user_id from a JSON payload column.

SELECT
  event_id,
  ___(
    ___,
    ___
    ) AS user_id
FROM event_data
payload
'$.user_id'
JSON_EXTRACT_SCALAR
JSON_EXTRACT
'$.id'

JSON_EXTRACT vs JSON_EXTRACT_SCALAR

JSON_EXTRACT returns a JSON-typed value. Strings come back with surrounding quotes (e.g. "alice"). JSON_EXTRACT_SCALAR strips the quotes and returns a plain SQL string (e.g. alice). Use SCALAR when the value feeds into comparisons, joins, or display.

A practical rule: if the extracted value will be compared to a string literal or joined to another column, use JSON_EXTRACT_SCALAR. If you are passing the result directly to another JSON function, JSON_EXTRACT preserves the type information you need.

JSON Extraction Patterns

Daily Life
Interviews

Navigate nested JSON structures

Understanding common JSON structures in analytics data helps you write more effective extraction queries. Most event-driven systems follow predictable patterns.

Common JSON Patterns

Event logs typically have a fixed outer structure with a flexible payload:
1{ 'timestamp' : '2024-01-15T10:30:00Z', 'event_name' : 'page_view', 'properties' : { 'page' : '/products/widget', 'referrer' : 'bing.com', 'duration_ms' : 4500 } }
1SELECT
2 JSON_EXTRACT(
3 event_data,
4 '$.event_name'
5 ) AS event,
6 JSON_EXTRACT(
7 event_data,
8 '$.properties.page'
9 ) AS page,
10 JSON_EXTRACT(
11 event_data,
12 '$.properties.duration_ms'
13 ) AS duration
14FROM event_logs
15WHERE JSON_EXTRACT(event_data, '$.event_name') = '"page_view"'
Result
eventpageduration
"page_view""/products/widget"4500

Note the WHERE clause comparison uses '"page_view"' with escaped quotes because JSON_EXTRACT returns JSON-formatted strings that include their surrounding quotes.

Handling Edge Cases

When a requested path does not exist, JSON_EXTRACT returns a JSON null. This differs from SQL NULL but can be treated similarly in most contexts.

1SELECT
2 event_id,
3 JSON_EXTRACT(
4 payload,
5 '$.optional_field'
6 ) AS optional_value
7FROM events
Result
event_idoptional_value
1"exists"
2null
3null

Rows where the field does not exist show null (JSON null). Your downstream processing should handle these gracefully, either by filtering them out or providing default values.

How should you handle a JSON field that might not exist in every row?

Without protection, missing fields return null, which can cause issues in downstream aggregations or joins.
1SELECT
2 JSON_EXTRACT(data, '$.optional')
3FROM events

> Complete this query to retrieve the first tag from an array column.

SELECT
  event_id,
  ___ 1 AS first_tag
FROM event_data
[0]
tags
[1]
ELEMENT_AT

Path Patterns to Remember

Three path forms cover the majority of real-world JSON extraction: $.field for top-level keys, $.parent.child for nested objects, and $.array[0] for array elements (0-indexed in path syntax, 1-indexed with bracket notation).

When a path does not match, JSON_EXTRACT returns JSON null rather than SQL NULL. Wrapping with COALESCE handles missing paths gracefully, which matters for event data where not every row shares the same structure.

Arrays in SQL

Daily Life
Interviews

Store and access ordered collections

An array is an ordered collection of values stored in a single column. Arrays are particularly useful when a row has a variable number of related values, such as tags on an article, items in an order, or permissions assigned to a user.

In SQL, arrays are denoted with the ARRAY keyword and square brackets. Each element has a position (index), and elements are accessed using bracket notation. Unlike JSON arrays which are stored as text, native SQL arrays are stored in a binary format optimized for database operations.

Array Basics

Arrays can be created inline using the ARRAY constructor:

1SELECT
2 ARRAY['red', 'green', 'blue'] AS colors
3FROM system_info
4LIMIT 1
Result
colors
[red, green, blue]

The ARRAY keyword followed by values in square brackets creates an array literal. Elements are separated by commas and can be of any data type, though all elements in an array must share the same type.

//

Accessing Array Elements

Use bracket notation with a 1-based index to access individual elements:
article_idtags
101[tech, sql, tutorial]
102[news, breaking, politics]
103[review, product, featured]
1SELECT
2 article_id,
3 tags 1 AS first_tag,
4 tags 2 AS second_tag
5FROM articles
Result
article_idfirst_tagsecond_tag
101techsql
102newsbreaking
103reviewproduct

SQL array indices are typically 1-based, meaning the first element is at position 1. This differs from JSON paths (0-based) and most programming languages. Accessing an index beyond the array bounds returns NULL.

What happens when you access an array index that does not exist?
Accessing a valid index returns the element at that position. Remember that SQL arrays use 1-based indexing.
1SELECT
2 tags 1 AS first_tag
3FROM articles

Practical Applications

Arrays solve the problem of storing variable-length related data without creating separate rows. Instead of a separate tags table with one row per tag per article, you store all tags in a single array column.
Product tags
Product tags
Categories or labels for products.
User permissions
User permissions
Role assignments for access control.
Order line items
Order line items
Simple orders with multiple products.
Event properties
Event properties
Multiple values for analytics events.
Search keywords
Search keywords
Filter selections and search terms.

> Complete this query to look up the browser value from a map column.

SELECT
  event_id,
  ___ 'browser' AS browser
FROM event_data
MAP_KEYS
['browser']
.browser
properties

When to Use Arrays

Arrays work well when the number of values is variable but each value is the same type: tags on a post, items in a cart, permissions on a role. If the values have different types or each key has a distinct meaning, a map is usually the better fit.

Array columns are common in event-driven systems where a single event can carry multiple values of the same kind. Storing them as an array avoids the need for a separate child table and keeps related data in one row.

Maps in SQL

Daily Life
Interviews

Work with key-value pairs in columns

A map is a collection of key-value pairs where each key maps to exactly one value. Maps are also called dictionaries or associative arrays in other contexts. In SQL, maps provide a way to store flexible, named attributes without defining a fixed schema.
Maps are particularly valuable for storing metadata, configuration settings, or properties that vary between rows. Instead of adding nullable columns for every possible attribute, you store them all in a single map column.

Map Basics

Maps can be created using the MAP constructor with alternating keys and values:

1SELECT
2 MAP(
3 ARRAY['color', 'size', 'price'],
4 ARRAY['red', 'large', '29.99']
5 ) AS product_attrs
6FROM system_info
7LIMIT 1
Result
product_attrs
{color=red, size=large, price=29.99}

The MAP function takes two arrays: one of keys and one of values. Keys must be unique within a map. Both keys and values have specific types, and all keys must share one type while all values share another.

//

Accessing Map Values

Retrieve values from a map using bracket notation with the key:
session_idproperties
s001{browser=Chrome, os=Windows, device=desktop}
s002{browser=Safari, os=macOS, device=laptop}
s003{browser=Firefox, os=Linux, device=desktop}
1SELECT
2 session_id,
3 properties 'browser' AS browser,
4 properties 'os' AS operating_system
5FROM sessions
Result
session_idbrowseroperating_system
s001ChromeWindows
s002SafarimacOS
s003FirefoxLinux

The bracket notation properties['browser'] retrieves the value associated with the key 'browser'. If the key does not exist in the map, the result is NULL.

Comparison & Use Cases

Maps and JSON objects look similar but serve different purposes:

AspectMapJSON Object
StorageBinary, optimizedText string
Key TypesAny SQL typeStrings only
Access SpeedFast O(1)Requires parsing
Nested ObjectsNot supportedSupported
Best ForHomogeneous key-valueComplex documents
TIP
Choose maps when all keys represent the same kind of thing (like all being property names) and you need fast access. Choose JSON when you have deeply nested structures or documents that vary significantly in shape.
//

Map Use Cases

Maps excel when you need to store and retrieve named properties efficiently without the parsing overhead of JSON.

Common Map Use Cases
  • Session properties and user attributes
  • Product specifications with varying fields
  • Configuration settings per tenant
  • Metric labels in observability data
  • Feature flags and experiment assignments

CAST for Complex Types

Daily Life
Interviews

Convert between JSON and SQL types

CAST converts data from one type to another. When working with complex data structures, CAST enables conversion between JSON strings and native SQL arrays or maps, allowing you to leverage the best features of each format.

The syntax is CAST(expression AS target_type). For complex types, the target type includes the element type for arrays or the key and value types for maps.

Type Conversion Basics

Convert a JSON array string to a native SQL array for better query performance:

1SELECT
2 CAST(
3 '["red", "green", "blue"]'
4 AS ARRAY(VARCHAR)
5 ) AS colors
6FROM system_info
7LIMIT 1
Result
colors
[red, green, blue]

The type ARRAY(VARCHAR) specifies an array of strings. You must provide the element type. Common element types include VARCHAR, INTEGER, DOUBLE, and BOOLEAN.

//

JSON to Map

Convert a JSON object to a native SQL map:

1SELECT
2 CAST(
3 '{"a": 1, "b": 2, "c": 3}'
4 AS MAP(VARCHAR, INTEGER)
5 ) AS letter_values
6FROM system_info
7LIMIT 1
Result
letter_values
{a=1, b=2, c=3}

The type MAP(VARCHAR, INTEGER) specifies string keys mapping to integer values. Both the key type and value type must be specified.

Practical Patterns

A common pattern extracts a JSON array field and converts it to a native array for further processing:

Step 1: Extract the JSON array as a JSON value (still a string).
1SELECT
2 JSON_EXTRACT(data, '$.tags')
3FROM event_data

Best Practices

Casting has a computational cost, so avoid casting the same value multiple times. Consider using WITH clauses to cast once and reuse the result.

> Complete this query to convert a map value to an integer type.

SELECT
  ___(properties 'count' ___ ___) AS count_num
FROM event_data
CONVERT
CAST
VARCHAR
AS
INTEGER

CAST bridges the gap between JSON text and native SQL types, enabling you to combine the flexibility of JSON storage with the performance of native operations.

When building data pipelines, cast early and once. Use a CTE or subquery to convert complex types at the start of your query, then reference the native types throughout the rest of your logic.
Do
  • Use JSON_EXTRACT for ad-hoc exploration of JSON data
  • Cast to native types when performance matters
  • Document the expected structure of JSON columns
  • Validate JSON structure before relying on paths
  • Use arrays for ordered collections of same-type values
Don't
  • Store frequently queried fields only in JSON
  • Assume all rows have the same JSON structure
  • Mix JSON extraction and native type operations
  • Ignore type mismatches in array elements
  • Use maps for deeply nested structures
PUTTING IT ALL TOGETHER

> Your company's mobile app logs every user interaction as a JSON event in an events table. The payload column holds raw JSON, tags is a native SQL array of event categories, and properties is a key-value map of session attributes. The product team wants a report showing user IDs, primary event categories, and device types for all purchase events.

JSON_EXTRACT_SCALAR pulls user_id from the JSON payload as a plain SQL string, ready to filter or join against a users table
Array bracket notation (tags[1]) retrieves the primary category using 1-based indexing, without scanning the whole array
Map bracket notation (properties['device']) fetches the device type directly by key, returning NULL if absent
CAST converts a JSON-formatted tags value to ARRAY(VARCHAR) when you need CARDINALITY or CONTAINS on it

The challenge gives you an events table with all three column formats and asks you to combine extraction patterns in a single query.

KEY TAKEAWAYS
JSON is a text format for semi-structured data with nested objects and arrays
JSON_EXTRACT uses path syntax ($.field.subfield) to retrieve values
Arrays store ordered collections; access elements with 1-based indices
Maps store key-value pairs; access values using bracket notation with keys
CAST converts between JSON strings and native array/map types
Native types (ARRAY, MAP) are faster than JSON for repeated operations
JSON is flexible but requires parsing; native types are optimized but rigid
Choose the right structure based on access patterns and data variability

JSON: chaos in a column, somehow useful

Category
SQL
Difficulty
beginner
Duration
26 minutes
Challenges
0 hands-on challenges

Topics covered: JSON_EXTRACT(), JSON Extraction Patterns, Arrays in SQL, Maps in SQL, CAST for Complex Types

Lesson Sections

  1. JSON_EXTRACT() (concepts: sqlJsonExtract)

    Understanding JSON Structure Path Syntax & Navigation Nested Path Access Which path expression reaches the nested amount field? Array Index Access Best Practices When to Use JSON_EXTRACT Database Execution JSON_EXTRACT vs JSON_EXTRACT_SCALAR

  2. JSON Extraction Patterns

    Common JSON Patterns Event logs typically have a fixed outer structure with a flexible payload: Handling Edge Cases Path Patterns to Remember

  3. Arrays in SQL (concepts: sqlArrayBasic)

    An array is an ordered collection of values stored in a single column. Arrays are particularly useful when a row has a variable number of related values, such as tags on an article, items in an order, or permissions assigned to a user. Array Basics Accessing Array Elements Use bracket notation with a 1-based index to access individual elements: What happens when you access an array index that does not exist? Practical Applications Arrays solve the problem of storing variable-length related data

  4. Maps in SQL

    A map is a collection of key-value pairs where each key maps to exactly one value. Maps are also called dictionaries or associative arrays in other contexts. In SQL, maps provide a way to store flexible, named attributes without defining a fixed schema. Maps are particularly valuable for storing metadata, configuration settings, or properties that vary between rows. Instead of adding nullable columns for every possible attribute, you store them all in a single map column. Map Basics Accessing Ma

  5. CAST for Complex Types

    Type Conversion Basics JSON to Map Practical Patterns Best Practices When building data pipelines, cast early and once. Use a CTE or subquery to convert complex types at the start of your query, then reference the native types throughout the rest of your logic.