Complex Data: Beginner
Modern analytics platforms routinely store data that does not fit neatly into traditional rows and columns. Event logs arrive as JSON documents. User preferences live in key-value maps. Product tags exist as arrays of strings. Understanding how to work with these complex data structures is essential for building robust data pipelines, which are automated workflows that move and transform data from source systems to analytical destinations.
This lesson introduces the foundational concepts for working with semi-structured data in SQL. You will learn to recognize JSON, arrays, and maps, understand when each structure is appropriate, and extract values from these nested formats using path-based operators.
JSON_EXTRACT()
Pull values from JSON columns
JSON_EXTRACT retrieves a value from a JSON document using a path expression. JSON, which stands for JavaScript Object Notation, is a text-based format for representing structured data. It has become the dominant format for API responses, event logs, and configuration files across the industry.
A JSON document consists of key-value pairs enclosed in curly braces. Keys are strings in double quotes, followed by a colon and a value. Values can be strings, numbers, booleans, arrays, nested objects, or the special value null.
Understanding JSON Structure
Consider this JSON document stored in a single database column:
This single JSON value contains a user ID (number), an event type (string), nested details (object with its own keys), and tags (array of strings). All of this lives in one column of one row.
Path Syntax & Navigation
The syntax is JSON_EXTRACT(json_column, path). The path uses dot notation starting with $ to represent the root. For example, '$.user_id' extracts the user_id field from the root level.
| event_id | payload |
|---|---|
| 1 | {"user": "alice", "action": "login"} |
| 2 | {"user": "bob", "action": "purchase"} |
| 3 | {"user": "carol", "action": "logout"} |
Notice that JSON_EXTRACT returns JSON-formatted values. String results include the surrounding double quotes. This preserves JSON semantics but requires additional processing if you need plain strings.
Nested Path Access
JSON documents often contain nested objects. Use additional dots to traverse deeper into the structure:
The path $.details.product first navigates to the details object, then extracts the product field within it. You can nest as deeply as your JSON structure requires.
Array Index Access
JSON_EXTRACT can also access array elements by index. Use bracket notation with the zero-based index:
Array indices in JSON paths are zero-based, meaning the first element is at index 0. Accessing an index that does not exist returns a JSON null value.
Best Practices
Knowing when to use JSON functions and how they execute helps you write efficient queries and structure your data appropriately.
When to Use JSON_EXTRACT
JSON_EXTRACT is most valuable when dealing with semi-structured data that varies in shape or comes from external systems.
As your pipeline matures, consider whether raw JSON flexibility is still worth the query cost.
JSON provides flexibility during rapid development, but frequently accessed fields should eventually be promoted to dedicated columns. This improves query performance and enables proper indexing.Database Execution
Understanding how the database processes JSON helps explain why certain patterns perform better than others.
- Parses the JSON string for each row at query time
- Path navigation traverses the parsed structure
- Returns JSON-typed values (strings include quotes)
- Missing paths return JSON null, not SQL NULL
JSON parsing has computational cost. For large tables with complex JSON documents, consider materializing frequently queried paths into dedicated columns during ETL (Extract, Transform, Load) processing.
> Complete this query to extract the user_id from a JSON payload column.
SELECT event_id, ( , ) AS user_id FROM event_data
JSON_EXTRACT vs JSON_EXTRACT_SCALAR
JSON_EXTRACT returns a JSON-typed value. Strings come back with surrounding quotes (e.g. "alice"). JSON_EXTRACT_SCALAR strips the quotes and returns a plain SQL string (e.g. alice). Use SCALAR when the value feeds into comparisons, joins, or display.
A practical rule: if the extracted value will be compared to a string literal or joined to another column, use JSON_EXTRACT_SCALAR. If you are passing the result directly to another JSON function, JSON_EXTRACT preserves the type information you need.
JSON Extraction Patterns
Navigate nested JSON structures
Understanding common JSON structures in analytics data helps you write more effective extraction queries. Most event-driven systems follow predictable patterns.
Common JSON Patterns
Note the WHERE clause comparison uses '"page_view"' with escaped quotes because JSON_EXTRACT returns JSON-formatted strings that include their surrounding quotes.
Handling Edge Cases
When a requested path does not exist, JSON_EXTRACT returns a JSON null. This differs from SQL NULL but can be treated similarly in most contexts.
Rows where the field does not exist show null (JSON null). Your downstream processing should handle these gracefully, either by filtering them out or providing default values.
How should you handle a JSON field that might not exist in every row?
> Complete this query to retrieve the first tag from an array column.
SELECT event_id, 1 AS first_tag FROM event_data
Path Patterns to Remember
Three path forms cover the majority of real-world JSON extraction: $.field for top-level keys, $.parent.child for nested objects, and $.array[0] for array elements (0-indexed in path syntax, 1-indexed with bracket notation).
When a path does not match, JSON_EXTRACT returns JSON null rather than SQL NULL. Wrapping with COALESCE handles missing paths gracefully, which matters for event data where not every row shares the same structure.
Arrays in SQL
Store and access ordered collections
In SQL, arrays are denoted with the ARRAY keyword and square brackets. Each element has a position (index), and elements are accessed using bracket notation. Unlike JSON arrays which are stored as text, native SQL arrays are stored in a binary format optimized for database operations.
Array Basics
Arrays can be created inline using the ARRAY constructor:
The ARRAY keyword followed by values in square brackets creates an array literal. Elements are separated by commas and can be of any data type, though all elements in an array must share the same type.
Accessing Array Elements
| article_id | tags |
|---|---|
| 101 | [tech, sql, tutorial] |
| 102 | [news, breaking, politics] |
| 103 | [review, product, featured] |
SQL array indices are typically 1-based, meaning the first element is at position 1. This differs from JSON paths (0-based) and most programming languages. Accessing an index beyond the array bounds returns NULL.
Practical Applications
> Complete this query to look up the browser value from a map column.
SELECT event_id, 'browser' AS browser FROM event_data
When to Use Arrays
Arrays work well when the number of values is variable but each value is the same type: tags on a post, items in a cart, permissions on a role. If the values have different types or each key has a distinct meaning, a map is usually the better fit.
Maps in SQL
Work with key-value pairs in columns
Map Basics
Maps can be created using the MAP constructor with alternating keys and values:
The MAP function takes two arrays: one of keys and one of values. Keys must be unique within a map. Both keys and values have specific types, and all keys must share one type while all values share another.
Accessing Map Values
| session_id | properties |
|---|---|
| s001 | {browser=Chrome, os=Windows, device=desktop} |
| s002 | {browser=Safari, os=macOS, device=laptop} |
| s003 | {browser=Firefox, os=Linux, device=desktop} |
The bracket notation properties['browser'] retrieves the value associated with the key 'browser'. If the key does not exist in the map, the result is NULL.
Comparison & Use Cases
Maps and JSON objects look similar but serve different purposes:
| Aspect | Map | JSON Object |
|---|---|---|
| Storage | Binary, optimized | Text string |
| Key Types | Any SQL type | Strings only |
| Access Speed | Fast O(1) | Requires parsing |
| Nested Objects | Not supported | Supported |
| Best For | Homogeneous key-value | Complex documents |
JSON when you have deeply nested structures or documents that vary significantly in shape.Map Use Cases
Maps excel when you need to store and retrieve named properties efficiently without the parsing overhead of JSON.
- Session properties and user attributes
- Product specifications with varying fields
- Configuration settings per tenant
- Metric labels in observability data
- Feature flags and experiment assignments
CAST for Complex Types
Convert between JSON and SQL types
CAST converts data from one type to another. When working with complex data structures, CAST enables conversion between JSON strings and native SQL arrays or maps, allowing you to leverage the best features of each format.
The syntax is CAST(expression AS target_type). For complex types, the target type includes the element type for arrays or the key and value types for maps.
Type Conversion Basics
Convert a JSON array string to a native SQL array for better query performance:
The type ARRAY(VARCHAR) specifies an array of strings. You must provide the element type. Common element types include VARCHAR, INTEGER, DOUBLE, and BOOLEAN.
JSON to Map
Convert a JSON object to a native SQL map:
The type MAP(VARCHAR, INTEGER) specifies string keys mapping to integer values. Both the key type and value type must be specified.
Practical Patterns
A common pattern extracts a JSON array field and converts it to a native array for further processing:
Best Practices
Casting has a computational cost, so avoid casting the same value multiple times. Consider using WITH clauses to cast once and reuse the result.
> Complete this query to convert a map value to an integer type.
SELECT (properties 'count' ) AS count_num FROM event_data
CAST bridges the gap between JSON text and native SQL types, enabling you to combine the flexibility of JSON storage with the performance of native operations.
- Use JSON_EXTRACT for ad-hoc exploration of JSON data
- Cast to native types when performance matters
- Document the expected structure of JSON columns
- Validate JSON structure before relying on paths
- Use arrays for ordered collections of same-type values
- Store frequently queried fields only in JSON
- Assume all rows have the same JSON structure
- Mix JSON extraction and native type operations
- Ignore type mismatches in array elements
- Use maps for deeply nested structures
> Your company's mobile app logs every user interaction as a JSON event in an events table. The payload column holds raw JSON, tags is a native SQL array of event categories, and properties is a key-value map of session attributes. The product team wants a report showing user IDs, primary event categories, and device types for all purchase events.
The challenge gives you an events table with all three column formats and asks you to combine extraction patterns in a single query.
JSON_EXTRACT uses path syntax ($.field.subfield) to retrieve valuesCAST converts between JSON strings and native array/map typesARRAY, MAP) are faster than JSON for repeated operationsJSON: chaos in a column, somehow useful
- Category
- SQL
- Difficulty
- beginner
- Duration
- 26 minutes
- Challenges
- 0 hands-on challenges
Topics covered: JSON_EXTRACT(), JSON Extraction Patterns, Arrays in SQL, Maps in SQL, CAST for Complex Types
Lesson Sections
- JSON_EXTRACT() (concepts: sqlJsonExtract)
Understanding JSON Structure Path Syntax & Navigation Nested Path Access Which path expression reaches the nested amount field? Array Index Access Best Practices When to Use JSON_EXTRACT Database Execution JSON_EXTRACT vs JSON_EXTRACT_SCALAR
- JSON Extraction Patterns
Common JSON Patterns Event logs typically have a fixed outer structure with a flexible payload: Handling Edge Cases Path Patterns to Remember
- Arrays in SQL (concepts: sqlArrayBasic)
An array is an ordered collection of values stored in a single column. Arrays are particularly useful when a row has a variable number of related values, such as tags on an article, items in an order, or permissions assigned to a user. Array Basics Accessing Array Elements Use bracket notation with a 1-based index to access individual elements: What happens when you access an array index that does not exist? Practical Applications Arrays solve the problem of storing variable-length related data
- Maps in SQL
A map is a collection of key-value pairs where each key maps to exactly one value. Maps are also called dictionaries or associative arrays in other contexts. In SQL, maps provide a way to store flexible, named attributes without defining a fixed schema. Maps are particularly valuable for storing metadata, configuration settings, or properties that vary between rows. Instead of adding nullable columns for every possible attribute, you store them all in a single map column. Map Basics Accessing Ma
- CAST for Complex Types
Type Conversion Basics JSON to Map Practical Patterns Best Practices When building data pipelines, cast early and once. Use a CTE or subquery to convert complex types at the start of your query, then reference the native types throughout the rest of your logic.