Nested Data
The Nesting Decision
Decide when to embed vs. normalize
When to Nest vs When to Normalize
Decision Framework
- Always read with parent
- Low cardinality (1-10 items)
- Rarely queried independently
- Preserves source JSON structure
- Eliminates JOINs for common queries
- Queried independently of parent
- High cardinality (100+ items)
- Shared across multiple tables
- Changes independently of parent
- Needs FK constraints and indexing
A common mistake: nesting everything because the source data is JSON. Just because the API returns nested JSON does not mean your analytical table should be nested. If you frequently filter or JOIN on the nested fields, normalize them.
STRUCT: Embedded Objects
Group related fields into one column
Grouping Related Fields Into One Column
| Approach | Columns | Query |
|---|---|---|
| Flat columns | address_line1, address_city, address_state, address_zip | WHERE address_city = 'Seattle' |
| STRUCT column | address STRUCT<line1, city, state, zip> | WHERE address.city = 'Seattle' |
Schema evolution trap in Snowflake: VARIANT columns (Snowflake's semi-structured type) accept any structure but cannot be indexed or used for partition pruning. If you query a VARIANT field frequently, extract it into a typed column for performance.
ARRAY: Ordered Collections
Store lists of values inside a row
Storing Lists Inside a Row
The UNNEST Trap
| product_id | price | tag (after UNNEST) |
|---|---|---|
| P1 | $100 | electronics |
| P1 | $100 | sale |
| P1 | $100 | clearance |
- If you UNNEST just to filter (WHERE 'sale' IN UNNEST(tags)), aggregate BEFORE or use EXISTS instead.
- Use ARRAY_LENGTH(tags) instead of UNNEST + COUNT(*). No row multiplication.
- If you must UNNEST and aggregate, GROUP BY the original row key first to avoid inflating metrics from other columns.
The CROSS JOIN UNNEST pattern drops rows where the array is NULL or empty. Use LEFT JOIN UNNEST to preserve rows with empty arrays. This is the second most common UNNEST bug after fan-out inflation.
MAP: Dynamic Key-Value Pairs
Handle flexible attributes per record
Flexible Attributes Without Schema Changes
The Tradeoff
- Flexible: any keys, no schema change
- Good for sparse, varying attributes
- No column explosion for 100+ optional fields
- Cannot be indexed or used for partition pruning
- Fixed: schema change to add a field
- Good for frequently-queried attributes
- Can be indexed and used for partitioning
- Column explosion if many optional fields
Columnar Storage & Nesting
Optimize nested data for query engines
How Nested Data Interacts with Columnar Engines
Performance Implications
Best Practices
> You are modeling e-commerce order data in BigQuery. Each order has line items, each with a product reference and a list of applied discounts.
When flat tables meet nested reality
- Category
- Data Modeling
- Duration
- 15 minutes
- Challenges
- 12 hands-on challenges
Topics covered: The Nesting Decision, STRUCT: Embedded Objects, ARRAY: Ordered Collections, MAP: Dynamic Key-Value Pairs, Columnar Storage & Nesting
Lesson Sections
- The Nesting Decision
When to Nest vs When to Normalize The fundamental question: should related data live inside the parent row (nested) or in a separate table (normalized)? Both are valid. The choice depends on how the data is queried, how often it changes, and whether the nested data needs to be joined to independently. Nesting works well when the child data is always read alongside the parent. A user's shipping addresses are almost always retrieved when loading the user record. Normalizing this into a separate ad
- STRUCT: Embedded Objects (concepts: dmStructType)
Grouping Related Fields Into One Column A STRUCT is a named group of fields embedded inside a row. Instead of storing address_line1, address_city, address_state, address_zip as four separate columns, you store them as one address STRUCT with four sub-fields. Accessing them uses dot notation: address.city. STRUCTs do not add rows. They add structure within a row. The fact table row count stays the same. This is the key difference from ARRAY (which can add logical rows via UNNEST). Functionally, b
- ARRAY: Ordered Collections (concepts: dmArrayType)
Storing Lists Inside a Row An ARRAY column stores an ordered list of values inside a single row. A product can have an array of tags: ['electronics', 'sale', 'clearance']. An order can have an array of line item STRUCTs. ARRAYs eliminate the need for a separate junction or child table for simple one-to-many relationships. To query individual elements of an ARRAY, you use UNNEST. UNNEST flattens the array into rows, producing one output row per array element. This is powerful but dangerous: if yo
- MAP: Dynamic Key-Value Pairs (concepts: dmMapType)
Flexible Attributes Without Schema Changes A MAP column stores key-value pairs where the keys are not predefined in the schema. A user preferences column might contain {'theme': 'dark', 'language': 'en', 'notifications': 'off'}. Each user can have different keys. Adding a new preference does not require ALTER TABLE. MAPs are the escape hatch for data that does not have a fixed schema. Event properties, configuration settings, A/B test parameters, custom metadata. Anything where the set of keys v
- Columnar Storage & Nesting (concepts: dmColumnarStorage)
How Nested Data Interacts with Columnar Engines Modern analytical databases (BigQuery, Snowflake, Databricks) use columnar storage: data is stored column-by-column, not row-by-row. A query that reads only 3 columns out of 50 only scans those 3 columns. This is why analytical queries are fast. But nested data adds complexity. STRUCT sub-fields are stored as individual columns in columnar format. address.city is stored alongside address.state and address.zip as separate column chunks. Accessing on