Loading lesson...
When flat tables meet nested reality
When flat tables meet nested reality
Topics covered: The Nesting Decision, STRUCT: Embedded Objects, ARRAY: Ordered Collections, MAP: Dynamic Key-Value Pairs, Columnar Storage & Nesting
When to Nest vs When to Normalize The fundamental question: should related data live inside the parent row (nested) or in a separate table (normalized)? Both are valid. The choice depends on how the data is queried, how often it changes, and whether the nested data needs to be joined to independently. Nesting works well when the child data is always read alongside the parent. A user's shipping addresses are almost always retrieved when loading the user record. Normalizing this into a separate ad
Grouping Related Fields Into One Column A STRUCT is a named group of fields embedded inside a row. Instead of storing address_line1, address_city, address_state, address_zip as four separate columns, you store them as one address STRUCT with four sub-fields. Accessing them uses dot notation: address.city. STRUCTs do not add rows. They add structure within a row. The fact table row count stays the same. This is the key difference from ARRAY (which can add logical rows via UNNEST). Functionally, b
Storing Lists Inside a Row An ARRAY column stores an ordered list of values inside a single row. A product can have an array of tags: ['electronics', 'sale', 'clearance']. An order can have an array of line item STRUCTs. ARRAYs eliminate the need for a separate junction or child table for simple one-to-many relationships. To query individual elements of an ARRAY, you use UNNEST. UNNEST flattens the array into rows, producing one output row per array element. This is powerful but dangerous: if yo
Flexible Attributes Without Schema Changes A MAP column stores key-value pairs where the keys are not predefined in the schema. A user preferences column might contain {'theme': 'dark', 'language': 'en', 'notifications': 'off'}. Each user can have different keys. Adding a new preference does not require ALTER TABLE. MAPs are the escape hatch for data that does not have a fixed schema. Event properties, configuration settings, A/B test parameters, custom metadata. Anything where the set of keys v
How Nested Data Interacts with Columnar Engines Modern analytical databases (BigQuery, Snowflake, Databricks) use columnar storage: data is stored column-by-column, not row-by-row. A query that reads only 3 columns out of 50 only scans those 3 columns. This is why analytical queries are fast. But nested data adds complexity. STRUCT sub-fields are stored as individual columns in columnar format. address.city is stored alongside address.state and address.zip as separate column chunks. Accessing on