Columnar Storage & Nesting

Concepts: dmColumnarStorage

How Nested Data Interacts with Columnar Engines

Modern analytical databases (BigQuery, Snowflake, Databricks) use columnar storage: data is stored column by column, not row by row. A query that reads 3 columns out of 50 scans only those 3 columns, which is a major reason analytical queries are fast. Nested data complicates this picture.

STRUCT sub-fields are stored as individual columns. address.city, address.state, and address.zip are each stored as separate column chunks, so accessing one sub-field does not read the others. This is efficient.

ARRAY elements are stored using repetition and definition levels (the Dremel encoding used by Parquet). This is more complex than a flat column, because the levels must record where each array starts and whether a value is present. Deeply nested arrays (arrays of structs of arrays) add encoding overhead that degrades scan performance.

P
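
As a rough illustration of the repetition and definition levels mentioned above, the Python sketch below encodes one top-level repeated string field (for example, a tags array) into a single column of (repetition level, definition level, value) triples, then rebuilds the records from it. It is a simplified model in the style of the Dremel encoding, not Parquet's actual on-disk layout, which uses more levels for optional lists and elements. The names encode_repeated and decode_repeated are hypothetical and exist only for this example.

```python
from typing import List, Optional, Tuple

# One column entry: (repetition level, definition level, value)
Triple = Tuple[int, int, Optional[str]]


def encode_repeated(records: List[List[str]]) -> List[Triple]:
    """Flatten a top-level repeated string field into one column of triples."""
    column: List[Triple] = []
    for values in records:
        if not values:
            # Empty array: emit a placeholder so the record is not lost.
            # Definition level 0 means "no element is defined here".
            column.append((0, 0, None))
            continue
        for i, value in enumerate(values):
            # Repetition level 0 starts a new record;
            # repetition level 1 continues the current array.
            column.append((0 if i == 0 else 1, 1, value))
    return column


def decode_repeated(column: List[Triple]) -> List[List[str]]:
    """Rebuild the per-record arrays from the column of triples."""
    records: List[List[str]] = []
    for rep, definition, value in column:
        if rep == 0:
            records.append([])  # a new record begins
        if definition == 1 and value is not None:
            records[-1].append(value)
    return records


# Hypothetical sample data: three records, the second with an empty array.
records = [["a", "b"], [], ["c"]]
column = encode_repeated(records)
assert decode_repeated(column) == records
```

With one level of nesting, each value carries only two small integers. Every additional level of repeated nesting raises the maximum repetition and definition levels and makes reassembly more involved, which is the encoding overhead the text describes for arrays of structs of arrays.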