Loading section...

Why Parquet?

Concepts: paColumnarVsRow, paCompression

This is asked as a screener because it instantly reveals whether you've worked with production data at scale. The interviewer doesn't want "it's columnar." They want you to connect physical layout to the queries you actually run. Row vs. Columnar Layout CSV and JSON store data row-by-row. To answer "what's the average order amount?" on a 500-column table, a row-oriented reader must load all 500 columns into memory, skip 499 of them, and aggregate the one it needs. Parquet stores each column contiguously on disk. That same query reads exactly one column - roughly 0.2% of the total bytes for a wide table. Compression and Encoding Parquet compresses well because columns contain homogeneous data. A status column with 5 distinct values across 100M rows uses dictionary encoding - each value