Row group statistics, bloom filters, and the small file problem that ate the cluster.
Topics covered: Parquet Column Encoding: Beyond 'It Uses Compression'; Bloom Filters and Late Materialization; Delta Lake vs Apache Iceberg: The Real Differences; The Small File Problem: Diagnosis and Compaction; Designing a File Format Strategy for a New Data Platform