How Would You Partition?
Concepts: paPartitioning, paSparkExecutionModel
The real question isn't "what should the partition key be?" - it's "the current partition scheme is wrong, how do you migrate 50 TB of live data without downtime?"

Dynamic Partition Discovery

Most catalogs use static partition registration - you explicitly add partitions via ALTER TABLE ADD PARTITION or MSCK REPAIR TABLE. At scale (100K+ partitions, dozens of pipelines writing concurrently), this breaks. Partition registration becomes a bottleneck, and stale metadata causes queries to miss data.

Iceberg solves this with manifest-based partition tracking - no external metastore registration needed. Delta uses the _delta_log to discover partitions. Both eliminate the MSCK REPAIR antipattern.

If you're still running Hive-style partitioning, migrating to Iceberg or Delta partition manage