Design: Time-Series Compaction System

Concepts: pyCompaction, pyTombstones, pyMergeAmplification, pyLSMTree

This is the staff-level system design version of merge intervals. A time-series database accumulates write files (SSTables, segments, L0 files) continuously. Over time, these files overlap in time range: file A covers [1000, 2000], file B covers [1500, 2500], file C covers [800, 1200]. Reads that touch any of these time ranges must scan all overlapping files. As file count grows, read amplification explodes. Compaction merges overlapping files into fewer, non-overlapping ones — restoring read performance. This is merge intervals run continuously as a background process on petabyte-scale data. Compaction Policies Tombstones and Merge Amplification Two concepts the interviewer wants to hear you mention. Tombstones: in LSM-tree databases (Cassandra, RocksDB, InfluxDB), deletes are not in-plac