From O(n^2) temporal joins to petabyte compaction — this is where intervals get serious.
Topics covered:
- Interval Scheduling with DAG Dependencies
- Segment Trees: Dynamic Interval Queries at O(log n)
- Temporal Joins: From O(n^2) to O(n log n)
- Sessionization at Scale
- Design: Time-Series Compaction System
The meeting rooms problem assumes each task can start at any valid time. Real pipeline scheduling adds dependency constraints: task B cannot start until task A finishes. You now have both interval constraints (task i occupies [start_i, end_i] time slots on a worker) and DAG constraints (topological ordering must be respected). Solving both simultaneously is a staff-level problem. The approach: topological sort first (to establish a valid execution order), then assign time intervals greedily, starting each task as soon as all of its dependencies have finished.
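A minimal sketch of this two-phase approach, assuming unlimited workers and a `deps` map from each task to its prerequisites (the function and parameter names here are hypothetical, not from the lesson):

```python
from collections import defaultdict, deque

def schedule(durations, deps):
    """Kahn's topological sort, then greedy assignment: each task starts
    as soon as all of its dependencies have finished (unbounded workers)."""
    indeg = {t: 0 for t in durations}
    children = defaultdict(list)
    for task, parents in deps.items():
        for p in parents:
            children[p].append(task)
            indeg[task] += 1
    queue = deque(t for t in durations if indeg[t] == 0)
    finish = {}
    processed = 0
    while queue:
        t = queue.popleft()
        processed += 1
        # earliest valid start = latest finish time among prerequisites
        start = max((finish[p] for p in deps.get(t, [])), default=0)
        finish[t] = start + durations[t]
        for c in children[t]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    if processed != len(durations):
        raise ValueError("dependency cycle detected")
    return {t: (finish[t] - durations[t], finish[t]) for t in durations}

# Example: B and C both depend on A
slots = schedule({'A': 3, 'B': 2, 'C': 4}, {'B': ['A'], 'C': ['A']})
```

Adding a worker-count limit turns the greedy assignment into the meeting-rooms min-heap pattern layered on top of the same topological order.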
Prefix sums and sorted-list scans handle static and append-only workloads. But when you need O(log n) range updates AND O(log n) range queries on a dataset where both insertions and deletions happen continuously, you need a segment tree. The segment tree partitions the value space (not the data) into a binary tree of ranges, allowing you to update a range and query a range in O(log n) each. This is the data structure behind range max, range sum, and coverage queries in dynamic systems.
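A compact point-update / range-max segment tree shows the core structure; supporting range updates as well requires lazy propagation, which this sketch deliberately omits:

```python
class SegmentTree:
    """Iterative segment tree: point update and range-max query, both
    O(log n). Leaves live at indices [n, 2n); internal node i covers the
    union of its children 2i and 2i+1."""

    def __init__(self, n):
        self.n = n
        self.tree = [float('-inf')] * (2 * n)

    def update(self, i, value):
        i += self.n
        self.tree[i] = value
        while i > 1:                      # bubble the new max up to the root
            i //= 2
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, lo, hi):
        """Max over the half-open range [lo, hi)."""
        res = float('-inf')
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:                    # lo is a right child: take it, move right
                res = max(res, self.tree[lo])
                lo += 1
            if hi & 1:                    # hi is a right child: step left, take it
                hi -= 1
                res = max(res, self.tree[hi])
            lo //= 2
            hi //= 2
        return res

st = SegmentTree(8)
st.update(2, 5)
st.update(5, 9)
```

The same skeleton with `max` swapped for `+` gives range sums; coverage queries add a count of active intervals per node.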
The temporal join is one of the most common and most expensive operations in data engineering. 'Join events from table A with events from table B that occurred within ±5 minutes.' The naive nested loop is O(n^2): for every event in A, scan all events in B and check the time condition. On tables with 10 million rows each, that is 100 trillion comparisons. The sort + sweep approach brings this to O(n log n), and understanding WHY is a key staff-level interview signal.
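The sort + sweep idea can be sketched as follows, simplified to plain timestamp lists (real tables carry payloads, and output size itself can dominate when windows are dense):

```python
def temporal_join(a_times, b_times, window):
    """Join each event in A with every event in B within ±window.
    Both inputs are sorted once (O(n log n)); the lower bound `lo` into B
    only ever moves forward, so the sweep itself is linear in input size
    plus output size."""
    a = sorted(a_times)
    b = sorted(b_times)
    pairs = []
    lo = 0
    for t in a:
        # advance past B events that are too old for this (and every later) A event
        while lo < len(b) and b[lo] < t - window:
            lo += 1
        j = lo
        while j < len(b) and b[j] <= t + window:
            pairs.append((t, b[j]))
            j += 1
    return pairs

pairs = temporal_join([10, 20], [8, 14, 25], window=5)
```

The key insight: because both sides are sorted, the window start never moves backward, so B is never rescanned from the beginning, which is exactly what the nested loop does n times.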
Sessionization is one of the most common data engineering problems at consumer tech companies and one of the least well understood. A session is a group of user events that are 'close enough' in time. The boundary conditions: gap-based (session ends after 30 minutes of inactivity), activity-based (session ends after N events regardless of time), or hybrid (whichever comes first). Each has a different algorithm, different correctness guarantees, and critically, different behavior under late-arriving events.
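The gap-based variant, the simplest of the three, might look like this sketch (it assumes sortable in-hand timestamps; late-arriving events are exactly the hard part it ignores, since a late event can merge two previously separate sessions):

```python
def sessionize(timestamps, gap=30 * 60):
    """Gap-based sessionization: a new session starts whenever the gap
    since the previous event exceeds `gap` seconds. Timestamps in seconds."""
    sessions = []
    current = []
    for t in sorted(timestamps):
        if current and t - current[-1] > gap:
            sessions.append(current)   # close the session at the gap boundary
            current = [t]
        else:
            current.append(t)
    if current:
        sessions.append(current)
    return sessions

# 4900-second gap between events 100 and 5000 splits the sessions
sessions = sessionize([0, 100, 5000], gap=1800)
```

The activity-based and hybrid variants change only the boundary test; the late-arrival problem (a replayed event landing inside an already-closed session) is what pushes production systems toward watermarks and window re-emission.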
This is the staff-level system design version of merge intervals. A time-series database accumulates write files (SSTables, segments, L0 files) continuously. Over time, these files overlap in time range: file A covers [1000, 2000], file B covers [1500, 2500], file C covers [800, 1200]. Reads that touch any of these time ranges must scan all overlapping files. As file count grows, read amplification explodes. Compaction merges overlapping files into fewer, non-overlapping ones — restoring read performance at the cost of rewriting data.
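The planning step is merge intervals applied to file time ranges. A hypothetical sketch, grouping overlapping files into compaction jobs (real compactors also weigh file sizes, levels, and write amplification, none of which appears here):

```python
def plan_compaction(files):
    """Group files whose [lo, hi] time ranges overlap. Each resulting
    (lo, hi, names) group is one compaction job whose output is a single
    non-overlapping file. Input: (name, lo, hi) tuples."""
    jobs = []
    for name, lo, hi in sorted(files, key=lambda f: f[1]):  # sort by start
        if jobs and lo <= jobs[-1][1]:
            # overlaps the current group: extend it and absorb the file
            prev_lo, prev_hi, names = jobs[-1]
            jobs[-1] = (prev_lo, max(prev_hi, hi), names + [name])
        else:
            jobs.append((lo, hi, [name]))
    return jobs

# The three files from the example above all chain into one job
jobs = plan_compaction([('A', 1000, 2000), ('B', 1500, 2500), ('C', 800, 1200)])
```

Note the transitive chaining: A and B overlap, and C overlaps A, so all three land in one job even though C and B never touch — the same behavior as classic merge intervals.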