Streaming Windows: When Data Doesn't Fit in Memory

Concepts: pyStreamingWindow, pyGeneratorPipeline, pyTimeEviction

In-memory sliding window implementations assume the entire dataset fits in memory. That's fine for LeetCode. In production data engineering, your event stream might be petabytes of log data, billions of user events, or a continuous Kafka topic with no end. The sliding window algorithm is the same, but the data structure discipline changes: you need a streaming implementation that holds only the window in memory, not the full dataset.

Generator-Based Sliding Windows

A generator-based window reads one event at a time from a source (a file, Kafka, a socket), maintains only the active window state, and yields results as they're produced. Memory usage is O(window_size), not O(n). For a 7-day rolling window over a year of hourly data, that's 168 events in memory instead of 8,760.
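A minimal sketch of the idea, using only the standard library. The function name `rolling_sum` and its parameters are illustrative, not part of any library named above; the point is the pattern: a bounded `deque` holds the active window while events stream through one at a time.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_sum(events: Iterable[float], window: int) -> Iterator[float]:
    """Yield the sum of the most recent `window` events.

    Memory stays O(window) no matter how long the stream is:
    only the deque and a running total are retained.
    (Illustrative sketch, not a specific library's API.)
    """
    buf = deque(maxlen=window)   # bounded buffer: the window itself
    total = 0.0
    for x in events:
        if len(buf) == window:
            total -= buf[0]      # evict the oldest value from the running total
        buf.append(x)            # deque drops the oldest element automatically
        total += x
        if len(buf) == window:   # emit only once the window is full
            yield total
```

Because `events` can be any iterable, the same generator works over a file handle, a Kafka consumer, or a socket reader; composing such generators into a pipeline keeps every stage at bounded memory.

```python
# Streams results without materializing the input:
list(rolling_sum([1, 2, 3, 4, 5], window=3))  # → [6, 9, 12]
```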