Loading section...
Count-Min Sketch: Sub-Linear Frequency Estimation
Concepts: pyCountMinSketch, pyStreamFrequency, pyEpsilonDelta
Count-Min Sketch is the data structure that solves the frequency estimation problem at scale. You have a stream of billions of events. You cannot hold a Counter in memory. But you need to answer queries like: 'How many times has user_id 8675309 appeared in the last hour?' Count-Min Sketch answers this with a provable accuracy guarantee using O(1/epsilon * log(1/delta)) memory — which in practice is kilobytes, not gigabytes. This is the structure behind frequency tracking in Kafka Streams, Redis stream analytics, and network traffic monitoring at ISP scale. How Count-Min Sketch Works The critical property: the estimate NEVER undercounts. It may overcount by at most epsilon * total. So if total = 1,000,000 and epsilon = 0.01, the overcount is at most 10,000. For a frequency of 500,000, an er