APPROX_DISTINCT (HyperLogLog)
Concepts: sqlApproxAgg
The Scaling Problem

Exact distinct counts become expensive as data grows. Understanding the cost helps you choose the right approach.

COUNT(DISTINCT) Problem

To count distinct values exactly, the database must track every unique value it has seen. This requires building a hash set in memory that grows with cardinality. For a table with 1 billion rows and 100 million unique values, that hash set can consume gigabytes of memory.

How APPROX_DISTINCT Works

The approximate counts are within 0.3% of exact in this example. In practice, HyperLogLog provides about 2% standard error, meaning 95% of estimates are within 4% of the true value.

Complexity Comparison

Choosing the Right Approach

The choice between exact and approximate counts depends on your use case. Consider accuracy requirements and pe
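
To make the memory trade-off described under "COUNT(DISTINCT) Problem" and "How APPROX_DISTINCT Works" more concrete, here is a minimal HyperLogLog sketch in Python. It illustrates the general technique only and is not the implementation used by any particular database engine. The class name `HyperLogLog`, the precision parameter `p=12` (4096 registers), and the use of SHA-1 as the hash function are all assumptions chosen for the example. With m registers the standard error of the basic estimator is roughly 1.04 / sqrt(m), so this configuration lands near 1.6%.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog counter (hypothetical helper, not a database API)."""

    def __init__(self, p=12):
        # p index bits -> m = 2**p one-byte registers, regardless of input size.
        self.p = p
        self.m = 1 << p
        self.registers = bytearray(self.m)
        # Bias-correction constant; this form is valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        # Hash the value to 64 bits (SHA-1 is an arbitrary choice for this sketch).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # The top p bits select a register.
        idx = h >> (64 - self.p)
        # The remaining bits determine the rank: position of the leftmost 1-bit.
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1
        # Each register keeps only the maximum rank it has seen.
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Harmonic mean of 2**register across all registers, scaled by alpha * m^2.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting while
        # many registers are still empty.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)
```

Because each register only stores a maximum, adding the same value again never changes the estimate, and memory stays fixed at 4096 bytes here whether the input holds a thousand or a billion distinct values. An exact count, by contrast, needs a hash set that grows with every new value.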