Frequency of Frequencies for Data Quality
Concepts: pyFreqOfFreq, pyDataQuality, pySingleton
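As a quick orientation, the two-level count this section is named for can be sketched in a few lines of Python. The `events` list and its values are made up for illustration; the only library call used is `collections.Counter` from the standard library.

```python
from collections import Counter

# Hypothetical log of keys; the values are invented for illustration.
events = ["a", "b", "a", "c", "d", "a", "b", "e"]

# Level 1: how often each key appears.
key_counts = Counter(events)

# Level 2: how many keys share each frequency (a histogram of counts).
freq_of_freq = Counter(key_counts.values())

# Singletons: keys that appear exactly once.
singletons = sorted(k for k, n in key_counts.items() if n == 1)

print(freq_of_freq)
print(singletons)
```

In this sample three keys occur exactly once, so the histogram's entry for frequency 1 matches the number of singletons.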
Frequency of frequencies is a two-level counting pattern: first count how often each element appears, then count how often each of those counts appears. The result is a histogram of counts. In Python, Counter(Counter(data).values()) expresses the whole pattern in one line, which makes it a compact tool for data quality analysis. It answers questions like: 'How many columns have exactly zero NULLs?' 'How many keys appear exactly once (singletons) in this log?' 'How many product IDs have more than 100 transactions?' These are the kind of data quality questions that show up in interviews framed as 'design a data quality check for this dataset.'

The Counter(Counter()) Pattern

Data Quality Application: NULL Rate Distribution

Finding Singletons in a Log

Singleton detection, meaning finding keys that appear exactly once, is useful for deduplication audits, ref