When sorting is too slow, buckets are too naïve, and the window is moving.
Topics covered:
- Bucket Sort: O(n) Top-K That Interviewers Love
- First Unique Character: When Order Matters
- Frequency Counting + Sliding Window: The Power Combo
- Frequency of Frequencies for Data Quality
- When Counter Fails: Scale, Cardinality, and Streaming
Here is the thing about bucket sort for top-K: the interviewer is not testing whether you memorized the algorithm. They are testing whether you can look at a constraint (frequencies are bounded by n) and exploit it to break through the O(n log n) sorting barrier. Candidates who find this insight on their own, in real time, score significantly higher than candidates who present the O(n log n) solution and stop. The bucket sort idea is elegant, explainable in 30 seconds, and shows genuine algorithmic thinking.
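The constraint-exploiting idea looks roughly like this sketch (LeetCode 347 style; the function name is ours):

```python
from collections import Counter

def top_k_frequent(nums, k):
    # Count frequencies: no element can appear more than len(nums) times,
    # so frequency itself is a valid bucket index.
    counts = Counter(nums)
    buckets = [[] for _ in range(len(nums) + 1)]
    for value, freq in counts.items():
        buckets[freq].append(value)
    # Walk from the highest possible frequency down, collecting k values.
    result = []
    for freq in range(len(nums), 0, -1):
        for value in buckets[freq]:
            result.append(value)
            if len(result) == k:
                return result
    return result
```

Every step is a linear pass, so the whole thing is O(n) time and O(n) space, with no sorting and no heap.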
First non-repeating character (LeetCode 387) looks like a trivial frequency problem, but it has an interesting wrinkle: you need the FIRST unique character, not just any unique character. This means order matters. And order mattering in a dict context is where candidates who do not know Python's dict ordering guarantees make mistakes. There are two clean approaches: two-pass with a plain dict, and one-pass with an OrderedDict (though since Python 3.7, plain dicts also preserve insertion order). Knowing both and the tradeoff between them shows the interviewer you think about more than just correctness.
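The two-pass plain-dict version is the simplest to get right, since the second pass walks the string itself and therefore respects original order:

```python
def first_uniq_char(s):
    # Pass 1: count every character.
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    # Pass 2: scan the STRING (not the dict) so order is the string's order.
    for i, ch in enumerate(s):
        if counts[ch] == 1:
            return i
    return -1
```

The key detail: iterating over the string in pass 2 sidesteps any question about dict ordering entirely.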
When frequency counting meets sliding windows, the result is one of the most elegant patterns in coding interviews. The canonical problem: 'Find All Anagrams in a String' (LeetCode 438). You have a string s and a pattern p. Find all starting indices in s where a substring of length len(p) is an anagram of p. The naive approach: for every window of size len(p), build a Counter and compare. That is O(n * m) where m = len(p). The efficient approach maintains a Counter of the current window and updates it incrementally as the window slides: one character enters, one character leaves.
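An incremental-update sketch of that pattern (the zero-count deletion is the detail most candidates miss, because a Counter with stale zero entries never compares equal to the target):

```python
from collections import Counter

def find_anagrams(s, p):
    m = len(p)
    if m > len(s):
        return []
    need = Counter(p)
    window = Counter(s[:m])          # Counter of the first window
    result = [0] if window == need else []
    for i in range(m, len(s)):
        window[s[i]] += 1            # character entering on the right
        left = s[i - m]              # character leaving on the left
        window[left] -= 1
        if window[left] == 0:
            del window[left]         # drop zero entries so == works
        if window == need:
            result.append(i - m + 1)
    return result
```

Each slide is O(1) Counter bookkeeping (the equality check compares at most the alphabet's worth of keys), versus rebuilding an m-character Counter per window in the naive version.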
Frequency of frequencies is a two-level counting pattern: first count how often each element appears, then count how often each frequency appears. The result is a histogram of counts. Counter(Counter(data).values()) is one of the most powerful one-liners for data quality analysis. It answers questions like: 'How many columns have exactly zero NULLs?' 'How many keys appear exactly once (singletons) in this log?' 'How many product IDs have more than 100 transactions?' These are the real data quality questions that come up in production pipelines.
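The one-liner in action, on a made-up list of IDs:

```python
from collections import Counter

ids = ["a", "a", "b", "c", "c", "c", "d"]

freq = Counter(ids)                    # level 1: count per id
freq_of_freq = Counter(freq.values())  # level 2: count per count
# freq_of_freq is {1: 2, 3: 1, 2: 1}: two singletons, one pair, one triple

singletons = freq_of_freq[1]           # "how many ids appear exactly once?"
heavy = sum(n for f, n in freq_of_freq.items() if f > 2)  # ids seen 3+ times
```

Because `freq_of_freq` is itself a Counter, missing frequencies default to 0, so queries like `freq_of_freq[100]` are safe without any key checks.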
Counter is a hashtable. Hashtables need to hold every distinct key in memory. For a dataset with 100 distinct error codes, Counter is perfect. For a dataset with 10 billion distinct IP addresses, Counter will exhaust your machine's RAM before you finish reading the first partition. Knowing when Counter fails, and what to use instead, is the move that separates data engineers who can talk about algorithms from data engineers who have actually run those algorithms on real data. This is the question to be ready for.
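One standard replacement at that scale is a count-min sketch: fixed memory regardless of cardinality, at the cost of counts that may overestimate (never underestimate). A toy sketch, assuming string items and illustrative hash choices:

```python
import hashlib

class CountMinSketch:
    """Toy count-min sketch: memory is width * depth counters, fixed up front.
    Estimates can only overestimate, because hash collisions add, never subtract."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _positions(self, item):
        # One independent-ish hash per row, via a per-row salt on blake2b.
        for row in range(self.depth):
            digest = hashlib.blake2b(item.encode(), salt=bytes([row])).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item):
        for row, col in self._positions(item):
            self.table[row][col] += 1

    def estimate(self, item):
        # True count <= min across rows; the min limits collision noise.
        return min(self.table[row][col] for row, col in self._positions(item))
```

For "how many DISTINCT IPs" rather than "how many of each", the analogous fixed-memory structure is HyperLogLog; both trade exactness for bounded memory, which is the trade a streaming pipeline usually wants.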