Loading lesson...
Prefix sums plus a hash map. The combo that makes O(n^2) problems look embarrassing.
Prefix sums plus a hash map. The combo that makes O(n^2) problems look embarrassing.
Topics covered: Prefix Sum + Hash Map: The O(n) Superpower, Prefix XOR: The Bitwise Sibling, Difference Arrays: The Inverse of Prefix Sums, Prefix Sums on Strings: Balanced Substrings, DE Applications: Prefix Sums in Spark and pandas
This is the combination that shows up in more mid-to-senior level interviews than almost any other pattern. The setup is always the same: you need to count (or find) subarrays satisfying some sum condition. Brute force checks every pair (i, j): O(n^2). The insight is that any subarray sum condition can be restated as a relationship between two prefix sum values. Once you have that relationship, you use a hash map to look up the complement in O(1). Total: O(n) time, O(n) space. The General Framew
Prefix XOR is the exact same pattern as prefix sum, but with XOR instead of addition. It exploits the properties of XOR -- specifically that a XOR a = 0 and a XOR 0 = a -- to answer range XOR queries in O(1) and count subarrays with specific XOR values. This is less commonly asked than prefix sum, but it appears in DE interviews at companies dealing with bitwise aggregations, data encoding, or checksum computation. Knowing it demonstrates pattern depth. Range XOR Query The formula xp[j+1] ^ xp[i
If prefix sums let you query ranges efficiently, difference arrays let you update ranges efficiently. The duality is perfect and deliberate. With a prefix sum array, you precompute cumulative sums and query any range in O(1). With a difference array, you apply range updates in O(1) and reconstruct the final array in O(n) with a prefix sum pass. They are inverses of each other. The difference array is underrated in interviews -- most candidates have never heard of it, which means knowing it immed
Prefix sums are not limited to numeric arrays. Strings are arrays of characters, and you can precompute character frequency prefix arrays -- one per character -- to answer 'how many of character X appear between index i and j' in O(1). More powerfully, you can encode binary string properties (is this character a vowel or consonant? 0 or 1?) as a numeric prefix sum and use the hash map combo to find balanced substrings in O(n). This unlocks a whole family of string problems that naive approaches
Here is the part that turns a good algorithm answer into an exceptional data engineering answer. Prefix sums are not academic constructs -- they are the engine under several production pipeline patterns. When you call cumsum() in pandas, Spark's window functions compute running totals, or you write a rolling aggregation over a sorted dataset, you are using prefix sums. Understanding this connection lets you reason about performance, explain trade-offs, and design better pipelines. It also gives