Problem Solving: Advanced

Amazon's advertising auction system uses a greedy algorithm to allocate billions of ad impressions per day, and Amazon's warehouse robots use dynamic programming to find optimal pick paths through facilities with thousands of possible routes. These are not theoretical exercises - dynamic programming and greedy algorithms power some of the highest-value software systems ever built, processing decisions worth trillions of dollars annually. The gap between a developer who knows these patterns and one who does not is the gap between writing O(n) solutions and O(2ⁿ) solutions that time out on real data. This lesson closes that gap.

Two-Pointer Technique

Daily Life

Interviews

Solve pair problems in one pass

The two-pointer technique uses two indices that move through an array in a coordinated way. It's powerful for problems involving pairs, palindromes, or partitioning.

Opposite Ends Pattern

Start one pointer at the beginning and one at the end. Move them toward each other based on some condition.

	def is_palindrome(s):
	left = 0
	right = len(s) - 1

	while left < right:
	if s[left] != s[right]:
	return False
	left += 1
	right -= 1

	return True

	print(is_palindrome("racecar"))
	print(is_palindrome("hello"))

>>>Output

True

False

Two Sum with Sorted Array

When an array is sorted, two pointers can find a pair that sums to a target in O(n) time instead of O(n²).

	def two_sum_sorted(numbers, target):
	left = 0
	right = len(numbers) - 1

	while left < right:
	current_sum = numbers[left] + numbers[right]

	if current_sum == target:
	return [left, right]
	elif current_sum < target:
	# Need bigger sum
	left += 1
	else:
	# Need smaller sum
	right -= 1

	return None

	nums = [1, 3, 5, 7, 9, 11]
	print(two_sum_sorted(nums, 12))

>>>Output

[2, 3]

Sum too small?

Left pointer moves right toward bigger values

Sum too big?

Right pointer moves left toward smaller values

Each step counts

Every move eliminates one possibility from the search

O(n) total

At most n steps since pointers only move inward

Same Direction Pattern

Both pointers start at the beginning. A "fast" pointer explores while a "slow" pointer marks a position.

	def remove_duplicates_sorted(nums):
	"""Remove duplicates in-place."""
	if not nums:
	return 0

	slow = 0

	for fast in range(1, len(nums)):
	if nums[fast] != nums[slow]:
	slow += 1
	nums[slow] = nums[fast]

	return slow + 1

	nums = [1, 1, 2, 2, 2, 3, 4, 4]
	length = remove_duplicates_sorted(nums)
	print(f"Unique elements: {nums[:length]}")

>>>Output

Unique elements: [1, 2, 3, 4]

Fill in the Blank

> A sorted list [1, 3, 5, 7, 9] needs to find a pair that sums to 8. Two pointers start at opposite ends. Pick how each pointer moves when the current sum is too small or too large.

nums = [1, 3, 5, 7, 9]
target = 8
left, right = 0, len(nums) - 1
while left < right:
    s = nums[left] + nums[right]
    if s == target:
        print(f"Found: {nums[left]} + {nums[right]}")
        break
    elif s < target:
        
    else:
        
else:
    print("No pair found")

The two-pointer technique achieves O(n) time by eliminating one element from the search space with every step. Either the left pointer advances or the right pointer retreats, so the total number of moves is at most n.

This pattern applies whenever sorted data lets you make a decision that rules out one side. Palindrome checking, three-sum problems, and container-with-most-water all follow the same logic.

TIP

Two-pointer problems are almost always on sorted input. If the problem does not mention sorting, check whether sorting first (O(n log n)) makes a two-pointer pass possible at O(n), giving a net O(n log n) solution instead of O(n²).

Hash Map O(1) Lookups

Daily Life

Interviews

Replace nested loops with hash maps

Hash maps (Python dictionaries) provide O(1) average-case lookup. This single property is the key to optimizing countless algorithms.

Classic Two Sum Problem

Given an array and a target, find two numbers that sum to the target. The brute force O(n²) solution checks every pair. The hash map solution is O(n).

	def two_sum(nums, target):
	seen = {}

	for i, num in enumerate(nums):
	complement = target - num

	if complement in seen:
	return [seen[complement], i]

	seen[num] = i

	return None

	nums = [2, 7, 11, 15]
	print(two_sum(nums, 9))

>>>Output

[0, 1]

The key insight: instead of searching for each complement with complement in nums (O(n)), we store seen values in a dictionary for O(1) lookup.

Value-to-Index Mapping

Store values as keys and their indices as values. This pattern solves many "find where" problems.

	def first_occurrence(items, target):
	"""Build lookup once, query many times."""
	index_of = {item: i for i, item in enumerate(items)}

	if target in index_of:
	return index_of[target]
	return -1

	fruits = ["apple", "banana", "cherry", "date"]
	print(first_occurrence(fruits, "cherry"))
	print(first_occurrence(fruits, "grape"))

>>>Output

2

-1

•Without Hash Map

Search each time: O(n)
m searches: O(m × n)
1000 searches in 1000 items: 1M operations

•With Hash Map

Build map once: O(n)
Each lookup: O(1)
1000 searches in 1000 items: 2000 operations

Choosing when to use a hash map versus a brute-force scan is one of the most impactful optimization decisions you can make. Build the lookup once and query many times, check for the complement before storing the current value, and use enumerate when you need both index and value.

TIP

Avoid nested loops to check all pairs. Instead, store seen values in a dict and look up complements in O(1). Never rebuild the lookup inside a loop, and always handle missing keys with .get() or in checks.

Debug Challenge

> This two-sum function stores each number in the hash map before checking for its complement, so a number can match itself when its double equals the target.

Returns [0, 0] because 3 matches itself (3 + 3 = 6) since it was added before checking

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99










def two_sum(nums, target):
  seen = {}
  for i, num in enumerate(nums):
    seen[num] = i
    complement = target - num
    if complement in seen:
      return [seen[complement], i]
  return None

print(two_sum([3, 2, 4], 6))
def two_sum(nums, target):
  seen = {}
  for i, num in enumerate(nums):
    seen[num] = i
    complement = target - num
    if complement in seen:
      return [seen[complement], i]
  return None

print(two_sum([3, 2, 4], 6))

The ordering of operations inside a loop matters enormously. Checking before storing ensures you only match pairs of distinct elements, not a single element matched against itself.

Hash map lookups run in O(1) average time. This single property lets you replace an O(n²) double loop with an O(n) single loop: process each element once, look up its complement in constant time.

TIP

Whenever you find yourself writing a nested loop to compare all pairs, ask: can I store values in a dict and look up what I need instead? Hash maps are the most common way to turn O(n²) into O(n).

Frequency Counting

Daily Life

Interviews

Count and rank items in one pass

Many problems become trivial once you count occurrences. Frequency maps answer questions like "how many of each?" and "what appears most/least?"

Building Frequency Maps

The basic pattern for counting involves iterating through items and incrementing a counter for each unique value.

	def count_frequencies(items):
	freq = {}
	for item in items:
	if item in freq:
	freq[item] += 1
	else:
	freq[item] = 1
	return freq

	# Shorter version using .get()
	def count_frequencies_v2(items):
	freq = {}
	for item in items:
	freq[item] = freq.get(item, 0) + 1
	return freq

	letters = "mississippi"
	print(count_frequencies(letters))

>>>Output

{'m': 1, 'i': 4, 's': 4, 'p': 2}

Find Most Common Element

Once you have frequencies, finding the most common element is straightforward.

	def most_common(items):
	freq = {}
	for item in items:
	freq[item] = freq.get(item, 0) + 1

	max_count = 0
	max_item = None

	for item, count in freq.items():
	if count > max_count:
	max_count = count
	max_item = item

	return max_item

	votes = ["alice", "bob", "alice", "alice", "bob"]
	print(f"Winner: {most_common(votes)}")

>>>Output

Winner: alice

Check Anagrams

Two strings are anagrams if they have exactly the same character frequencies.

	def are_anagrams(s1, s2):
	if len(s1) != len(s2):
	return False

	freq1 = {}
	freq2 = {}

	for c in s1:
	freq1[c] = freq1.get(c, 0) + 1
	for c in s2:
	freq2[c] = freq2.get(c, 0) + 1

	return freq1 == freq2

	print(are_anagrams("listen", "silent"))
	print(are_anagrams("hello", "world"))

>>>Output

True

False

Frequency maps unlock a surprising number of problems beyond simple counting. Once you build the habit of reaching for a dictionary whenever you see "how many" or "which is most common," many problems become almost trivial.

When to Reach for a Frequency Map

Finding the most or least common element in a sequence
Checking if two collections are anagrams or permutations of each other
Detecting duplicates without sorting the data first
Grouping items by category and counting each group in a single pass

In practice, Python provides a built-in shortcut for frequency maps.

TIP

Python's collections.Counter class does frequency counting automatically. In interviews, show you understand the concept first, then mention Counter as the Pythonic way.

Fill in the Blank

> The word "banana" needs to be analyzed to find which character appears most often. Pick the right dictionary method and aggregation function to build and query a frequency map.

word = "banana"
freq = {}
for ch in word:
    freq[ch] = freq.(ch, 0) + 1
most = (freq, key=freq.get)
print(f"{most}: {freq[most]}")

Frequency maps are one of the most versatile tools in a programmer's toolkit. Any time a problem asks about counts, rankings, or repetition, a dictionary of counts is likely the right starting structure.

Python's collections.Counter builds frequency maps in a single call and adds useful methods like most_common(). Use it when writing production code, but understand the manual version for interviews and learning.

TIP

Recognizing frequency-counting problems gets easier with practice. Key signals: "most common," "appears more than k times," "check if two strings are anagrams," or "find a majority element."

Prefix Sum Concept

Daily Life

Interviews

Answer range sum queries instantly

A prefix sum array stores running totals. Once built, you can calculate the sum of ANY subarray in O(1) time instead of O(n).

Building Prefix Sums

	def build_prefix_sum(nums):
	prefix = [0]
	total = 0

	for num in nums:
	total += num
	prefix.append(total)

	return prefix

	nums = [3, 1, 4, 1, 5]
	prefix = build_prefix_sum(nums)
	print(f"Original: {nums}")
	print(f"Prefix: {prefix}")

>>>Output

Original: [3, 1, 4, 1, 5]

Prefix:   [0, 3, 4, 8, 9, 14]

The prefix array has one more element than the original. Each prefix[i] is the sum of all elements before index i.

Range Sum Queries

To get the sum from index i to j (inclusive), compute prefix[j+1] - prefix[i].

	def range_sum(prefix, i, j):
	"""Sum from index i to j."""
	return prefix[j + 1] - prefix[i]

	nums = [3, 1, 4, 1, 5]
	prefix = [0, 3, 4, 8, 9, 14]

	# Sum from index 1 to 3 (1 + 4 + 1 = 6)
	print(f"Sum [1:3] = {range_sum(prefix, 1, 3)}")

	# Sum from index 0 to 4 (entire array = 14)
	print(f"Sum [0:4] = {range_sum(prefix, 0, 4)}")

	# Sum from index 2 to 2 (just element 4)
	print(f"Sum [2:2] = {range_sum(prefix, 2, 2)}")

>>>Output

Sum [1:3] = 6

Sum [0:4] = 14

Sum [2:2] = 4

Many range sum queries

When you need repeated sums over different slices of static data

Finding target subarrays

Locate subarrays whose elements sum to a specific target value

Running averages

Calculate the average of any window without re-summing each time

Repeated range operations

Any "sum between index i and j" that runs inside a loop

Debug Challenge

> This prefix sum range query uses prefix[j] - prefix[i], but the formula is off by one. It excludes the element at index j from the sum.

Returns 5 instead of 6 for sum of indices 1 to 3 (1 + 4 + 1 = 6)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99






def range_sum(prefix, i, j):
  return prefix[j] - prefix[i]

nums = [3, 1, 4, 1, 5]
prefix = [0, 3, 4, 8, 9, 14]
print(range_sum(prefix, 1, 3))
def range_sum(prefix, i, j):
  return prefix[j] - prefix[i]

nums = [3, 1, 4, 1, 5]
prefix = [0, 3, 4, 8, 9, 14]
print(range_sum(prefix, 1, 3))

The prefix array is one index longer than the original so that prefix[0] = 0 acts as a base. This design makes the range sum formula prefix[j+1] - prefix[i] work cleanly for all subarrays, including those starting at index 0.

Prefix sums excel when many range queries run on static data. Build the prefix array once in O(n), then answer each query in O(1). For m queries that is O(n + m) total, far better than O(n × m) with brute force.

TIP

Prefix sums also extend to 2D arrays for rectangle sum queries, and to prefix products or prefix XOR for other problems. The same idea of precomputing cumulative values applies broadly.

Sort as Preprocessing

Daily Life

Interviews

Use sorting to unlock fast algorithms

Sorting is O(n log n), but it unlocks many efficient algorithms. If your problem allows reordering, sorting is often the key to an elegant solution.

Find Duplicates

In a sorted array, duplicates are always adjacent. One pass finds them all.

	def find_duplicates(nums):
	sorted_nums = sorted(nums)
	duplicates = []

	for i in range(1, len(sorted_nums)):
	if sorted_nums[i] == sorted_nums[i - 1]:
	if not duplicates or duplicates[-1] != sorted_nums[i]:
	duplicates.append(sorted_nums[i])

	return duplicates

	nums = [4, 2, 7, 2, 1, 4, 9, 4]
	print(find_duplicates(nums))

>>>Output

[2, 4]

Merge Intervals

A classic problem: merge overlapping intervals. Sorting by start time makes overlaps easy to detect.

	def merge_intervals(intervals):
	if not intervals:
	return []

	# Sort by start time
	intervals.sort(key=lambda x: x[0])

	merged = [intervals[0]]

	for current in intervals[1:]:
	last = merged[-1]

	if current[0] <= last[1]:
	last[1] = max(last[1], current[1])
	else:
	merged.append(current)

	return merged

	intervals = [[1, 3], [8, 10], [2, 6], [15, 18]]
	print(merge_intervals(intervals))

>>>Output

[[1, 6], [8, 10], [15, 18]]

K Closest Points

Find the k smallest or largest elements by sorting, then taking a slice.

	def k_smallest(nums, k):
	"""Return k smallest elements."""
	sorted_nums = sorted(nums)
	return sorted_nums[:k]

	def k_largest(nums, k):
	"""Return k largest elements."""
	sorted_nums = sorted(nums, reverse=True)
	return sorted_nums[:k]

	nums = [7, 3, 9, 1, 5, 8, 2]
	print(f"3 smallest: {k_smallest(nums, 3)}")
	print(f"3 largest: {k_largest(nums, 3)}")

>>>Output

3 smallest: [1, 2, 3]

3 largest:  [9, 8, 7]

SEARCHPOINTERSDEDUPMERGESELECT

Binary search

O(log n) lookup on sorted

POINTERS

Two-pointer

Converging scans in O(n)

DEDUP

Adjacent dupes

Duplicates sit side by side

MERGE

Efficient merge

Combine sorted runs in O(n)

SELECT

K-th element

Slice first or last k items

TIP

When you see a problem that seems to require O(n²) comparisons, ask: "Would sorting help?" The O(n log n) sort cost is usually worth it.

Fill in the Blank

> Given the list [10, 3, 7, 4, 20], you need to check whether any two elements are within 1 of each other. Pick a preprocessing step and a comparison operator to solve it efficiently.

def has_close_pair(nums):
    
    for i in range(len(nums) - 1):
        if abs(nums[i] - nums[i + 1])  1:
            return True
    return False

print(has_close_pair([10, 3, 7, 4, 20]))

Sorting as preprocessing is a powerful general strategy. The O(n log n) cost of sorting is often dominated by the O(n) or O(n²) work you save afterward. When an unsorted problem seems to require checking all pairs, sort first and look only at adjacent elements.

The algorithms covered in this lesson all follow the same principle: choose a data structure or preprocessing step that reduces the number of comparisons you need to make at each stage.

Algorithm Pattern Quick Reference

Two pointers: sorted array, use opposing or same-direction pointers for O(n)
Hash map: store seen values for O(1) complement or membership lookup
Frequency count: dict.get(k, 0) + 1 to tally any sequence of items
Prefix sum: precompute running totals for O(1) range sum queries
Sort first: bring related elements adjacent before a single linear scan

The Deduplication EngineStep 1

You are the lead engineer at a health-tech startup that aggregates patient records from 30 clinics. Each clinic submits CSV files nightly, but naming conventions differ: "Robert Smith" vs "Rob Smith" vs "R. Smith" at different addresses. The analytics team reports that 18% of patient IDs appear to be duplicates, inflating treatment counts and skewing billing. The CEO wants a deduplication system ready within two months.

patient_records

record_id	name	dob	address	clinic_id
R001	Robert Smith	1985-03-15	123 Oak St	C01
R002	Rob Smith	1985-03-15	123 Oak Street	C02
R003	R. Smith	1985-03-15	123 Oak St Apt 2	C03

Jul 2026

Detection Strategy

With 200,000 records from 30 clinics, how should the system identify which records refer to the same patient?

Advanced problem-solving strategies prepare you for the kind of complex, multi-step challenges found in real-world software development. Put your skills to the test with hands-on challenges in the Python Builder.

The deduplication scenario above illustrates a recurring theme: real systems combine multiple algorithmic techniques. Hash-based grouping (O(n)) reduces the comparison space before fuzzy matching, which is the same idea behind choosing the right data structure first.

TIP

When approaching a system design problem, identify the bottleneck operation and apply the appropriate algorithmic pattern: hash map for lookups, sorting for proximity, prefix sums for ranges, and two pointers for converging scans.

❯❯❯PUTTING IT ALL TOGETHER

> You are a senior data engineer at Databricks optimizing a Python pipeline that deduplicates billions of user events per day. Applying the right algorithmic patterns reduces the nightly job from 6 hours to 40 minutes without increasing cluster cost.

The two-pointer technique eliminates a nested loop in the sorted deduplication step, cutting time complexity from O(n²) to O(n) on billion-row event partitions.

Hash map O(1) lookups replace repeated linear scans when checking whether an event ID has already been seen, the core of the deduplication logic.

Frequency counting identifies which event types appear suspiciously often, flagging data quality issues before they corrupt downstream aggregation tables.

Sort as preprocessing reorders events by user_id and timestamp once at O(n log n) so all subsequent grouping passes run in linear time.

KEY TAKEAWAYS

Two pointers: opposite ends or same direction patterns

Hash maps turn O(n) lookups into O(1)

Frequency counting solves "how many" and anagram problems

Prefix sums enable O(1) range sum queries

Sorting (O(n log n)) often enables O(n) algorithms

Recognize these patterns to optimize brute force solutions

Problem Solving: Advanced

Two-Pointer Technique

Opposite Ends Pattern

Two Sum with Sorted Array

Same Direction Pattern

Hash Map O(1) Lookups

Classic Two Sum Problem

Value-to-Index Mapping

Frequency Counting

Building Frequency Maps

Find Most Common Element

Check Anagrams

Prefix Sum Concept

Building Prefix Sums

Range Sum Queries

Sort as Preprocessing

Find Duplicates

Merge Intervals

K Closest Points

Lesson Sections