Collections: Advanced

Ansible, the infrastructure automation tool used by thousands of enterprise engineering teams, uses ChainMap to layer command-line overrides over playbook variables over inventory variables, so that the highest-precedence setting always wins without any copy-and-merge logic. Kubernetes' Python client uses the same layering pattern to merge pod specs with namespace and cluster defaults into a single resolved configuration object. The advanced collection techniques in this lesson, including ChainMap and custom UserDict subclasses, are the patterns behind elegant configuration systems at companies running global infrastructure.

heapq Operations

Daily Life

Interviews

Retrieve top-priority items efficiently

A heap is a specialized tree-based data structure that satisfies the heap property: in a min-heap, each parent node is smaller than or equal to its children. This seemingly simple property has powerful implications. The smallest element is always at the root, giving O(1) access to the minimum value. Python's heapq module implements a min-heap using a regular list as the underlying storage, providing efficient priority queue operations without requiring a separate data structure.

The key insight is that heappush() and heappop() are O(log n) operations, while finding the minimum is O(1). This makes heaps ideal for scenarios where you repeatedly need the smallest element from a dynamic collection. Compare this to keeping a sorted list where insertion would be O(n), or an unsorted list where finding the minimum would be O(n).

The heap property is maintained implicitly through the array representation. For an element at index i, its left child is at index 2i+1 and its right child is at 2i+2. When you push or pop elements, the heap operations restore the heap property by "bubbling up" or "bubbling down" elements as needed.
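The index arithmetic above can be checked directly. The following is a minimal sketch; the helper name is_min_heap is hypothetical and not part of heapq. It walks the list and compares each parent with its children at 2i+1 and 2i+2.

```python
import heapq

def is_min_heap(items):
    """Return True if every parent is <= each of its children."""
    n = len(items)
    for i in range(n):
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and items[i] > items[left]:
            return False
        if right < n and items[i] > items[right]:
            return False
    return True

data = [5, 3, 8, 1, 9, 2]
print(is_min_heap(data))   # False: 5 at index 0 is larger than its child 3
heapq.heapify(data)
print(is_min_heap(data))   # True
```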

Creating and Using Heaps

You can transform any existing list into a heap using heapify, then push and pop elements while the heap automatically maintains the heap property. The heapify operation is remarkably efficient at O(n), faster than the O(n log n) you might expect from inserting n elements one by one.

	import heapq

	# Create a heap from an existing list
	tasks = [5, 3, 8, 1, 9, 2]
	# O(n) transformation
	heapq.heapify(tasks)
	print("Heapified:", tasks)
	# O(1) peek
	print("Smallest:", tasks[0])

	# Push new items - heap property maintained
	heapq.heappush(tasks, 0)
	print("After push 0:", tasks)

	# Pop smallest items one by one
	first = heapq.heappop(tasks)
	second = heapq.heappop(tasks)
	print("Popped:", first, second)
	print("Remaining:", tasks)

>>>Output

Heapified: [1, 3, 2, 5, 9, 8]

Smallest: 1

After push 0: [0, 3, 1, 5, 9, 8, 2]

Popped: 0 1

Remaining: [2, 3, 8, 5, 9]

TIP

The heapified list may look unsorted, but it satisfies the heap property: each parent is smaller than or equal to its children. The minimum is always at index 0. Never rely on heap order beyond the root.

Finding N Smallest/Largest

The nsmallest() and nlargest() functions find the N smallest or largest items from any iterable and return them in sorted order. Internally they keep a heap of only N items while scanning the input, so they run in roughly O(n log N) time and need only O(N) extra memory. That makes them a good fit when N is small relative to the collection; as N approaches the total size, sorting the whole collection becomes the better choice.

	import heapq

	scores = [85, 92, 78, 95, 88, 76, 91, 83, 97, 80]

	# Get top 3 scores efficiently
	top_3 = heapq.nlargest(3, scores)
	print("Top 3:", top_3)

	# Get bottom 3 scores
	bottom_3 = heapq.nsmallest(3, scores)
	print("Bottom 3:", bottom_3)

	# Works with key function for complex objects
	students = [
	    {"name": "Alice", "gpa": 3.8},
	    {"name": "Bob", "gpa": 3.2},
	    {"name": "Carol", "gpa": 3.9},
	    {"name": "David", "gpa": 3.5},
	]

	top_students = heapq.nlargest(2, students, key=lambda s: s["gpa"])
	for s in top_students:
	    print(f"{s['name']}: {s['gpa']}")

>>>Output

Top 3: [97, 95, 92]

Bottom 3: [76, 78, 80]

Carol: 3.9

Alice: 3.8

TIP

Use nsmallest/nlargest when N is much smaller than the list size. For N close to the list size, sorted() with slicing is more efficient. For N=1, just use min() or max(). The heapq module documentation gives the same guidance.
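As a small illustration of that guidance, the three approaches return the same values and differ only in cost. This sketch reuses the scores list from the example above and does not measure performance.

```python
import heapq

scores = [85, 92, 78, 95, 88, 76, 91, 83, 97, 80]

# Small N: heap-based selection
top_3 = heapq.nlargest(3, scores)

# Same result via a full sort, which tends to win as N nears len(scores)
top_3_sorted = sorted(scores, reverse=True)[:3]

# N == 1: max() returns the single item directly (not a list)
best = max(scores)

print(top_3 == top_3_sorted)   # True
print(best == top_3[0])        # True
```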

Priority Queue Pattern

The most common use of heaps is implementing priority queues, where items are processed not in insertion order but by priority. A powerful pattern is using tuples where the first element is the priority value. Python compares tuples element-by-element, so the smallest priority comes out first. This pattern is used extensively in task scheduling, event processing, and graph algorithms.

	import heapq

	# Task queue: (priority, task_name)
	# Lower number = higher priority
	task_queue = []

	heapq.heappush(task_queue, (2, "Write tests"))
	heapq.heappush(task_queue, (1, "Fix critical bug"))
	heapq.heappush(task_queue, (3, "Update docs"))
	heapq.heappush(task_queue, (1, "Deploy hotfix"))

	print("Processing tasks by priority:")
	while task_queue:
	    priority, task = heapq.heappop(task_queue)
	    print(f"  [{priority}] {task}")

>>>Output

Processing tasks by priority:

  [1] Deploy hotfix

  [1] Fix critical bug

  [2] Write tests

  [3] Update docs

Notice that tasks with the same priority are not processed in insertion order: "Deploy hotfix" was pushed after "Fix critical bug" but is popped first. When the first elements are equal, Python's tuple comparison falls back to the second element, so ties are broken alphabetically by task name. If you need first-in, first-out order within a priority level, or if the second elements are not comparable, add a sequence number as a tiebreaker.
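A minimal sketch of the sequence-number tiebreaker mentioned above. The add_task helper and the use of itertools.count are illustrative assumptions, not part of heapq. Placing the counter between the priority and the payload means equal priorities are resolved by arrival order and the payload itself is never compared.

```python
import heapq
from itertools import count

task_queue = []
sequence = count()  # hypothetical tiebreaker: yields 0, 1, 2, ...

def add_task(priority, task):
    # (priority, sequence, task): ties on priority fall back to arrival order
    heapq.heappush(task_queue, (priority, next(sequence), task))

add_task(2, "Write tests")
add_task(1, "Fix critical bug")
add_task(3, "Update docs")
add_task(1, "Deploy hotfix")

order = []
while task_queue:
    priority, _, task = heapq.heappop(task_queue)
    order.append(task)

print(order)
# ['Fix critical bug', 'Deploy hotfix', 'Write tests', 'Update docs']
```

Because the sequence numbers are unique, the comparison never reaches the third element, so the payload can be any object, including ones that do not support ordering.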

Max Heap Implementation

Python's heapq only provides a min-heap, where the smallest element is at the root. For a max-heap where you want quick access to the largest element, use the negation trick: negate values when pushing and negate again when popping. This effectively inverts the comparison order.

	import heapq

	# Simulate max-heap by negating values
	max_heap = []
	values = [5, 3, 8, 1, 9]

	for v in values:
	    # Negate on push
	    heapq.heappush(max_heap, -v)

	print("Pop in descending order:")
	while max_heap:
	    # Negate on pop
	    original_value = -heapq.heappop(max_heap)
	    print(original_value)

>>>Output

Pop in descending order:

9

8

5

3

1

Merging Sorted Streams

The heapq.merge() function efficiently merges multiple sorted iterables into a single sorted iterator. This is invaluable when processing multiple sorted log files or combining results from parallel processing:

	import heapq

	# Three sorted streams (e.g., from different log files)
	stream1 = [1, 4, 7, 10]
	stream2 = [2, 5, 8, 11]
	stream3 = [3, 6, 9, 12]

	# Merge into single sorted stream
	merged = heapq.merge(stream1, stream2, stream3)
	print("Merged:", list(merged))

	# Works with timestamps for log merging
	logs_a = [("10:01", "User A login"), ("10:05", "User A action")]
	logs_b = [("10:02", "User B login"), ("10:04", "User B action")]

	merged_logs = heapq.merge(logs_a, logs_b)
	for timestamp, event in merged_logs:
	    print(f"  {timestamp}: {event}")

>>>Output

Merged: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

  10:01: User A login

  10:02: User B login

  10:04: User B action

  10:05: User A action
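When the records are not directly comparable, merge() also accepts a key function naming the field the inputs are sorted by. The sketch below uses hypothetical log records with a "ts" field; the record shape is an assumption for illustration only.

```python
import heapq

# Hypothetical log records, each list already sorted by its "ts" field
web_logs = [{"ts": 1, "msg": "GET /"}, {"ts": 4, "msg": "GET /cart"}]
db_logs = [{"ts": 2, "msg": "SELECT"}, {"ts": 3, "msg": "UPDATE"}]

# key tells merge() which value to compare when interleaving the inputs
merged = heapq.merge(web_logs, db_logs, key=lambda record: record["ts"])

# merge() returns a lazy iterator; nothing is consumed until we iterate
timestamps = [record["ts"] for record in merged]
print(timestamps)   # [1, 2, 3, 4]
```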

Top/bottom N elements: Efficiently find the N largest or smallest items in a large dataset
Priority job queues: Schedule tasks by urgency so the highest-priority item is always next
Merge sorted streams: Combine multiple sorted log files into one sorted output
Graph algorithms: Power Dijkstra shortest path and similar priority-based searches
Real-time top-K: Track the K largest values in streaming data without sorting
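The real-time top-K use case listed above can be sketched with a min-heap that never grows beyond K items. The function name top_k and the sample readings are hypothetical. The root of the heap is always the smallest of the current top K, so a new value only needs to be compared against heap[0].

```python
import heapq

def top_k(stream, k):
    """Return the k largest values seen in stream, largest first."""
    if k <= 0:
        return []
    heap = []  # min-heap holding the k largest values seen so far
    for value in stream:
        if len(heap) < k:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # Pop the smallest of the current top k and push the new value
            heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True)

readings = [12, 45, 7, 89, 23, 56, 91, 3]
print(top_k(readings, 3))   # [91, 89, 56]
```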

Heaps are so fundamental that Python's standard library uses them behind the scenes: queue.PriorityQueue and the sched event scheduler are both built on heapq.

Counter for Frequency

Daily Life

Interviews

Count occurrences and find top items

The Counter class from the collections module is a specialized dictionary subclass designed specifically for counting hashable objects. While you could count items using a regular dictionary with a loop, Counter provides convenient methods for frequency analysis that would otherwise require manual implementation. It's one of the most commonly used tools in data analysis and text processing.

Counter inherits from dict, so all dictionary methods work on it. However, Counter adds specialized functionality: it accepts iterables in its constructor, returns zero for missing keys instead of raising KeyError, and provides methods for finding the most common elements and performing arithmetic on frequency distributions.

Creating Counters

Counter can be created from any iterable, automatically counting the occurrences of each element. It can also be created from keyword arguments or another mapping. The flexibility in construction makes it easy to use in many different contexts.

	from collections import Counter

	# Count characters in a string
	char_counts = Counter("mississippi")
	print("Character counts:", char_counts)

	# Count items in a list
	colors = ["red", "blue", "red", "green", "blue", "red"]
	color_counts = Counter(colors)
	print("Color counts:", color_counts)

	# Count words in text
	text = "the quick brown fox jumps over the lazy dog"
	word_counts = Counter(text.split())
	print("Word counts:", word_counts)

	# Create from keyword arguments
	inventory = Counter(apples=5, oranges=3, bananas=2)
	print("Inventory:", inventory)

>>>Output

Character counts: Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})

Color counts: Counter({'red': 3, 'blue': 2, 'green': 1})

Word counts: Counter({'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1})

Inventory: Counter({'apples': 5, 'oranges': 3, 'bananas': 2})

The typical Counter workflow follows a predictable pattern that you will use repeatedly in data analysis tasks.

Create Counter: Pass any iterable to Counter() to count occurrences automatically
Query counts: Access individual counts by key or get zero for missing items
Find top items: Use most_common(n) to get the n most frequent elements
Combine data: Use arithmetic operators to merge or compare frequency data

The most_common() Method

The most_common(n) method returns the n most frequent elements and their counts as a list of tuples, sorted by frequency in descending order. This is incredibly useful for finding top trends, common errors, or frequent patterns in data. Without this method, you would need to sort the items yourself.

	from collections import Counter

	# Analyze log levels from application logs
	log_levels = [
	    "INFO", "DEBUG", "INFO", "ERROR", "INFO", "WARNING",
	    "DEBUG", "INFO", "ERROR", "INFO", "DEBUG", "INFO",
	    "INFO", "DEBUG", "WARNING", "INFO", "ERROR"
	]
	counts = Counter(log_levels)

	# Get the 2 most common log levels
	print("Top 2 log levels:")
	for level, count in counts.most_common(2):
	    print(f"  {level}: {count}")

	# Get all levels sorted by frequency
	print("\nAll levels by frequency:")
	for level, count in counts.most_common():
	    print(f"  {level}: {count}")

	# Get least common (last entry of the sorted list)
	print("\nLeast common:", counts.most_common()[-1])

>>>Output

Top 2 log levels:

  INFO: 8

  DEBUG: 4

All levels by frequency:

  INFO: 8

  DEBUG: 4

  ERROR: 3

  WARNING: 2

Least common: ('WARNING', 2)

Counter Arithmetic

One of Counter's most powerful features is support for arithmetic operations. You can add, subtract, and find intersections or unions of frequency distributions. This makes it easy to combine data from multiple sources or compute differences between datasets.

	from collections import Counter

	# Sales from two stores
	store_a = Counter({"apples": 10, "oranges": 5, "bananas": 8})
	store_b = Counter({"apples": 6, "oranges": 12, "grapes": 4})

	# Combined sales (addition)
	total = store_a + store_b
	print("Combined:", total)

	# Difference (negative counts removed)
	diff = store_a - store_b
	print("Store A surplus:", diff)

	# Intersection (minimum of each)
	common = store_a & store_b
	print("Common minimum:", common)

	# Union (maximum of each)
	combined_max = store_a | store_b
	print("Combined maximum:", combined_max)

>>>Output

Combined: Counter({'oranges': 17, 'apples': 16, 'bananas': 8, 'grapes': 4})

Store A surplus: Counter({'bananas': 8, 'apples': 4})

Common minimum: Counter({'apples': 6, 'oranges': 5})

Combined maximum: Counter({'oranges': 12, 'apples': 10, 'bananas': 8, 'grapes': 4})

TIP

Counter subtraction with the minus operator only keeps positive counts, dropping zeros and negatives. Use the subtract() method if you need to track negative values, such as when tracking inventory deficits or debts.
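A short sketch of the difference described in the tip, using hypothetical stock and order counts. The minus operator returns a new Counter without non-positive entries, while subtract() mutates in place and keeps the deficit.

```python
from collections import Counter

stock = Counter(apples=3, pears=1)
orders = Counter(apples=1, pears=4)

# Operator: results at or below zero are dropped
print(stock - orders)      # Counter({'apples': 2})

# Method: mutates in place and keeps the deficit
remaining = stock.copy()
remaining.subtract(orders)
print(remaining)           # Counter({'apples': 2, 'pears': -3})

# Unary plus strips zero and negative entries again
print(+remaining)          # Counter({'apples': 2})
```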

Updating Counters

Counters can be updated incrementally using the update() method, which adds counts from another iterable or mapping. The subtract() method does the opposite, reducing counts:

	from collections import Counter

	# Track website visits
	visits = Counter()

	# Morning batch of visits
	visits.update(["home", "products", "home", "cart", "checkout"])
	print("After morning:", visits)

	# Afternoon batch
	visits.update(["home", "about", "products", "products"])
	print("After afternoon:", visits)

	# Remove some visits (maybe spam filtered out)
	visits.subtract(["home", "home"])
	print("After filtering:", visits)

	# Access individual counts
	print("Home visits:", visits["home"])
	# Returns 0, not KeyError
	print("Missing page:", visits["missing"])

>>>Output

After morning: Counter({'home': 2, 'products': 1, 'cart': 1, 'checkout': 1})

After afternoon: Counter({'home': 3, 'products': 3, 'cart': 1, 'checkout': 1, 'about': 1})

After filtering: Counter({'products': 3, 'home': 1, 'cart': 1, 'checkout': 1, 'about': 1})

Home visits: 1

Missing page: 0

Practical Applications

Counter excels at data analysis tasks that appear constantly in real-world applications: finding duplicates, computing histograms, validating anagrams, and analyzing distributions. These patterns appear in log analysis, text processing, data validation, and many other domains.

	from collections import Counter

	# Check if two strings are anagrams
	def are_anagrams(s1, s2):
	    # Remove spaces and compare character frequencies
	    return Counter(s1.lower().replace(" ", "")) == Counter(s2.lower().replace(" ", ""))

	print("listen/silent:", are_anagrams("listen", "silent"))
	print("hello/world:", are_anagrams("hello", "world"))
	print("dormitory/dirty room:", are_anagrams("dormitory", "dirty room"))

	# Find elements appearing more than once
	data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
	counts = Counter(data)
	duplicates = [item for item, count in counts.items() if count > 1]
	print("\nDuplicates:", duplicates)

	# Find elements appearing exactly once
	unique = [item for item, count in counts.items() if count == 1]
	print("Unique:", unique)

>>>Output

listen/silent: True

hello/world: False

dormitory/dirty room: True

Duplicates: [2, 3, 4]

Unique: [1]

	from collections import Counter

	# Create a text histogram
	ages = [25, 30, 25, 35, 30, 25, 40, 30, 25, 28, 32, 25]
	age_dist = Counter(ages)

	print("Age distribution:")
	for age in sorted(age_dist.keys()):
	    bar = "*" * age_dist[age]
	    print(f"  {age}: {bar} ({age_dist[age]})")

	# Total elements
	print("\nTotal people:", sum(age_dist.values()))

	# Unique ages
	print("Unique ages:", len(age_dist))

>>>Output

Age distribution:

  25: ***** (5)

  28: * (1)

  30: *** (3)

  32: * (1)

  35: * (1)

  40: * (1)

Total people: 12

Unique ages: 6

Counter simplifies counting dramatically compared to doing it manually.

•Manual Counting

Requires explicit initialization
KeyError on missing keys
No built-in most_common()
Manual arithmetic logic

•Counter

Counts during construction
Returns 0 for missing keys
Built-in frequency sorting
Arithmetic operators included

Python Quiz

> After counting item frequencies, compute the total number of items across all categories. Choose the aggregation function and the Counter accessor that returns the counts.

	from collections import Counter
	data = [3, 1, 2, 3, 2, 3, 3]
	freq = Counter(data)
	print(freq[3])
	print(___(freq.___()))

Options: sum, len, max, values, keys

Counter's most_common() method returns elements sorted by frequency in descending order, making it easy to find the top N items without manual sorting.

When combining Counters with arithmetic operators, remember that subtraction only keeps positive counts, while the subtract() method preserves zero and negative values for tracking deficits.

Counter objects behave like regular dictionaries for most operations, but return zero instead of raising KeyError for missing keys, which makes frequency lookups safe without explicit existence checks.

Fill in the Blank

> You counted color frequencies with Counter and want to find the single most popular color. Pick the method that returns the top element.

	from collections import Counter
	colors = ["red", "blue", "red", "green", "red", "blue"]
	c = Counter(colors)
	print(c.___(1))

Counter is a subclass of dict, so it supports all standard dictionary operations in addition to its specialized frequency analysis methods.

Using most_common() without an argument returns all elements sorted by frequency, which is equivalent to sorting items() by count in descending order.

Counter is particularly powerful when combined with other collections tools: you can count items, find the top N, and then use the results to filter or transform your original data.
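One way the count, top-N, then filter flow could look, as a sketch over a made-up sentence:

```python
from collections import Counter

words = "the cat and the dog and the bird".split()

counts = Counter(words)
# Keep just the words from the (word, count) pairs returned by most_common()
top_words = {word for word, _ in counts.most_common(2)}

# Filter the original sequence down to occurrences of the top words
filtered = [word for word in words if word in top_words]
print(filtered)   # ['the', 'and', 'the', 'and', 'the']
```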

defaultdict Usage

Daily Life

Interviews

Group data without key-check boilerplate

A defaultdict is a dictionary subclass that automatically creates missing keys with a default value. This simple change eliminates one of the most common patterns in Python code: checking if a key exists before accessing or modifying it. The result is cleaner, more readable code that's also less prone to bugs.

The key difference from a regular dictionary is the behavior when accessing a key that doesn't exist. A regular dict raises KeyError, while defaultdict calls a factory function you provide to create a default value, stores it, and returns it. This factory function takes no arguments and returns the default value.

The Problem It Solves

Without defaultdict, grouping operations require verbose key-existence checks. This pattern is so common that it became tedious boilerplate in many codebases. Consider this common task of grouping employees by department:

	users = [
	    ("engineering", "Alice"),
	    ("marketing", "Bob"),
	    ("engineering", "Carol"),
	    ("sales", "David"),
	    ("marketing", "Eve"),
	]

	# Manual approach - requires if check
	groups = {}
	for dept, name in users:
	    if dept not in groups:
	        groups[dept] = []
	    groups[dept].append(name)

	print("Manual grouping:", groups)

	# Or using setdefault - still clunky
	groups2 = {}
	for dept, name in users:
	    groups2.setdefault(dept, []).append(name)

	print("With setdefault:", groups2)

>>>Output

Manual grouping: {'engineering': ['Alice', 'Carol'], 'marketing': ['Bob', 'Eve'], 'sales': ['David']}

With setdefault: {'engineering': ['Alice', 'Carol'], 'marketing': ['Bob', 'Eve'], 'sales': ['David']}

Both approaches work, but they're verbose and the conditional logic obscures the intent. With large codebases, these patterns multiply and become maintenance burdens. The setdefault approach is slightly better but still requires you to think about initialization on every access.

The defaultdict Solution

With defaultdict, missing keys are automatically initialized. The argument is a factory function that creates the default value. When you access a missing key, defaultdict calls this function, stores the result, and returns it. The code becomes much cleaner:

	from collections import defaultdict

	users = [
	    ("engineering", "Alice"),
	    ("marketing", "Bob"),
	    ("engineering", "Carol"),
	    ("sales", "David"),
	    ("marketing", "Eve"),
	]

	# Missing keys auto-initialize to empty list
	groups = defaultdict(list)
	for dept, name in users:
	    groups[dept].append(name)

	print("Grouped:", dict(groups))

	# Accessing a missing key creates it
	print("Finance team:", groups["finance"])
	print("After access:", dict(groups))

>>>Output

Grouped: {'engineering': ['Alice', 'Carol'], 'marketing': ['Bob', 'Eve'], 'sales': ['David']}

Finance team: []

After access: {'engineering': ['Alice', 'Carol'], 'marketing': ['Bob', 'Eve'], 'sales': ['David'], 'finance': []}

TIP

The argument to defaultdict must be a callable that takes no arguments, not a value. When you write defaultdict(list), you pass the list class itself, which returns an empty list when called. This is why list works but [] would not.
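To make the callable-versus-value distinction concrete, here is a small sketch; the variable names are illustrative. Passing a list instance is rejected when the defaultdict is constructed, while a zero-argument lambda is a simple way to supply a constant.

```python
from collections import defaultdict

# Passing a value instead of a callable fails at construction time
raised = False
try:
    defaultdict([])
except TypeError:
    raised = True
print("TypeError raised:", raised)   # True

# Wrap a constant in a zero-argument lambda to use it as a default
stock = defaultdict(lambda: 10)
print(stock["widgets"])              # 10

# Each missing key triggers a fresh call, so list defaults are not shared
groups = defaultdict(list)
groups["a"].append(1)
print(groups["b"])                   # []
```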

Common Factory Functions

Different factory functions serve different purposes. The most common are list for grouping, int for counting, and set for tracking unique values per key. You can also use lambda functions for custom default values.

	from collections import defaultdict

	word_counts = defaultdict(int)
	for word in "the quick brown fox jumps over the quick fox".split():
	    word_counts[word] += 1
	print("Counts:", dict(word_counts))

	user_tags = defaultdict(set)
	user_tags["alice"].add("python")
	user_tags["alice"].add("data")
	user_tags["alice"].add("python")
	user_tags["bob"].add("javascript")
	print("Tags:", dict(user_tags))

	settings = defaultdict(lambda: "N/A")
	settings["theme"] = "dark"
	print("Theme:", settings["theme"])
	print("Language:", settings["language"])

>>>Output

Counts: {'the': 2, 'quick': 2, 'brown': 1, 'fox': 2, 'jumps': 1, 'over': 1}

Tags: {'alice': {'python', 'data'}, 'bob': {'javascript'}}

Theme: dark

Language: N/A

list (grouping items): empty list [] per new key
int (counting keys): starts at zero for each key
set (unique per key): empty set for deduplication
dict (nested mapping): nested dict for hierarchies
lambda (custom default): any value via lambda: val

Nested defaultdicts

For multi-level grouping or hierarchical data, you can nest defaultdicts using lambda functions. This is powerful for building complex data structures dynamically without worrying about initialization at any level.

	from collections import defaultdict

	# Two-level nesting: region -> product -> sales count
	sales = defaultdict(lambda: defaultdict(int))

	# Add sales data - no initialization needed at any level!
	sales["east"]["widgets"] += 100
	sales["east"]["gadgets"] += 50
	sales["west"]["widgets"] += 75
	sales["west"]["gizmos"] += 200
	sales["east"]["widgets"] += 25

	print("East widgets:", sales["east"]["widgets"])
	print("West region:", dict(sales["west"]))
	print("Full data:", {k: dict(v) for k, v in sales.items()})

>>>Output

East widgets: 125

West region: {'widgets': 75, 'gizmos': 200}

Full data: {'east': {'widgets': 125, 'gadgets': 50}, 'west': {'widgets': 75, 'gizmos': 200}}

	from collections import defaultdict

	# Three-level: year -> month -> category -> amount
	financials = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))

	financials[2024]["Jan"]["revenue"] = 50000.0
	financials[2024]["Jan"]["expenses"] = 30000.0
	financials[2024]["Feb"]["revenue"] = 55000.0
	financials[2024]["Feb"]["expenses"] = 32000.0

	print("2024 Jan:", dict(financials[2024]["Jan"]))
	print("2024 Feb revenue:", financials[2024]["Feb"]["revenue"])

	# Calculate totals
	jan_profit = financials[2024]["Jan"]["revenue"] - financials[2024]["Jan"]["expenses"]
	print("Jan profit:", jan_profit)

>>>Output

2024 Jan: {'revenue': 50000.0, 'expenses': 30000.0}

2024 Feb revenue: 55000.0

Jan profit: 20000.0

•Regular dict

KeyError on missing key
Explicit initialization needed
Safer - no accidental keys
Better for fixed schemas

•defaultdict

Auto-creates missing keys
Cleaner grouping/counting code
Watch for typos creating keys
Best for dynamic aggregation

The auto-creation behavior of defaultdict is powerful, but it comes with an important caveat.

TIP

Be careful with defaultdict when you actually want KeyError for missing keys. Accessing a missing key creates it, which can hide bugs from typos. Use regular dict when the set of valid keys is known in advance.
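A sketch of the caveat and two ways to work around it. Membership tests and get() never call the factory, and clearing the default_factory attribute after the build phase makes the mapping raise KeyError again. The key names are made up for illustration.

```python
from collections import defaultdict

hits = defaultdict(int)
hits["home"] += 1

# Membership tests and get() do not create keys
print("about" in hits)     # False
print(hits.get("about"))   # None
print(len(hits))           # 1

# Indexing does: this typo silently becomes a real key
hits["hoem"]
print(sorted(hits))        # ['hoem', 'home']

# After the build phase, clearing default_factory restores KeyError
hits.default_factory = None
try:
    hits["contact"]
except KeyError:
    print("KeyError for 'contact'")
```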

deque Double-Ended Operations

Daily Life

Interviews

Build fast queues and sliding windows

A deque (double-ended queue, pronounced "deck") provides O(1) append and pop operations from both ends. Regular Python lists are O(n) for operations at the front because all subsequent elements must be shifted. This performance difference is critical when building queues, implementing breadth-first search, or maintaining sliding windows over data streams.

Because both ends are cheap, a deque efficiently supports both FIFO (First-In, First-Out) queue operations and LIFO (Last-In, First-Out) stack operations. It's implemented as a doubly-linked list of fixed-size blocks, which provides the O(1) operations at both ends while maintaining reasonable memory efficiency.

List Performance Problems

Python lists are implemented as dynamic arrays, optimized for operations at the end. When you insert or remove at index 0, every other element must be shifted, making these operations O(n). For a queue where you add at one end and remove from the other, this becomes a significant bottleneck.

	# List performance issue demonstration
	# list.insert(0, x) is O(n) - shifts all elements right
	# list.pop(0) is O(n) - shifts all elements left

	# For a queue (FIFO), this matters greatly
	queue_list = []
	for i in range(5):
	    # O(1) - efficient
	    queue_list.append(i)

	print("Queue (list):", queue_list)

	# But removing from front is slow
	# O(n) - every element shifts!
	first = queue_list.pop(0)
	print("After pop(0):", queue_list)
	print("Removed:", first)

	# For 10,000 items, pop(0) moves 9,999 elements
	# For 1,000,000 items, this becomes very slow

>>>Output

Queue (list): [0, 1, 2, 3, 4]

After pop(0): [1, 2, 3, 4]

Removed: 0

The deque Operations

deque provides O(1) operations at both ends. The methods are symmetrical: append/appendleft for adding, pop/popleft for removing, and extend/extendleft for adding multiple items.

	from collections import deque

	# Create a deque
	d = deque([1, 2, 3])
	print("Initial:", d)

	# O(1) operations at both ends
	d.append(4)
	d.appendleft(0)
	print("After appends:", d)

	d.pop()
	d.popleft()
	print("After pops:", d)

	# Extend from either end
	d.extend([4, 5])
	d.extendleft([-1, -2])
	print("After extends:", d)

>>>Output

Initial: deque([1, 2, 3])

After appends: deque([0, 1, 2, 3, 4])

After pops: deque([1, 2, 3])

After extends: deque([-2, -1, 1, 2, 3, 4, 5])

TIP

Note that extendleft() adds elements in reverse order! If you extendleft([1, 2, 3]), the deque will have [3, 2, 1, ...] at the front because each element is added to the left one by one.
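If the original order should be preserved at the front, one option is to reverse the input before passing it to extendleft(). A minimal sketch:

```python
from collections import deque

d = deque([4, 5])
d.extendleft([1, 2, 3])
print(d)   # deque([3, 2, 1, 4, 5])

# Reverse the input first to keep its original order at the front
e = deque([4, 5])
e.extendleft(reversed([1, 2, 3]))
print(e)   # deque([1, 2, 3, 4, 5])
```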

Queue and Stack with deque

deque is the standard way to implement queues in Python, and it works just as well as a stack. For a queue, use append to enqueue and popleft to dequeue. For a stack, use append to push and pop to pop. The flexibility to efficiently operate on both ends makes deque versatile.

	from collections import deque

	# FIFO Queue: add to back, remove from front
	queue = deque(["Task 1", "Task 2", "Task 3"])
	print("Queue:", list(queue))

	while queue:
	    print(f"Processing: {queue.popleft()}")

	# LIFO Stack: add and remove from same end
	stack = deque(["Frame 1", "Frame 2", "Frame 3"])
	print("\nStack unwind:")
	while stack:
	    print(f"  Returning from: {stack.pop()}")

>>>Output

Queue: ['Task 1', 'Task 2', 'Task 3']

Processing: Task 1

Processing: Task 2

Processing: Task 3

Stack unwind:

  Returning from: Frame 3

  Returning from: Frame 2

  Returning from: Frame 1

Bounded deque with maxlen

A particularly useful feature is creating a bounded deque with maxlen. When a bounded deque is full, adding to one end automatically removes an element from the opposite end. This is perfect for keeping recent history, implementing rate limiters, or maintaining sliding windows.

	from collections import deque

	# Bounded deque - keeps last N items automatically
	recent_logs = deque(maxlen=3)
	recent_logs.append("Log 1")
	recent_logs.append("Log 2")
	recent_logs.append("Log 3")
	print("Full:", list(recent_logs))

	recent_logs.append("Log 4")
	print("After adding Log 4:", list(recent_logs))

	recent_logs.append("Log 5")
	print("After adding Log 5:", list(recent_logs))

	# Great for keeping last N events, commands, or samples
	command_history = deque(maxlen=5)
	for cmd in ["ls", "cd", "pwd", "mkdir", "rm", "cp", "mv"]:
	    command_history.append(cmd)
	print("\nRecent commands:", list(command_history))

>>>Output

Full: ['Log 1', 'Log 2', 'Log 3']

After adding Log 4: ['Log 2', 'Log 3', 'Log 4']

After adding Log 5: ['Log 3', 'Log 4', 'Log 5']

Recent commands: ['pwd', 'mkdir', 'rm', 'cp', 'mv']

Rotation

The rotate() method efficiently rotates elements. Positive values rotate right (elements move toward higher indices, wrapping around), negative values rotate left. This is useful for circular buffer operations and round-robin scheduling.

	from collections import deque

	d = deque([1, 2, 3, 4, 5])
	print("Original:", d)

	# Rotate right: last 2 move to front
	d.rotate(2)
	print("Rotate right 2:", d)

	d.rotate(-3)
	print("Rotate left 3:", d)

	# Useful for round-robin scheduling
	players = deque(["Alice", "Bob", "Carol"])
	for round_num in range(1, 4):
	    print(f"Round {round_num}: {players[0]}'s turn")
	    players.rotate(-1)

>>>Output

Original: deque([1, 2, 3, 4, 5])

Rotate right 2: deque([4, 5, 1, 2, 3])

Rotate left 3: deque([2, 3, 4, 5, 1])

Round 1: Alice's turn

Round 2: Bob's turn

Round 3: Carol's turn

Sliding Window Pattern

Bounded deques are perfect for sliding window calculations. The maxlen parameter automatically maintains the window size, and the O(1) append makes it efficient for streaming data.

	from collections import deque

	def moving_average(values, window_size):
	    window = deque(maxlen=window_size)
	    averages = []
	    for value in values:
	        window.append(value)
	        if len(window) == window_size:
	            average = sum(window) / window_size
	            averages.append(round(average, 2))
	    return averages

	prices = [100, 102, 104, 103, 105, 107, 106, 108]
	print("Prices:", prices)
	print("3-period MA:", moving_average(prices, 3))

>>>Output

Prices: [100, 102, 104, 103, 105, 107, 106, 108]

3-period MA: [102.0, 103.0, 104.0, 105.0, 106.0, 107.0]
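The version above calls sum(window) on every step, which costs O(window_size) each time. For wide windows, a possible variant keeps a running total instead. The sketch below uses a hypothetical function name and assumes window_size is at least 1. It subtracts the value about to be evicted before appending the new one.

```python
from collections import deque

def moving_average_running(values, window_size):
    """Moving average that updates a running total instead of re-summing."""
    window = deque(maxlen=window_size)
    total = 0
    averages = []
    for value in values:
        if len(window) == window_size:
            # window[0] is about to be evicted by the append below
            total -= window[0]
        window.append(value)
        total += value
        if len(window) == window_size:
            averages.append(round(total / window_size, 2))
    return averages

prices = [100, 102, 104, 103, 105, 107, 106, 108]
print(moving_average_running(prices, 3))
# [102.0, 103.0, 104.0, 105.0, 106.0, 107.0]
```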

FIFO message queues: Process messages in order with O(1) enqueue and dequeue operations
Bounded log buffers: Keep only the most recent N entries using maxlen auto-eviction
Sliding windows: Compute moving averages and rate limits over a fixed window size
BFS graph traversal: Breadth-first search using O(1) popleft instead of slow list.pop(0)
Undo/redo history: Maintain bounded operation history with automatic old entry discard
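The BFS use case listed above can be sketched as follows. The function name bfs_order and the adjacency-list graph are hypothetical and exist only for illustration. The deque serves as the frontier, with popleft() taking the oldest discovered node and append() queuing newly discovered neighbors.

```python
from collections import deque

def bfs_order(graph, start):
    """Return nodes in breadth-first order from start."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()          # O(1) dequeue from the front
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)  # O(1) enqueue at the back
    return order

# Hypothetical adjacency list used only for illustration
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}
print(bfs_order(graph, "A"))   # ['A', 'B', 'C', 'D', 'E', 'F']
```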

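deque achieves O(1) at both ends because of its internal layout: items live in a doubly-linked list of fixed-size blocks, so adding or removing at either end touches only the block at that end and never shifts the other elements.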

Debug Challenge

> This queue drains a list using pop(0), which shifts every remaining element on each call. The code works but runs in O(n^2) time.

Performance issue: list.pop(0) is O(n) per call, making this O(n^2) overall

	queue = [1, 2, 3, 4, 5]
	while queue:
	    item = queue.pop(0)
	    print(item)

deque is the standard data structure for queue operations in Python. Its O(1) performance at both ends makes it dramatically faster than using a list with pop(0) for large datasets.
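One possible rewrite of the draining loop with a deque, as a sketch. The drain helper is hypothetical and collects the items instead of printing them so the result is easy to inspect.

```python
from collections import deque

def drain(items):
    """Consume items front to back and return them in processing order."""
    queue = deque(items)
    processed = []
    while queue:
        processed.append(queue.popleft())  # O(1), no shifting
    return processed

print(drain([1, 2, 3, 4, 5]))   # [1, 2, 3, 4, 5]
```

The loop body is unchanged apart from popleft() replacing pop(0), which brings the total cost down from O(n^2) to O(n).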

The maxlen parameter turns a deque into a fixed-size circular buffer: once it is full, each append discards the element at the opposite end, which is the oldest one when you always append on the same side. This makes bounded deques ideal for sliding window computations and log buffers.

Unlike lists, deque does not support efficient random access by index: indexing is O(1) at either end but slows to O(n) toward the middle. For workloads that need both fast front/back operations and random indexing, consider other data structures like indexed trees.

bisect Binary Search

Daily Life

Interviews

Search and insert in sorted lists fast

The bisect module provides binary search functions for maintaining sorted lists. It finds insertion points in O(log n) time, making it efficient for scenarios where you repeatedly insert into and search sorted data. The module is named after the bisection algorithm, which repeatedly divides the search space in half.

While you could maintain a sorted list by calling sort() after each insertion, that can cost up to O(n log n) per insertion. Using bisect to find the insertion point is O(log n); the subsequent list insertion is still O(n) because later elements must shift, so the whole operation is O(n), but it avoids re-sorting and is faster in practice for large lists. For scenarios requiring frequent sorted insertions and searches, bisect is invaluable.

Finding Insertion Points

bisect_left() and bisect_right() find the index where an element should be inserted to maintain sorted order; bisect() is an alias for bisect_right(). The difference between left and right matters when the value already exists in the list:

	import bisect

	sorted_list = [10, 20, 30, 40, 50]

	# Find insertion point for new value
	pos = bisect.bisect_left(sorted_list, 25)
	print(f"Insert 25 at index: {pos}")

	# For existing values, left vs right matters
	pos_left = bisect.bisect_left(sorted_list, 30)
	pos_right = bisect.bisect_right(sorted_list, 30)
	print(f"bisect_left(30): {pos_left}")
	print(f"bisect_right(30): {pos_right}")

	# Edge cases: smaller/larger than all elements
	print(f"Insert 5 at: {bisect.bisect_left(sorted_list, 5)}")
	print(f"Insert 100 at: {bisect.bisect_left(sorted_list, 100)}")

>>>Output

Insert 25 at index: 2

bisect_left(30): 2

bisect_right(30): 3

Insert 5 at: 0

Insert 100 at: 5

bisect_left: Insert before. The new value goes before any duplicates.

bisect_right: Insert after. The new value goes after all duplicates.

New value (no match in the list): Same position. Both functions agree.
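The summary above is easiest to see on a list that actually contains duplicates. The following is a small sketch with made-up data:

```python
import bisect

scores = [10, 20, 30, 30, 30, 40]

left = bisect.bisect_left(scores, 30)    # index of the first 30
right = bisect.bisect_right(scores, 30)  # index just past the last 30
print(left, right)   # 2 5
print(right - left)  # 3 (how many times 30 occurs)

# With no match, both functions return the same position
print(bisect.bisect_left(scores, 25), bisect.bisect_right(scores, 25))  # 2 2
```

The gap between the two results doubles as an O(log n) way to count occurrences of a value in a sorted list.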

Insert in Sorted Order

insort_left() and insort_right() combine finding the position and inserting in one operation; insort() is an alias for insort_right(). These are convenience functions equivalent to finding the position with bisect and then calling list.insert():

	import bisect

	# Maintain a sorted leaderboard
	scores = [72, 85, 91, 95]
	print("Initial:", scores)

	# Insert new scores in sorted position
	bisect.insort(scores, 88)
	print("After insort 88:", scores)

	bisect.insort(scores, 95)
	print("After insort 95:", scores)

	bisect.insort_left(scores, 91)
	print("After insort_left 91:", scores)

	# Build sorted list from unsorted data
	data = [5, 2, 8, 1, 9, 3]
	sorted_data = []
	for item in data:
	    bisect.insort(sorted_data, item)
	print("Built sorted:", sorted_data)

>>>Output

Initial: [72, 85, 91, 95]

After insort 88: [72, 85, 88, 91, 95]

After insort 95: [72, 85, 88, 91, 95, 95]

After insort_left 91: [72, 85, 88, 91, 91, 95, 95]

Built sorted: [1, 2, 3, 5, 8, 9]

Binary Search: Exact Match

The bisect module finds insertion points, not exact matches. To check if a value exists, use bisect_left() and verify the element at that position:

	import bisect

	def binary_search(sorted_list, target):
	    """Return index of target if found, else -1"""
	    idx = bisect.bisect_left(sorted_list, target)
	    # idx may equal len(sorted_list), so check bounds before comparing
	    if idx < len(sorted_list) and sorted_list[idx] == target:
	        return idx
	    return -1

	numbers = [10, 20, 30, 40, 50, 60, 70]
	print("Search 30:", binary_search(numbers, 30))
	print("Search 35:", binary_search(numbers, 35))
	print("Search 10:", binary_search(numbers, 10))
	print("Search 80:", binary_search(numbers, 80))

	# More efficient than 'in' operator for sorted lists
	# 'x in list' is O(n), binary_search is O(log n)

>>>Output

Search 30: 2

Search 35: -1

Search 10: 0

Search 80: -1

Grade Classification

A classic and elegant bisect application is mapping numeric values to categories using breakpoints. This pattern is cleaner than a chain of if-elif statements and scales to any number of categories:

	import bisect

	def get_grade(score):
	    """Convert numeric score to letter grade"""
	    # F < 60, D < 70, C < 80, B < 90, A >= 90
	    breakpoints = [60, 70, 80, 90]
	    grades = "FDCBA"
	    # bisect (bisect_right) puts a score equal to a breakpoint in the higher grade
	    idx = bisect.bisect(breakpoints, score)
	    return grades[idx]

	# Test various scores
	test_scores = [55, 65, 75, 85, 95, 60, 90, 100, 0]
	for score in test_scores:
	    grade = get_grade(score)
	    print(f"{score:3d}: {grade}")

>>>Output

 55: F

 65: D

 75: C

 85: B

 95: A

 60: D

 90: A

100: A

  0: F

Range Queries

bisect enables efficient range queries on sorted data. By finding the insertion points for the low and high bounds, you can count or retrieve all elements within a range in O(log n) time for the search, plus O(k) for the k elements in the range.

	import bisect

	def get_in_range(sorted_list, low, high):
	    """Return all elements x with low <= x <= high"""
	    left_index = bisect.bisect_left(sorted_list, low)
	    right_index = bisect.bisect_right(sorted_list, high)
	    return sorted_list[left_index:right_index]

	prices = [10, 15, 20, 25, 30, 35, 40, 45, 50]

	items = get_in_range(prices, 20, 40)
	print(f"Items in [20, 40]: {items} (count: {len(items)})")

	above_30 = prices[bisect.bisect_right(prices, 30):]
	print(f"Items > 30: {above_30}")

>>>Output

Items in [20, 40]: [20, 25, 30, 35, 40] (count: 5)

Items > 30: [35, 40, 45, 50]
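When only the count is needed, the slice can be skipped entirely, which keeps the whole query at O(log n). A minimal sketch, with count_in_range as a hypothetical helper name:

```python
import bisect

def count_in_range(sorted_list, low, high):
    """Count elements x with low <= x <= high using two binary searches."""
    left_index = bisect.bisect_left(sorted_list, low)
    right_index = bisect.bisect_right(sorted_list, high)
    return right_index - left_index

prices = [10, 15, 20, 25, 30, 35, 40, 45, 50]
print(count_in_range(prices, 20, 40))  # 5
print(count_in_range(prices, 21, 24))  # 0
```

No elements are copied here, so the cost does not grow with the number of matches.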

•Linear Search

O(n) - check every element
Works on unsorted data
Simple to implement
Slow for large datasets

•bisect Search

O(log n) - halve search space
Requires sorted data
Ideal for sorted lists
Fast for any dataset size

TIP

Remember that bisect only provides O(log n) for finding the position. The actual list insertion with insort is still O(n) because list.insert() must shift elements. For workloads with many insertions, consider a different data structure like a balanced tree.

Python Quiz

> Insert a value into a sorted list while keeping it sorted, then find the index where that value now sits. Choose the bisect function for each step.

import bisect
data = [10, 20, 40, 50]
bisect.___(data, 30)
pos = bisect.___(data, 30)
print(pos)

insort

append

insert

bisect_left

bisect_right

Common Mistakes

Even experienced developers make these mistakes with specialized collections. Understanding these pitfalls will help you avoid subtle bugs and use these tools correctly.

✓Do

Use deque for queue operations instead of list.pop(0)
Check "key in dict" before accessing defaultdict to avoid creating phantom keys
Sort your list before using bisect functions

✗Don't

Iterate over a heap expecting sorted order
Use bisect on unsorted data (it silently gives wrong results)
Forget that Counter subtraction drops zero and negative counts
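The last point about Counter subtraction can be illustrated with a short sketch. The inventory names and numbers are invented for the example; it contrasts the - operator with the subtract() method, which keeps zero and negative counts.

```python
from collections import Counter

stock = Counter(apples=3, pears=1)
sold = Counter(apples=1, pears=1, plums=2)

# The - operator keeps only positive counts
remaining = stock - sold
print(remaining)  # Counter({'apples': 2})

# subtract() updates in place and keeps zero and negative counts
ledger = Counter(stock)
ledger.subtract(sold)
print(dict(ledger))  # {'apples': 2, 'pears': 0, 'plums': -2}
```

If negative balances matter, for example to detect overselling, subtract() is the safer choice of the two.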

Expecting Heaps Sorted

A heap is NOT a sorted list. The heap property only guarantees that the minimum is at index 0. Iterating over a heapified list does not give sorted order. To get sorted output from the heap itself, pop elements one by one:

	import heapq

	data = [5, 3, 8, 1, 9, 2, 7]
	heapq.heapify(data)

	# WRONG: This is NOT sorted!
	print("Heap (NOT sorted):", data)

	# CORRECT: Pop for sorted order
	heap_copy = data.copy()
	sorted_result = []
	while heap_copy:
	    sorted_result.append(heapq.heappop(heap_copy))
	print("Sorted via heappop:", sorted_result)

>>>Output

Heap (NOT sorted): [1, 3, 2, 5, 9, 8, 7]

Sorted via heappop: [1, 2, 3, 5, 7, 8, 9]

defaultdict Side Effects

Accessing a missing key in defaultdict creates it. This is usually helpful, but it can cause unexpected keys when you're checking for existence or have typos in key names:

	from collections import defaultdict

	counts = defaultdict(int)
	counts["apples"] = 5

	# WRONG: indexing creates the key with value 0, which is falsy
	if counts["bananas"]:
	    print("Has bananas")

	print("Keys after wrong check:", list(counts.keys()))

	# CORRECT: Use 'in' for existence checks
	counts2 = defaultdict(int)
	counts2["apples"] = 5

	if "bananas" in counts2:
	    print("Has bananas")

	print("Keys (correct):", list(counts2.keys()))

>>>Output

Keys after wrong check: ['apples', 'bananas']

Keys (correct): ['apples']
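When you want the value rather than a yes/no answer, get() is another option that leaves the dictionary untouched, because the default factory only runs for square-bracket access. A brief sketch using the same sample data:

```python
from collections import defaultdict

counts = defaultdict(int)
counts["apples"] = 5

# get() does not trigger the default factory, so no key is created
bananas = counts.get("bananas", 0)
print(bananas)             # 0
print(list(counts.keys())) # ['apples']
```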

bisect on Unsorted Data

bisect assumes the list is already sorted. Using it on unsorted data produces wrong results without any error or warning:

	import bisect

	# WRONG: unsorted list gives a meaningless position
	unsorted = [30, 10, 50, 20, 40]
	pos = bisect.bisect_left(unsorted, 35)
	print(f"Wrong position in unsorted: {pos}")

	# CORRECT: sort first, then search
	sorted_list = sorted(unsorted)
	pos = bisect.bisect_left(sorted_list, 35)
	print(f"Correct position in sorted: {pos}")
	print(f"Sorted list: {sorted_list}")

>>>Output

Wrong position in unsorted: 2

Correct position in sorted: 3

Sorted list: [10, 20, 30, 40, 50]

list vs deque: When to Pick

Using list.pop(0) in a loop is a common performance mistake. For queue-like operations where you add to one end and remove from the other, always use deque:

	from collections import deque

	# pop(0) shifts all elements - O(n)
	# deque.popleft() is O(1) - no shifting

	# For 10000 items processed as queue:
	# list: O(n^2) total, roughly 50 million element shifts
	# deque: O(n) total, about 20000 operations (appends plus poplefts)

	q_list = [1, 2, 3]
	# Slow: shifts all elements left
	q_list.pop(0)
	print("List after pop(0):", q_list)

	q_deque = deque([1, 2, 3])
	# Fast: no shifting needed
	q_deque.popleft()
	print("Deque after popleft:", list(q_deque))

>>>Output

List after pop(0): [2, 3]

Deque after popleft: [2, 3]

Specialized collection types including heapq, Counter, defaultdict, deque, and bisect solve real-world performance and design problems that basic structures cannot. Put your skills to the test with hands-on challenges in the Python Builder.

❯❯❯PUTTING IT ALL TOGETHER

> You are a data engineer at Uber building a real-time ride-dispatch system that surfaces the nearest available driver, counts request types by city, groups drivers by zone without key errors, maintains a sliding window of recent GPS pings, and inserts arrival estimates into a sorted schedule.

heapq maintains a min-heap of driver distances so the nearest available driver is retrieved in O(log n) without sorting the full list.

Counter tallies ride-request types per city in one pass, exposing the most frequent request category with a single .most_common() call.

defaultdict with a list factory automatically initializes the per-zone driver groups, eliminating explicit key-existence checks on every insert.

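deque enforces a fixed-length sliding window of recent GPS pings per driver, efficiently appending new positions and discarding stale ones from the left.

bisect inserts each arrival estimate into the already sorted schedule with insort(), locating the position in O(log n) instead of re-sorting after every update.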
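The pieces of this scenario can be sketched together in a few lines. Everything below is an assumption made for illustration: the driver IDs, distances, zones, request types, ping coordinates, and schedule values are invented, and a real dispatch system would look quite different.

```python
import bisect
import heapq
from collections import Counter, defaultdict, deque

# Hypothetical sample data: (driver_id, distance_km, zone) and (city, request_type)
drivers = [("d1", 2.4, "north"), ("d2", 0.8, "south"), ("d3", 1.5, "north")]
requests = [("NYC", "standard"), ("NYC", "pool"), ("NYC", "standard")]

# heapq: min-heap keyed on distance, so the nearest driver pops first
distance_heap = [(dist, driver_id) for driver_id, dist, _zone in drivers]
heapq.heapify(distance_heap)
nearest = heapq.heappop(distance_heap)

# Counter: tally (city, request_type) pairs in one pass
request_counts = Counter(requests)

# defaultdict: group driver IDs by zone without key-existence checks
by_zone = defaultdict(list)
for driver_id, _dist, zone in drivers:
    by_zone[zone].append(driver_id)

# deque: keep only the three most recent GPS pings
pings = deque(maxlen=3)
for ping in [(40.1, -73.9), (40.2, -73.9), (40.3, -73.8), (40.4, -73.8)]:
    pings.append(ping)

# bisect: insert an arrival estimate (minutes) into a sorted schedule
schedule = [3, 7, 12]
bisect.insort(schedule, 9)

print(nearest)                        # (0.8, 'd2')
print(request_counts.most_common(1))  # [(('NYC', 'standard'), 2)]
print(dict(by_zone))                  # {'north': ['d1', 'd3'], 'south': ['d2']}
print(list(pings))                    # [(40.2, -73.9), (40.3, -73.8), (40.4, -73.8)]
print(schedule)                       # [3, 7, 9, 12]
```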

KEY TAKEAWAYS

heapq provides O(1) access to minimum and O(log n) push/pop for efficient priority queue operations

nlargest() and nsmallest() efficiently find top/bottom N elements, auto-selecting the optimal algorithm

Counter automates frequency counting with most_common(), arithmetic operations, and zero for missing keys

defaultdict eliminates key-existence checks by auto-initializing missing keys with a factory function

deque provides O(1) operations at both ends, essential for queues and sliding windows

Bounded deques with maxlen automatically maintain fixed-size buffers, discarding old elements

bisect enables O(log n) binary search on sorted lists for insertion points, range queries, and classification

Choose the right collection: heaps for priority, Counter for frequency, defaultdict for grouping, deque for queues, bisect for sorted data

These specialized containers often replace manual implementations with faster, cleaner, and more maintainable solutions

Always verify assumptions: heaps are not sorted, bisect requires sorted input, defaultdict creates keys on access
