Collections: Advanced

Ansible, the infrastructure automation tool used by thousands of enterprise engineering teams, uses ChainMap to layer playbook variables over inventory variables over command-line overrides, so that the most specific setting always wins without any copy-and-merge logic. Kubernetes' Python client uses the same layering pattern to merge pod specs with namespace and cluster defaults into a single resolved configuration object. The advanced collection techniques in this lesson, including ChainMap and custom UserDict subclasses, are the patterns behind elegant configuration systems at companies running global infrastructure.

heapq Operations

Daily Life
Interviews

Retrieve top-priority items efficiently

A heap is a specialized tree-based data structure that satisfies the heap property: in a min-heap, each parent node is smaller than or equal to its children. This seemingly simple property has powerful implications. The smallest element is always at the root, giving O(1) access to the minimum value. Python's heapq module implements a min-heap using a regular list as the underlying storage, providing efficient priority queue operations without requiring a separate data structure.

The key insight is that heappush() and heappop() are O(log n) operations, while finding the minimum is O(1). This makes heaps ideal for scenarios where you repeatedly need the smallest element from a dynamic collection. Compare this to keeping a sorted list where insertion would be O(n), or an unsorted list where finding the minimum would be O(n).

The heap property is maintained implicitly through the array representation. For an element at index i, its left child is at index 2i+1 and its right child is at 2i+2. When you push or pop elements, the heap operations restore the heap property by "bubbling up" or "bubbling down" elements as needed.

Creating and Using Heaps

You can transform any existing list into a heap using heapify, then push and pop elements while the heap automatically maintains the heap property. The heapify operation is remarkably efficient at O(n), faster than the O(n log n) you might expect from inserting n elements one by one.

1import heapq
2
3# Create a heap from an existing list
4tasks = [5, 3, 8, 1, 9, 2]
5# O(n) transformation
6heapq.heapify(tasks)
7print("Heapified:", tasks)
8# O(1) peek
9print("Smallest:", tasks[0])
10
11# Push new items - heap property maintained
12heapq.heappush(tasks, 0)
13print("After push 0:", tasks)
14
15# Pop smallest items one by one
16first = heapq.heappop(tasks)
17second = heapq.heappop(tasks)
18print("Popped:", first, second)
19print("Remaining:", tasks)
>>>Output
Heapified: [1, 3, 2, 5, 9, 8]
Smallest: 1
After push 0: [0, 3, 1, 5, 9, 8, 2]
Popped: 0 1
Remaining: [2, 3, 8, 5, 9]
TIP
The heapified list may look unsorted, but it satisfies the heap property: each parent is smaller than its children. The minimum is always at index 0. Never rely on heap order beyond the root.

Finding N Smallest/Largest

The nsmallest() and nlargest() functions efficiently find the N smallest or largest items from any iterable. These functions are smarter than they might appear: they automatically choose the optimal algorithm based on N relative to the collection size. For small N, they use a heap. For N close to the total size, they sort instead.

1import heapq
2
3scores = [85, 92, 78, 95, 88, 76, 91, 83, 97, 80]
4
5# Get top 3 scores efficiently
6top_3 = heapq.nlargest(3, scores)
7print("Top 3:", top_3)
8
9# Get bottom 3 scores
10bottom_3 = heapq.nsmallest(3, scores)
11print("Bottom 3:", bottom_3)
12
13# Works with key function for complex objects
14students = [
15 {"name": "Alice", "gpa": 3.8},
16 {"name": "Bob", "gpa": 3.2},
17 {"name": "Carol", "gpa": 3.9},
18 {"name": "David", "gpa": 3.5},
19]
20
21top_students = heapq.nlargest(2, students, key=lambda s: s["gpa"])
22for s in top_students:
23 print(f"{s['name']}: {s['gpa']}")
>>>Output
Top 3: [97, 95, 92]
Bottom 3: [76, 78, 80]
Carol: 3.9
Alice: 3.8
TIP
Use nsmallest/nlargest when N is much smaller than the list size. For N close to the list size, sorted() with slicing is more efficient. For N=1, just use min() or max(). The heapq module documentation explicitly recommends this guidance.

Priority Queue Pattern

The most common use of heaps is implementing priority queues, where items are processed not in insertion order but by priority. A powerful pattern is using tuples where the first element is the priority value. Python compares tuples element-by-element, so the smallest priority comes out first. This pattern is used extensively in task scheduling, event processing, and graph algorithms.

1import heapq
2
3# Task queue: (priority, task_name)
4# Lower number = higher priority
5task_queue = []
6
7heapq.heappush(task_queue, (2, "Write tests"))
8heapq.heappush(task_queue, (1, "Fix critical bug"))
9heapq.heappush(task_queue, (3, "Update docs"))
10heapq.heappush(task_queue, (1, "Deploy hotfix"))
11
12print("Processing tasks by priority:")
13while task_queue:
14 priority, task = heapq.heappop(task_queue)
15 print(f" [{priority}] {task}")
>>>Output
Processing tasks by priority:
[1] Deploy hotfix
[1] Fix critical bug
[2] Write tests
[3] Update docs
Notice that tasks with the same priority are processed in insertion order within that priority level. This is because Python's tuple comparison falls back to comparing subsequent elements when the first elements are equal. If the second elements are not comparable, you can add a sequence number as a tiebreaker.

Max Heap Implementation

Python's heapq only provides a min-heap, where the smallest element is at the root. For a max-heap where you want quick access to the largest element, use the negation trick: negate values when pushing and negate again when popping. This effectively inverts the comparison order.

1import heapq
2
3# Simulate max-heap by negating values
4max_heap = []
5values = [5, 3, 8, 1, 9]
6
7for v in values:
8 # Negate on push
9 heapq.heappush(max_heap, -v)
10
11print("Pop in descending order:")
12while max_heap:
13 # Negate on pop
14 original_value = -heapq.heappop(max_heap)
15 print(original_value)
>>>Output
Pop in descending order:
9
8
5
3
1

Merging Sorted Streams

The heapq.merge() function efficiently merges multiple sorted iterables into a single sorted iterator. This is invaluable when processing multiple sorted log files or combining results from parallel processing:

1import heapq
2
3# Three sorted streams (e.g., from different log files)
4stream1 = [1, 4, 7, 10]
5stream2 = [2, 5, 8, 11]
6stream3 = [3, 6, 9, 12]
7
8# Merge into single sorted stream
9merged = heapq.merge(stream1, stream2, stream3)
10print("Merged:", list(merged))
11
12# Works with timestamps for log merging
13logs_a = [("10:01", "User A login"), ("10:05", "User A action")]
14logs_b = [("10:02", "User B login"), ("10:04", "User B action")]
15
16merged_logs = heapq.merge(logs_a, logs_b)
17for timestamp, event in merged_logs:
18 print(f" {timestamp}: {event}")
>>>Output
Merged: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
10:01: User A login
10:02: User B login
10:04: User B action
10:05: User A action
Top/bottom N elements
Top/bottom N elements
Efficiently find the N largest or smallest items in a large dataset
Priority job queues
Priority job queues
Schedule tasks by urgency so the highest-priority item is always next
Merge sorted streams
Merge sorted streams
Combine multiple sorted log files into one sorted output
Graph algorithms
Graph algorithms
Power Dijkstra shortest path and similar priority-based searches
Real-time top-K
Real-time top-K
Track the K largest values in streaming data without sorting
Heaps are so fundamental that Python itself uses them behind the scenes.

Counter for Frequency

Daily Life
Interviews

Count occurrences and find top items

The Counter class from the collections module is a specialized dictionary subclass designed specifically for counting hashable objects. While you could count items using a regular dictionary with a loop, Counter provides convenient methods for frequency analysis that would otherwise require manual implementation. It's one of the most commonly used tools in data analysis and text processing.

Counter inherits from dict, so all dictionary methods work on it. However, Counter adds specialized functionality: it accepts iterables in its constructor, returns zero for missing keys instead of raising KeyError, and provides methods for finding the most common elements and performing arithmetic on frequency distributions.

Creating Counters

Counter can be created from any iterable, automatically counting the occurrences of each element. It can also be created from keyword arguments or another mapping. The flexibility in construction makes it easy to use in many different contexts.

1from collections import Counter
2
3# Count characters in a string
4char_counts = Counter("mississippi")
5print("Character counts:", char_counts)
6
7# Count items in a list
8colors = ["red", "blue", "red", "green", "blue", "red"]
9color_counts = Counter(colors)
10print("Color counts:", color_counts)
11
12# Count words in text
13text = "the quick brown fox jumps over the lazy dog"
14word_counts = Counter(text.split())
15print("Word counts:", word_counts)
16
17# Create from keyword arguments
18inventory = Counter(apples=5, oranges=3, bananas=2)
19print("Inventory:", inventory)
>>>Output
Character counts: Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
Color counts: Counter({'red': 3, 'blue': 2, 'green': 1})
Word counts: Counter({'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1})
Inventory: Counter({'apples': 5, 'oranges': 3, 'bananas': 2})

The typical Counter workflow follows a predictable pattern that you will use repeatedly in data analysis tasks.

01
Create Counter
Pass any iterable to Counter() to count occurrences automatically
02
Query counts
Access individual counts by key or get zero for missing items
03
Find top items
Use most_common(n) to get the n most frequent elements
04
Combine data
Use arithmetic operators to merge or compare frequency data

The most_common() Method

The most_common(n) method returns the n most frequent elements and their counts as a list of tuples, sorted by frequency in descending order. This is incredibly useful for finding top trends, common errors, or frequent patterns in data. Without this method, you would need to sort the items yourself.

1from collections import Counter
2
3# Analyze log levels from application logs
4log_levels = [
5 "INFO", "DEBUG", "INFO", "ERROR", "INFO", "WARNING",
6 "DEBUG", "INFO", "ERROR", "INFO", "DEBUG", "INFO",
7 "INFO", "DEBUG", "WARNING", "INFO", "ERROR"
8]
9counts = Counter(log_levels)
10
11# Get the 2 most common log levels
12print("Top 2 log levels:")
13for level, count in counts.most_common(2):
14 print(f" {level}: {count}")
15
16# Get all levels sorted by frequency
17print("\nAll levels by frequency:")
18for level, count in counts.most_common():
19 print(f" {level}: {count}")
20
21# Get least common (reverse the list)
22print("\nLeast common:", counts.most_common()[-1])
>>>Output
Top 2 log levels:
INFO: 8
DEBUG: 4
 
All levels by frequency:
INFO: 8
DEBUG: 4
ERROR: 3
WARNING: 2
 
Least common: ('WARNING', 2)

Counter Arithmetic

One of Counter's most powerful features is support for arithmetic operations. You can add, subtract, and find intersections or unions of frequency distributions. This makes it easy to combine data from multiple sources or compute differences between datasets.

1from collections import Counter
2
3# Sales from two stores
4store_a = Counter({"apples": 10, "oranges": 5, "bananas": 8})
5store_b = Counter({"apples": 6, "oranges": 12, "grapes": 4})
6
7# Combined sales (addition)
8total = store_a + store_b
9print("Combined:", total)
10
11# Difference (negative counts removed)
12diff = store_a - store_b
13print("Store A surplus:", diff)
14
15# Intersection (minimum of each)
16common = store_a & store_b
17print("Common minimum:", common)
18
19# Union (maximum of each)
20combined_max = store_a | store_b
21print("Combined maximum:", combined_max)
>>>Output
Combined: Counter({'oranges': 17, 'apples': 16, 'bananas': 8, 'grapes': 4})
Store A surplus: Counter({'bananas': 8, 'apples': 4})
Common minimum: Counter({'apples': 6, 'oranges': 5})
Combined maximum: Counter({'oranges': 12, 'apples': 10, 'bananas': 8, 'grapes': 4})
TIP
Counter subtraction with the minus operator only keeps positive counts, dropping zeros and negatives. Use the subtract() method if you need to track negative values, such as when tracking inventory deficits or debts.

Updating Counters

Counters can be updated incrementally using the update() method, which adds counts from another iterable or mapping. The subtract() method does the opposite, reducing counts:

1from collections import Counter
2
3# Track website visits
4visits = Counter()
5
6# Morning batch of visits
7visits.update(["home", "products", "home", "cart", "checkout"])
8print("After morning:", visits)
9
10# Afternoon batch
11visits.update(["home", "about", "products", "products"])
12print("After afternoon:", visits)
13
14# Remove some visits (maybe spam filtered out)
15visits.subtract(["home", "home"])
16print("After filtering:", visits)
17
18# Access individual counts
19print("Home visits:", visits["home"])
20# Returns 0, not KeyError
21print("Missing page:", visits["missing"])
>>>Output
After morning: Counter({'home': 2, 'products': 1, 'cart': 1, 'checkout': 1})
After afternoon: Counter({'home': 3, 'products': 3, 'cart': 1, 'checkout': 1, 'about': 1})
After filtering: Counter({'products': 3, 'home': 1, 'cart': 1, 'checkout': 1, 'about': 1})
Home visits: 1
Missing page: 0

Practical Applications

Counter excels at data analysis tasks that appear constantly in real-world applications: finding duplicates, computing histograms, validating anagrams, and analyzing distributions. These patterns appear in log analysis, text processing, data validation, and many other domains.

1from collections import Counter
2
3# Check if two strings are anagrams
4def are_anagrams(s1, s2):
5 # Remove spaces and compare character frequencies
6 return Counter(s1.lower().replace(" ", "")) == Counter(s2.lower().replace(" ", ""))
7
8print("listen/silent:", are_anagrams("listen", "silent"))
9print("hello/world:", are_anagrams("hello", "world"))
10print("dormitory/dirty room:", are_anagrams("dormitory", "dirty room"))
11
12# Find elements appearing more than once
13data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
14counts = Counter(data)
15duplicates = [item for item, count in counts.items() if count > 1]
16print("\nDuplicates:", duplicates)
17
18# Find elements appearing exactly once
19unique = [item for item, count in counts.items() if count == 1]
20print("Unique:", unique)
>>>Output
listen/silent: True
hello/world: False
dormitory/dirty room: True
 
Duplicates: [2, 3, 4]
Unique: [1]
1from collections import Counter
2
3# Create a text histogram
4ages = [25, 30, 25, 35, 30, 25, 40, 30, 25, 28, 32, 25]
5age_dist = Counter(ages)
6
7print("Age distribution:")
8for age in sorted(age_dist.keys()):
9 bar = "*" * age_dist[age]
10 print(f" {age}: {bar} ({age_dist[age]})")
11
12# Total elements
13print("\nTotal people:", sum(age_dist.values()))
14
15# Unique ages
16print("Unique ages:", len(age_dist))
>>>Output
Age distribution:
25: ***** (5)
28: * (1)
30: *** (3)
32: * (1)
35: * (1)
40: * (1)
 
Total people: 12
Unique ages: 6

Counter simplifies counting dramatically compared to doing it manually.

Manual Counting
  • Requires explicit initialization
  • KeyError on missing keys
  • No built-in most_common()
  • Manual arithmetic logic
Counter
  • Counts during construction
  • Returns 0 for missing keys
  • Built-in frequency sorting
  • Arithmetic operators included
Python Quiz

> After counting item frequencies, compute the total number of items across all categories. Choose the aggregation function and the Counter accessor that returns the counts.

from collections import Counter
data = [3, 1, 2, 3, 2, 3, 3]
freq = Counter(data)
print(freq[3])
print(___(freq.___()))
values
keys
sum
max
len

Counter's most_common() method returns elements sorted by frequency in descending order, making it easy to find the top N items without manual sorting.

When combining Counters with arithmetic operators, remember that subtraction only keeps positive counts, while the subtract() method preserves zero and negative values for tracking deficits.

Counter objects behave like regular dictionaries for most operations, but return zero instead of raising KeyError for missing keys, which makes frequency lookups safe without explicit existence checks.

Fill in the Blank

> You counted color frequencies with Counter and want to find the single most popular color. Pick the method that returns the top element.

from collections import Counter
colors = ["red", "blue", "red", "green", "red", "blue"]
c = Counter(colors)
print(c.(1))

Counter is a subclass of dict, so it supports all standard dictionary operations in addition to its specialized frequency analysis methods.

Using most_common() without an argument returns all elements sorted by frequency, which is equivalent to sorting items() by count in descending order.

Counter is particularly powerful when combined with other collections tools: you can count items, find the top N, and then use the results to filter or transform your original data.

defaultdict Usage

Daily Life
Interviews

Group data without key-check boilerplate

A defaultdict is a dictionary subclass that automatically creates missing keys with a default value. This simple change eliminates one of the most common patterns in Python code: checking if a key exists before accessing or modifying it. The result is cleaner, more readable code that's also less prone to bugs.

The key difference from a regular dictionary is the behavior when accessing a key that doesn't exist. A regular dict raises KeyError, while defaultdict calls a factory function you provide to create a default value, stores it, and returns it. This factory function takes no arguments and returns the default value.

The Problem It Solves

Without defaultdict, grouping operations require verbose key-existence checks. This pattern is so common that it became tedious boilerplate in many codebases. Consider this common task of grouping employees by department:
1users = [
2 ("engineering", "Alice"),
3 ("marketing", "Bob"),
4 ("engineering", "Carol"),
5 ("sales", "David"),
6 ("marketing", "Eve"),
7]
8
9# Manual approach - requires if check
10groups = {}
11for dept, name in users:
12 if dept not in groups:
13 groups[dept] = []
14 groups[dept].append(name)
15
16print("Manual grouping:", groups)
17
18# Or using setdefault - still clunky
19groups2 = {}
20for dept, name in users:
21 groups2.setdefault(dept, []).append(name)
22
23print("With setdefault:", groups2)
>>>Output
Manual grouping: {'engineering': ['Alice', 'Carol'], 'marketing': ['Bob', 'Eve'], 'sales': ['David']}
With setdefault: {'engineering': ['Alice', 'Carol'], 'marketing': ['Bob', 'Eve'], 'sales': ['David']}

Both approaches work, but they're verbose and the conditional logic obscures the intent. With large codebases, these patterns multiply and become maintenance burdens. The setdefault approach is slightly better but still requires you to think about initialization on every access.

The defaultdict Solution

With defaultdict, missing keys are automatically initialized. The argument is a factory function that creates the default value. When you access a missing key, defaultdict calls this function, stores the result, and returns it. The code becomes much cleaner:

1from collections import defaultdict
2
3users = [
4 ("engineering", "Alice"),
5 ("marketing", "Bob"),
6 ("engineering", "Carol"),
7 ("sales", "David"),
8 ("marketing", "Eve"),
9]
10
11# Missing keys auto-initialize to empty list
12groups = defaultdict(list)
13for dept, name in users:
14 groups[dept].append(name)
15
16print("Grouped:", dict(groups))
17
18# Accessing a missing key creates it
19print("Finance team:", groups["finance"])
20print("After access:", dict(groups))
>>>Output
Grouped: {'engineering': ['Alice', 'Carol'], 'marketing': ['Bob', 'Eve'], 'sales': ['David']}
Finance team: []
After access: {'engineering': ['Alice', 'Carol'], 'marketing': ['Bob', 'Eve'], 'sales': ['David'], 'finance': []}
TIP
The argument to defaultdict must be a callable that takes no arguments, not a value. When you write defaultdict(list), you pass the list class itself, which returns an empty list when called. This is why list works but [] would not.

Common Factory Functions

Different factory functions serve different purposes. The most common are list for grouping, int for counting, and set for tracking unique values per key. You can also use lambda functions for custom default values.

1from collections import defaultdict
2
3word_counts = defaultdict(int)
4for word in "the quick brown fox jumps over the quick fox".split():
5 word_counts[word] += 1
6print("Counts:", dict(word_counts))
7
8user_tags = defaultdict(set)
9user_tags["alice"].add("python")
10user_tags["alice"].add("data")
11user_tags["alice"].add("python")
12user_tags["bob"].add("javascript")
13print("Tags:", dict(user_tags))
14
15settings = defaultdict(lambda: "N/A")
16settings["theme"] = "dark"
17print("Theme:", settings["theme"])
18print("Language:", settings["language"])
>>>Output
Counts: {'the': 2, 'quick': 2, 'brown': 1, 'fox': 2, 'jumps': 1, 'over': 1}
Tags: {'alice': {'python', 'data'}, 'bob': {'javascript'}}
Theme: dark
Language: N/A
listintsetdictlambda
list
Grouping items
Empty list [] per new key
int
Counting keys
Starts at zero for each key
set
Unique per key
Empty set for deduplication
dict
Nested mapping
Nested dict for hierarchies
lambda
Custom default
Any value via lambda: val

Nested defaultdicts

For multi-level grouping or hierarchical data, you can nest defaultdicts using lambda functions. This is powerful for building complex data structures dynamically without worrying about initialization at any level.

1from collections import defaultdict
2
3# Two-level nesting: region -> product -> sales count
4sales = defaultdict(lambda: defaultdict(int))
5
6# Add sales data - no initialization needed at any level!
7sales["east"]["widgets"] += 100
8sales["east"]["gadgets"] += 50
9sales["west"]["widgets"] += 75
10sales["west"]["gizmos"] += 200
11sales["east"]["widgets"] += 25
12
13print("East widgets:", sales["east"]["widgets"])
14print("West region:", dict(sales["west"]))
15print("Full data:", {k: dict(v) for k, v in sales.items()})
>>>Output
East widgets: 125
West region: {'widgets': 75, 'gizmos': 200}
Full data: {'east': {'widgets': 125, 'gadgets': 50}, 'west': {'widgets': 75, 'gizmos': 200}}
1from collections import defaultdict
2
3# Three-level: year -> month -> category -> amount
4financials = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
5
6financials[2024]["Jan"]["revenue"] = 50000.0
7financials[2024]["Jan"]["expenses"] = 30000.0
8financials[2024]["Feb"]["revenue"] = 55000.0
9financials[2024]["Feb"]["expenses"] = 32000.0
10
11print("2024 Jan:", dict(financials[2024]["Jan"]))
12print("2024 Feb revenue:", financials[2024]["Feb"]["revenue"])
13
14# Calculate totals
15jan_profit = financials[2024]["Jan"]["revenue"] - financials[2024]["Jan"]["expenses"]
16print("Jan profit:", jan_profit)
>>>Output
2024 Jan: {'revenue': 50000.0, 'expenses': 30000.0}
2024 Feb revenue: 55000.0
Jan profit: 20000.0
Regular dict
  • KeyError on missing key
  • Explicit initialization needed
  • Safer - no accidental keys
  • Better for fixed schemas
defaultdict
  • Auto-creates missing keys
  • Cleaner grouping/counting code
  • Watch for typos creating keys
  • Best for dynamic aggregation

The auto-creation behavior of defaultdict is powerful, but it comes with an important caveat.

TIP
Be careful with defaultdict when you actually want KeyError for missing keys. Accessing a missing key creates it, which can hide bugs from typos. Use regular dict when the set of valid keys is known in advance.

deque Double-Ended Operations

Daily Life
Interviews

Build fast queues and sliding windows

A deque (double-ended queue, pronounced "deck") provides O(1) append and pop operations from both ends. Regular Python lists are O(n) for operations at the front because all subsequent elements must be shifted. This performance difference is critical when building queues, implementing breadth-first search, or maintaining sliding windows over data streams.

The name "deque" comes from "double-ended queue" because it efficiently supports both FIFO (First-In, First-Out) queue operations and LIFO (Last-In, First-Out) stack operations. It's implemented as a doubly-linked list of fixed-size blocks, which provides the O(1) operations at both ends while maintaining reasonable memory efficiency.

List Performance Problems

Python lists are implemented as dynamic arrays, optimized for operations at the end. When you insert or remove at index 0, every other element must be shifted, making these operations O(n). For a queue where you add at one end and remove from the other, this becomes a significant bottleneck.

1# List performance issue demonstration
2# list.insert(0, x) is O(n) - shifts all elements right
3# list.pop(0) is O(n) - shifts all elements left
4
5# For a queue (FIFO), this matters greatly
6queue_list = []
7for i in range(5):
8 # O(1) - efficient
9 queue_list.append(i)
10
11print("Queue (list):", queue_list)
12
13# But removing from front is slow
14# O(n) - every element shifts!
15first = queue_list.pop(0)
16print("After pop(0):", queue_list)
17print("Removed:", first)
18
19# For 10,000 items, pop(0) moves 9,999 elements
20# For 1,000,000 items, this becomes very slow
>>>Output
Queue (list): [0, 1, 2, 3, 4]
After pop(0): [1, 2, 3, 4]
Removed: 0

The deque Operations

deque provides O(1) operations at both ends. The methods are symmetrical: append/appendleft for adding, pop/popleft for removing, and extend/extendleft for adding multiple items.

1from collections import deque
2
3# Create a deque
4d = deque([1, 2, 3])
5print("Initial:", d)
6
7# O(1) operations at both ends
8d.append(4)
9d.appendleft(0)
10print("After appends:", d)
11
12d.pop()
13d.popleft()
14print("After pops:", d)
15
16# Extend from either end
17d.extend([4, 5])
18d.extendleft([-1, -2])
19print("After extends:", d)
>>>Output
Initial: deque([1, 2, 3])
After appends: deque([0, 1, 2, 3, 4])
After pops: deque([1, 2, 3])
After extends: deque([-2, -1, 1, 2, 3, 4, 5])
TIP
Note that extendleft() adds elements in reverse order! If you extendleft([1, 2, 3]), the deque will have [3, 2, 1, ...] at the front because each element is added to the left one by one.

Queue and Stack with deque

deque is the standard way to implement both queues and stacks in Python. For a queue, use append to enqueue and popleft to dequeue. For a stack, use append to push and pop to pop. The flexibility to efficiently operate on both ends makes deque versatile.

1from collections import deque
2
3# FIFO Queue: add to back, remove from front
4queue = deque(["Task 1", "Task 2", "Task 3"])
5print("Queue:", list(queue))
6
7while queue:
8 print(f"Processing: {queue.popleft()}")
9
10# LIFO Stack: add and remove from same end
11stack = deque(["Frame 1", "Frame 2", "Frame 3"])
12print("\nStack unwind:")
13while stack:
14 print(f" Returning from: {stack.pop()}")
>>>Output
Queue: ['Task 1', 'Task 2', 'Task 3']
Processing: Task 1
Processing: Task 2
Processing: Task 3
 
Stack unwind:
Returning from: Frame 3
Returning from: Frame 2
Returning from: Frame 1

Bounded deque with maxlen

A particularly useful feature is creating a bounded deque with maxlen. When a bounded deque is full, adding to one end automatically removes an element from the opposite end. This is perfect for keeping recent history, implementing rate limiters, or maintaining sliding windows.

1from collections import deque
2
3# Bounded deque - keeps last N items automatically
4recent_logs = deque(maxlen=3)
5recent_logs.append("Log 1")
6recent_logs.append("Log 2")
7recent_logs.append("Log 3")
8print("Full:", list(recent_logs))
9
10recent_logs.append("Log 4")
11print("After adding Log 4:", list(recent_logs))
12
13recent_logs.append("Log 5")
14print("After adding Log 5:", list(recent_logs))
15
16# Great for keeping last N events, commands, or samples
17command_history = deque(maxlen=5)
18for cmd in ["ls", "cd", "pwd", "mkdir", "rm", "cp", "mv"]:
19 command_history.append(cmd)
20print("\nRecent commands:", list(command_history))
>>>Output
Full: ['Log 1', 'Log 2', 'Log 3']
After adding Log 4: ['Log 2', 'Log 3', 'Log 4']
After adding Log 5: ['Log 3', 'Log 4', 'Log 5']
 
Recent commands: ['mkdir', 'rm', 'cp', 'mv']

Rotation

The rotate() method efficiently rotates elements. Positive values rotate right (elements move toward higher indices, wrapping around), negative values rotate left. This is useful for circular buffer operations and round-robin scheduling.

1from collections import deque
2
3d = deque([1, 2, 3, 4, 5])
4print("Original:", d)
5
6# Rotate right: last 2 move to front
7d.rotate(2)
8print("Rotate right 2:", d)
9
10d.rotate(-3)
11print("Rotate left 3:", d)
12
13# Useful for round-robin scheduling
14players = deque(["Alice", "Bob", "Carol"])
15for round_num in range(1, 4):
16 print(f"Round {round_num}: {players[0]}'s turn")
17 players.rotate(-1)
>>>Output
Original: deque([1, 2, 3, 4, 5])
Rotate right 2: deque([4, 5, 1, 2, 3])
Rotate left 3: deque([2, 3, 4, 5, 1])
Round 1: Alice's turn
Round 2: Bob's turn
Round 3: Carol's turn

Sliding Window Pattern

Bounded deques are perfect for sliding window calculations. The maxlen parameter automatically maintains the window size, and the O(1) append makes it efficient for streaming data.

1from collections import deque
2
3def moving_average(values, window_size):
4 window = deque(maxlen=window_size)
5 averages = []
6 for value in values:
7 window.append(value)
8 if len(window) == window_size:
9 average = sum(window) / window_size
10 averages.append(round(average, 2))
11 return averages
12
13prices = [100, 102, 104, 103, 105, 107, 106, 108]
14print("Prices:", prices)
15print("3-period MA:", moving_average(prices, 3))
>>>Output
Prices: [100, 102, 104, 103, 105, 107, 106, 108]
3-period MA: [102.0, 103.0, 104.0, 105.0, 106.0, 107.0]
FIFO message queues
FIFO message queues
Process messages in order with O(1) enqueue and dequeue operations
Bounded log buffers
Bounded log buffers
Keep only the most recent N entries using maxlen auto-eviction
Sliding windows
Sliding windows
Compute moving averages and rate limits over a fixed window size
BFS graph traversal
BFS graph traversal
Breadth-first search using O(1) popleft instead of slow list.pop(0)
Undo/redo history
Undo/redo history
Maintain bounded operation history with automatic old entry discard

The reason deque achieves O(1) at both ends comes from its internal implementation.

Debug Challenge

> This queue drains a list using pop(0), which shifts every remaining element on each call. The code works but runs in O(n^2) time.

Performance issue: list.pop(0) is O(n) per call, making this O(n^2) overall

deque is the standard data structure for queue operations in Python. Its O(1) performance at both ends makes it dramatically faster than using a list with pop(0) for large datasets.

The maxlen parameter turns a deque into a fixed-size circular buffer, automatically discarding the oldest element whenever a new one is added. This makes bounded deques ideal for sliding window computations and log buffers.

Unlike lists, deque does not support efficient random access by index. For workloads that need both fast front/back operations and random indexing, consider other data structures like indexed trees.

bisect Binary Search

Daily Life
Interviews

Search and insert in sorted lists fast

The bisect module provides binary search functions for maintaining sorted lists. It finds insertion points in O(log n) time, making it efficient for scenarios where you repeatedly insert into and search sorted data. The module is named after the bisection algorithm, which repeatedly divides the search space in half.

While you could maintain a sorted list by calling sort() after each insertion, that would be O(n log n) per insertion. Using bisect to find the insertion point is O(log n), and the subsequent list insertion is O(n), making the overall operation faster for large lists. For scenarios requiring frequent sorted insertions and searches, bisect is invaluable.

Finding Insertion Points

bisect_left() and bisect_right() (or equivalently bisect()) find the index where an element should be inserted to maintain sorted order. The difference matters when the value already exists in the list:

1import bisect
2
3sorted_list = [10, 20, 30, 40, 50]
4
5# Find insertion point for new value
6pos = bisect.bisect_left(sorted_list, 25)
7print(f"Insert 25 at index: {pos}")
8
9# For existing values, left vs right matters
10pos_left = bisect.bisect_left(sorted_list, 30)
11pos_right = bisect.bisect_right(sorted_list, 30)
12print(f"bisect_left(30): {pos_left}")
13print(f"bisect_right(30): {pos_right}")
14
15# Edge cases: smaller/larger than all elements
16print(f"Insert 5 at: {bisect.bisect_left(sorted_list, 5)}")
17print(f"Insert 100 at: {bisect.bisect_left(sorted_list, 100)}")
>>>Output
Insert 25 at index: 2
bisect_left(30): 2
bisect_right(30): 3
Insert 5 at: 0
Insert 100 at: 5
bisect_leftbisect_rightnew value
bisect_left
Insert before
Goes before any duplicates
bisect_right
Insert after
Goes after all duplicates
new value
Same position
Both agree when no match

Insert in Sorted Order

insort_left() and insort_right() combine finding the position and inserting in one operation. These are convenience functions equivalent to finding the position with bisect and then calling list.insert():

1import bisect
2
3# Maintain a sorted leaderboard
4scores = [72, 85, 91, 95]
5print("Initial:", scores)
6
7# Insert new scores in sorted position
8bisect.insort(scores, 88)
9print("After insort 88:", scores)
10
11bisect.insort(scores, 95)
12print("After insort 95:", scores)
13
14bisect.insort_left(scores, 91)
15print("After insort_left 91:", scores)
16
17# Build sorted list from unsorted data
18data = [5, 2, 8, 1, 9, 3]
19sorted_data = []
20for item in data:
21 bisect.insort(sorted_data, item)
22print("Built sorted:", sorted_data)
>>>Output
Initial: [72, 85, 91, 95]
After insort 88: [72, 85, 88, 91, 95]
After insort 95: [72, 85, 88, 91, 95, 95]
After insort_left 91: [72, 85, 88, 91, 91, 95, 95]
Built sorted: [1, 2, 3, 5, 8, 9]

Binary Search: Exact Match

The bisect module finds insertion points, not exact matches. To check if a value exists, use bisect_left() and verify the element at that position:

1import bisect
2
3def binary_search(sorted_list, target):
4 """Return index of target if found, else -1"""
5 idx = bisect.bisect_left(sorted_list, target)
6 if idx < len(sorted_list) and sorted_list[idx] == target:
7 return idx
8 return -1
9
10numbers = [10, 20, 30, 40, 50, 60, 70]
11print("Search 30:", binary_search(numbers, 30))
12print("Search 35:", binary_search(numbers, 35))
13print("Search 10:", binary_search(numbers, 10))
14print("Search 80:", binary_search(numbers, 80))
15
16# More efficient than 'in' operator for sorted lists
17# 'x in list' is O(n), binary_search is O(log n)
>>>Output
Search 30: 2
Search 35: -1
Search 10: 0
Search 80: -1

Grade Classification

A classic and elegant bisect application is mapping numeric values to categories using breakpoints. This pattern is cleaner than a chain of if-elif statements and scales to any number of categories:

1import bisect
2
3def get_grade(score):
4 """Convert numeric score to letter grade"""
5 # F < 60, D < 70, C < 80, B < 90, A >= 90
6 breakpoints = [60, 70, 80, 90]
7 grades = "FDCBA"
8 idx = bisect.bisect(breakpoints, score)
9 return grades[idx]
10
11# Test various scores
12test_scores = [55, 65, 75, 85, 95, 60, 90, 100, 0]
13for score in test_scores:
14 grade = get_grade(score)
15 print(f"{score:3d}: {grade}")
>>>Output
55: F
65: D
75: C
85: B
95: A
60: D
90: A
100: A
0: F

Range Queries

bisect enables efficient range queries on sorted data. By finding the insertion points for the low and high bounds, you can count or retrieve all elements within a range in O(log n) time for the search, plus O(k) for the k elements in the range.

1import bisect
2
3def get_in_range(sorted_list, low, high):
4 left_index = bisect.bisect_left(sorted_list, low)
5 right_index = bisect.bisect_right(sorted_list, high)
6 return sorted_list[left_index:right_index]
7
8prices = [10, 15, 20, 25, 30, 35, 40, 45, 50]
9
10items = get_in_range(prices, 20, 40)
11print(f"Items in [20, 40]: {items} (count: {len(items)})")
12
13above_30 = prices[bisect.bisect_right(prices, 30):]
14print(f"Items > 30: {above_30}")
>>>Output
Items in [20, 40]: [20, 25, 30, 35, 40] (count: 5)
Items > 30: [35, 40, 45, 50]
Linear Search
  • O(n) - check every element
  • Works on unsorted data
  • Simple to implement
  • Slow for large datasets
bisect Search
  • O(log n) - halve search space
  • Requires sorted data
  • Ideal for sorted lists
  • Fast for any dataset size
TIP
Remember that bisect only provides O(log n) for finding the position. The actual list insertion with insort is still O(n) because list.insert() must shift elements. For workloads with many insertions, consider a different data structure like a balanced tree.
Python Quiz

> Insert a value into a sorted list while keeping it sorted, then find the index where that value now sits. Choose the bisect function for each step.

import bisect
data = [10, 20, 40, 50]
bisect.___(data, 30)
pos = bisect.___(data, 30)
print(pos)
insert
bisect_right
bisect_left
append
insort

Common Mistakes

Even experienced developers make these mistakes with specialized collections. Understanding these pitfalls will help you avoid subtle bugs and use these tools correctly.
Do
  • Use deque for queue operations instead of list.pop(0)
  • Check "key in dict" before accessing defaultdict to avoid creating phantom keys
  • Sort your list before using bisect functions
Don't
  • Iterate over a heap expecting sorted order
  • Use bisect on unsorted data (it silently gives wrong results)
  • Forget that Counter subtraction drops zero and negative counts

Expecting Heaps Sorted

A heap is NOT a sorted list. The heap property only guarantees that the minimum is at index 0. Iterating over a heapified list does not give sorted order. To get sorted output, you must pop elements one by one:

1import heapq
2
3data = [5, 3, 8, 1, 9, 2, 7]
4heapq.heapify(data)
5
6# WRONG: This is NOT sorted!
7print("Heap (NOT sorted):", data)
8
9# CORRECT: Pop for sorted order
10heap_copy = data.copy()
11sorted_result = []
12while heap_copy:
13 sorted_result.append(heapq.heappop(heap_copy))
14print("Sorted via heappop:", sorted_result)
>>>Output
Heap (NOT sorted): [1, 3, 2, 5, 9, 8, 7]
Sorted via heappop: [1, 2, 3, 5, 7, 8, 9]

defaultdict Side Effects

Accessing a missing key in defaultdict creates it. This is usually helpful, but it can cause unexpected keys when you're checking for existence or have typos in key names:

1from collections import defaultdict
2
3counts = defaultdict(int)
4counts["apples"] = 5
5
6# WRONG: This check creates the key with value 0!
7# Creates key, returns 0 (falsy)
8if counts["bananas"]:
9 print("Has bananas")
10
11print("Keys after wrong check:", list(counts.keys()))
12
13# CORRECT: Use 'in' for existence checks
14counts2 = defaultdict(int)
15counts2["apples"] = 5
16
17if "bananas" in counts2:
18 print("Has bananas")
19
20print("Keys (correct):", list(counts2.keys()))
>>>Output
Keys after wrong check: ['apples', 'bananas']
Keys (correct): ['apples']

bisect on Unsorted Data

bisect assumes the list is already sorted. Using it on unsorted data produces wrong results without any error or warning:

1import bisect
2
3# WRONG: unsorted list gives wrong results
4unsorted = [30, 10, 50, 20, 40]
5pos = bisect.bisect_left(unsorted, 25)
6print(f"Wrong position in unsorted: {pos}")
7
8sorted_list = sorted(unsorted)
9pos = bisect.bisect_left(sorted_list, 25)
10print(f"Correct position in sorted: {pos}")
11print(f"Sorted list: {sorted_list}")
>>>Output
Wrong position in unsorted: 1
Correct position in sorted: 2
Sorted list: [10, 20, 30, 40, 50]

list vs deque: When to Pick

Using list.pop(0) in a loop is a common performance mistake. For queue-like operations where you add to one end and remove from the other, always use deque:

1from collections import deque
2
3# pop(0) shifts all elements - O(n)
4# deque.popleft() is O(1) - no shifting
5
6# For 10000 items processed as queue:
7# list: O(n^2) total = 100 million shifts
8# deque: O(n) total = just 20000 operations
9
10q_list = [1, 2, 3]
11# Slow: shifts all elements left
12q_list.pop(0)
13print("List after pop(0):", q_list)
14
15q_deque = deque([1, 2, 3])
16# Fast: no shifting needed
17q_deque.popleft()
18print("Deque after popleft:", list(q_deque))
>>>Output
List after pop(0): [2, 3]
Deque after popleft: [2, 3]

Specialized collection types including heapq, Counter, defaultdict, deque, and bisect solve real-world performance and design problems that basic structures cannot. Put your skills to the test with hands-on challenges in the Python Builder.

PUTTING IT ALL TOGETHER

> You are a data engineer at Uber building a real-time ride-dispatch system that surfaces the nearest available driver, counts request types by city, groups drivers by zone without key errors, maintains a sliding window of recent GPS pings, and inserts arrival estimates into a sorted schedule.

heapq maintains a min-heap of driver distances so the nearest available driver is retrieved in O(log n) without sorting the full list.
Counter tallies ride-request types per city in one pass, exposing the most frequent request category with a single .most_common() call.
defaultdict with a list factory automatically initializes the per-zone driver groups, eliminating explicit key-existence checks on every insert.
deque enforces a fixed-length sliding window of recent GPS pings per driver, efficiently appending new positions and discarding stale ones from the left.
KEY TAKEAWAYS
heapq provides O(1) access to minimum and O(log n) push/pop for efficient priority queue operations
nlargest() and nsmallest() efficiently find top/bottom N elements, auto-selecting the optimal algorithm
Counter automates frequency counting with most_common(), arithmetic operations, and zero for missing keys
defaultdict eliminates key-existence checks by auto-initializing missing keys with a factory function
deque provides O(1) operations at both ends, essential for queues and sliding windows
Bounded deques with maxlen automatically maintain fixed-size buffers, discarding old elements
bisect enables O(log n) binary search on sorted lists for insertion points, range queries, and classification
Choose the right collection: heaps for priority, Counter for frequency, defaultdict for grouping, deque for queues, bisect for sorted data
These specialized containers often replace manual implementations with faster, cleaner, and more maintainable solutions
Always verify assumptions: heaps are not sorted, bisect requires sorted input, defaultdict creates keys on access

Professional data structure tools

Category
Python
Difficulty
advanced
Duration
43 minutes
Challenges
0 hands-on challenges

Topics covered: heapq Operations, Counter for Frequency, defaultdict Usage, deque Double-Ended Operations, bisect Binary Search

Lesson Sections

  1. heapq Operations (concepts: pyHeapTopK)

    The heap property is maintained implicitly through the array representation. For an element at index i, its left child is at index 2i+1 and its right child is at 2i+2. When you push or pop elements, the heap operations restore the heap property by "bubbling up" or "bubbling down" elements as needed. Creating and Using Heaps Finding N Smallest/Largest Priority Queue Pattern Notice that tasks with the same priority are processed in insertion order within that priority level. This is because Python

  2. Counter for Frequency

    Creating Counters The most_common() Method Counter Arithmetic Updating Counters Practical Applications

  3. defaultdict Usage

    The Problem It Solves Without defaultdict, grouping operations require verbose key-existence checks. This pattern is so common that it became tedious boilerplate in many codebases. Consider this common task of grouping employees by department: The defaultdict Solution Common Factory Functions Nested defaultdicts

  4. deque Double-Ended Operations (concepts: pyStackQueue)

    List Performance Problems The deque Operations Queue and Stack with deque Bounded deque with maxlen Rotation Sliding Window Pattern

  5. bisect Binary Search (concepts: pyBinarySearch)

    Finding Insertion Points Insert in Sorted Order Binary Search: Exact Match Grade Classification Range Queries Common Mistakes Even experienced developers make these mistakes with specialized collections. Understanding these pitfalls will help you avoid subtle bugs and use these tools correctly. Expecting Heaps Sorted defaultdict Side Effects bisect on Unsorted Data list vs deque: When to Pick