Dictionaries: Advanced

Ansible, the infrastructure automation tool used by thousands of enterprise engineering teams, uses ChainMap to layer playbook variables, inventory variables, and command-line overrides so that the most specific setting always wins without any copy-and-merge boilerplate. Kubernetes' Python client uses the same pattern to merge pod specs with namespace and cluster defaults, creating a final configuration that reflects every level of the hierarchy. The advanced dictionary techniques in this lesson, including deep copying, defaultdict, Counter, and OrderedDict, are the building blocks behind that kind of dictionary-driven configuration code.

Deep Copying Dictionaries

Create fully independent dict copies

In the intermediate lesson, we saw that .copy() creates a shallow copy: the top-level dictionary is new, but nested objects are still shared. This can lead to subtle bugs when you modify what you think is a copy.

	original = {
	    "user": "Alice",
	    "scores": [85, 90, 78]
	}

	shallow = original.copy()
	shallow["scores"].append(95)

	# Both dicts see the change!
	print("Original:", original)
	print("Shallow:", shallow)

>>>Output

Original: {'user': 'Alice', 'scores': [85, 90, 78, 95]}

Shallow: {'user': 'Alice', 'scores': [85, 90, 78, 95]}

The problem is that both dictionaries share the same list object. Appending to the list affects both because they're pointing to the same list in memory.

The copy Module

Python's copy module provides deepcopy(), which recursively copies all nested objects. After a deep copy, the original and copy are completely independent:

	import copy

	original = {
	    "user": "Alice",
	    "scores": [85, 90, 78]
	}

	deep = copy.deepcopy(original)
	deep["scores"].append(95)

	print("Original:", original)
	print("Deep copy:", deep)

>>>Output

Original: {'user': 'Alice', 'scores': [85, 90, 78]}

Deep copy: {'user': 'Alice', 'scores': [85, 90, 78, 95]}
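One way to confirm what each copy shares is to compare object identity with the is operator. The following is a minimal sketch that reuses the example data; the result variable names are illustrative.

```python
import copy

original = {"user": "Alice", "scores": [85, 90, 78]}

shallow = original.copy()
deep = copy.deepcopy(original)

# The shallow copy is a new dict, but its nested list is the same object.
shallow_shares_list = shallow["scores"] is original["scores"]

# The deep copy has its own list with equal contents.
deep_shares_list = deep["scores"] is original["scores"]
deep_equal_contents = deep["scores"] == original["scores"]

print("Shallow shares list:", shallow_shares_list)
print("Deep shares list:", deep_shares_list)
print("Deep has equal contents:", deep_equal_contents)
```

The first line prints True and the second prints False: only the deep copy owns a separate list.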

Preserve original state

Deep copy before modifying nested data so you can roll back if needed.

Protect from mutation

Pass a deep copy to functions that might change nested lists or dicts.

Performance Considerations

Deep copying is slower and uses more memory than shallow copying because it must traverse and duplicate every nested object. For deeply nested or large data structures, this cost can be significant:

Assignment (=): instant, no new memory; both variables point to the same data.

.copy() (shallow): fast; copies the top level only, nested objects are still shared.

deepcopy() (full): slow; copies everything recursively, fully independent result.

TIP

Don't default to deepcopy() "just in case." Think about whether you actually need independent nested data. Often a shallow copy or even no copy is the right choice.

Python Quiz

> After creating a copy of nested data, you append to the copy's list. Pick the copy function that keeps the original independent, and the variable whose list length is still 2.

import copy
data = {"users": ["Alice", "Bob"]}
snapshot = copy.___(data)
snapshot["users"].append("Carol")
print(len(___["users"]))

Options: deepcopy, copy, clone, data, snapshot

A practical rule of thumb: use assignment when you want shared state, .copy() when you want an independent top-level dict but shared nested objects, and deepcopy() when you need fully independent data at every level.

Deep copying is slower for large nested structures because it traverses and duplicates every object recursively. Profile before defaulting to deepcopy() on performance-critical paths.

When working with configuration objects or templates that get modified per-request, deepcopy() is the right tool. It ensures each caller receives an independent copy that they can customize without affecting others.
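A per-request template might look like the sketch below. The DEFAULT_CONFIG contents and the config_for_request helper are hypothetical, chosen only to show that changes to the copy never reach the template.

```python
import copy

# Hypothetical template shared by every request.
DEFAULT_CONFIG = {
    "timeout": 30,
    "retries": {"max": 3, "backoff": [1, 2, 4]},
}

def config_for_request(overrides):
    """Return a private config for one caller; the template is never touched."""
    config = copy.deepcopy(DEFAULT_CONFIG)
    config["retries"]["max"] = overrides.get("max_retries", config["retries"]["max"])
    config["retries"]["backoff"].append(8)  # safe: this list belongs to the copy
    return config

request_config = config_for_request({"max_retries": 5})

print(request_config["retries"])
print(DEFAULT_CONFIG["retries"])
```

With a shallow copy, the append would have modified the template's backoff list as well.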

defaultdict: Auto-Initializing Keys

Eliminate missing-key boilerplate

Remember how we used .setdefault() or .get() to handle missing keys? The collections module provides defaultdict, which automatically creates missing keys with a default value. This eliminates the need for manual checking.

	from collections import defaultdict

	# defaultdict auto-creates lists
	groups = defaultdict(list)

	groups["fruits"].append("apple")
	groups["fruits"].append("banana")
	groups["vegetables"].append("carrot")

	print(dict(groups))

>>>Output

{'fruits': ['apple', 'banana'], 'vegetables': ['carrot']}

The argument to defaultdict is a "factory function" that gets called to create the default value. Common factories include list (for grouping), int (for counting), and set (for unique collections).
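The set factory works the same way as list, except that duplicates collapse. A small sketch, using made-up (user, tag) pairs:

```python
from collections import defaultdict

# Hypothetical (user, tag) pairs, with one pair repeated on purpose.
events = [("alice", "python"), ("bob", "sql"), ("alice", "python"), ("alice", "sql")]

tags_by_user = defaultdict(set)
for user, tag in events:
    tags_by_user[user].add(tag)  # set() is created on first access

print(sorted(tags_by_user["alice"]))
print(sorted(tags_by_user["bob"]))
```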

Counting with defaultdict

When you use int as the factory, missing keys automatically get the value 0 (since int() returns 0). This makes counting trivially simple:

	from collections import defaultdict

	word_counts = defaultdict(int)
	text = "the quick brown fox jumps over the lazy dog"

	for word in text.split():
	    # Auto-creates with 0, then adds 1
	    word_counts[word] += 1

	print(dict(word_counts))

>>>Output

{'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1}

Compare this to the manual approach with .get(). The defaultdict version is cleaner and less error-prone because you don't have to remember the initialization logic.
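For reference, here are the two approaches side by side on the same sentence. This is a sketch for comparison; both loops produce identical counts.

```python
from collections import defaultdict

text = "the quick brown fox jumps over the lazy dog"

# Manual approach: supply the starting value on every update.
manual_counts = {}
for word in text.split():
    manual_counts[word] = manual_counts.get(word, 0) + 1

# defaultdict approach: the starting value is declared once.
auto_counts = defaultdict(int)
for word in text.split():
    auto_counts[word] += 1

print(manual_counts == dict(auto_counts))
```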

Nested defaultdicts

You can create multi-level defaultdicts for building complex nested structures automatically. This is useful when processing hierarchical data:

	from collections import defaultdict

	# Two-level: dept -> role -> employees
	org = defaultdict(lambda: defaultdict(list))

	org["Engineering"]["Senior"].append("Alice")
	org["Engineering"]["Junior"].append("Bob")
	org["Marketing"]["Manager"].append("Carol")

	print(dict(org["Engineering"]))

>>>Output

{'Senior': ['Alice'], 'Junior': ['Bob']}

TIP

The lambda: defaultdict(list) pattern creates a function that returns a new defaultdict. This is needed because defaultdict requires a callable (function), not a value.

Fill in the Blank

> You want to tally letter frequencies using defaultdict with += 1. Pick the factory function whose default value supports addition with integers.

from collections import defaultdict
counts = defaultdict(___)
counts["a"] += 1
counts["b"] += 1
counts["a"] += 1
print(dict(counts))

defaultdict eliminates the most common boilerplate in Python data processing: the pattern of checking if a key exists before updating it. With the right factory, you can write grouping and counting code that is both shorter and more readable.

The factory function is called with no arguments every time a missing key is accessed. This means you can use any callable that returns your desired default - not just built-ins like int or list, but also lambda functions or custom classes.
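As an illustration of a non-built-in factory, the sketch below uses a lambda that returns a fresh record for each new key. The record layout (visits, pages) is hypothetical.

```python
from collections import defaultdict

# Hypothetical per-user stats record; the lambda builds a new dict each time.
stats = defaultdict(lambda: {"visits": 0, "pages": []})

stats["alice"]["visits"] += 1
stats["alice"]["pages"].append("/home")
stats["bob"]["visits"] += 1

print(dict(stats))
```

Because the lambda runs once per missing key, each user gets a separate pages list rather than a shared one.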

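Be careful when reading from a defaultdict: looking up a missing key with d[key] inserts that key with the default value. To check a key without creating it, use key in d or d.get(key). Once the data is built, converting it to a regular dict with dict(d) prevents later lookups from accidentally creating empty entries.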
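The behavior is easy to check directly. This sketch shows which kinds of access call the factory and which do not; the variable names are illustrative.

```python
from collections import defaultdict

counts = defaultdict(int)
counts["a"] += 1

# Membership tests and .get() do not call the factory.
has_b = "b" in counts
b_value = counts.get("b")
keys_after_safe_checks = list(counts)

# Square-bracket access on a missing key inserts it.
_ = counts["c"]
keys_after_indexing = list(counts)

# A plain dict snapshot raises KeyError instead of inserting.
frozen = dict(counts)

print(keys_after_safe_checks)
print(keys_after_indexing)
```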

Counter: Purpose-Built for Counting

Rank items by frequency instantly

While defaultdict(int) works for counting, Python provides a specialized Counter class that's optimized for this exact use case. Counter has extra features that make counting tasks even easier.

	from collections import Counter

	# Count characters in a string
	letter_counts = Counter("mississippi")
	print(letter_counts)

	# Count items in a list
	colors = ["red", "blue", "red", "green", "blue", "red"]
	color_counts = Counter(colors)
	print(color_counts)

>>>Output

Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})

Counter({'red': 3, 'blue': 2, 'green': 1})

Counter provides several useful methods that you'd have to implement yourself with a regular dictionary.

The .most_common() Method

Get the N most frequent items with .most_common(), already sorted by count:

	from collections import Counter

	words = "the quick brown fox jumps over the lazy dog the fox".split()
	counts = Counter(words)

	# Get the 3 most common words
	print(counts.most_common(3))

	# Get all items sorted by frequency
	print(counts.most_common())

>>>Output

[('the', 3), ('fox', 2), ('quick', 1)]

[('the', 3), ('fox', 2), ('quick', 1), ('brown', 1), ('jumps', 1), ('over', 1), ('lazy', 1), ('dog', 1)]

Counter Arithmetic

Counters support arithmetic operations. You can add, subtract, or find the intersection/union of counts:

	from collections import Counter

	morning = Counter({"coffee": 2, "tea": 1})
	afternoon = Counter({"coffee": 1, "water": 3})

	# Add counts together
	total = morning + afternoon
	print("Total:", total)

	# Subtract counts (results of zero or less are dropped)
	diff = morning - afternoon
	print("Difference:", diff)

>>>Output

Total: Counter({'coffee': 3, 'water': 3, 'tea': 1})

Difference: Counter({'coffee': 1, 'tea': 1})
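Intersection and union use the & and | operators. A short sketch with the same two counters:

```python
from collections import Counter

morning = Counter({"coffee": 2, "tea": 1})
afternoon = Counter({"coffee": 1, "water": 3})

# Intersection keeps the minimum count of each item present in both.
common = morning & afternoon

# Union keeps the maximum count of each item found in either.
peak = morning | afternoon

print("Intersection:", common)
print("Union:", peak)
```

Here the intersection contains only coffee with a count of 1, and the union keeps coffee at 2, tea at 1, and water at 3.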

Python offers two main tools for counting. Here is how they compare.

•defaultdict(int)

General-purpose counting
Manual iteration needed
No special methods
Part of collections

•Counter

Specialized for counting
Counts items automatically
.most_common(), arithmetic
Part of collections

Counter shines in production because of its convenience methods and arithmetic support.

OrderedDict: Stable Order

In Python 3.7 and later, regular dictionaries maintain insertion order. Before that, order was not guaranteed. OrderedDict explicitly guarantees order and provides additional methods for reordering.

	from collections import OrderedDict

	od = OrderedDict()
	od["first"] = 1
	od["second"] = 2
	od["third"] = 3

	print(list(od.keys()))

>>>Output

['first', 'second', 'third']

While modern regular dicts also maintain order, OrderedDict has a method that regular dicts lack: move_to_end().

move_to_end() Reordering

The move_to_end() method moves a key to either end of the dictionary. This is useful for implementing LRU (Least Recently Used) caches:

	from collections import OrderedDict

	od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
	print("Before:", list(od.keys()))

	# Move "a" to the end
	od.move_to_end("a")
	print("After move_to_end:", list(od.keys()))

	# Move "c" to the beginning (last=False)
	od.move_to_end("c", last=False)
	print("After move to start:", list(od.keys()))

>>>Output

Before: ['a', 'b', 'c']

After move_to_end: ['b', 'c', 'a']

After move to start: ['c', 'b', 'a']
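To make the LRU connection concrete, here is a minimal cache sketch built on move_to_end() and popitem(last=False). The LRUCache class and its method names are illustrative, not a standard library API.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache sketch; the class and method names are illustrative."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key, default=None):
        if key not in self.items:
            return default
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recent entry
cache.put("c", 3)  # over capacity, so "b" is evicted

print(list(cache.items.keys()))
```

The oldest entry always sits at the front of the OrderedDict, so eviction is a single popitem(last=False) call.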

TIP

In most modern code, regular dicts are sufficient since they preserve order. Use OrderedDict when you need move_to_end() or when writing code that must run on Python 3.6 or earlier.

Dictionary Performance

Choose dicts over lists for speed

Understanding dictionary performance is crucial for writing efficient code, especially in data engineering where you process large datasets. Dictionaries use a technique called hashing that makes most operations extremely fast.

How Hashing Works

When you add a key to a dictionary, Python computes a hash value from the key. This hash determines where in memory the value is stored. When you look up a key, Python computes the same hash and jumps directly to that location.

Operation	Complexity	Notes
Access	O(1) average	Key lookup does not slow down as the dict grows
Add	O(1) average	Inserting a new pair; occasional resizes are amortized
Delete	O(1) average	Removing a key by name
Check	O(1) average	"in" tests a key without scanning
Iterate	O(n) linear	Must visit every element

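O(1) means the operation takes roughly the same amount of time whether your dictionary has 10 items or 10 million items. This is the average case: many hash collisions can slow a lookup down, but that is rare with built-in key types such as strings and integers. A list, by contrast, requires O(n) time to find an item by value, meaning it gets slower as the list grows.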

dict vs list Lookups

When you need to check if an item exists in a collection, dictionaries vastly outperform lists for large datasets:

	# Checking membership: list vs dict
	large_list = list(range(1000000))
	large_dict = {i: True for i in range(1000000)}

	# Both lines check if 999999 exists
	# List: must scan up to 1 million items (slow)
	# Dict: computes hash and jumps directly (fast)
	print(999999 in large_list)
	print(999999 in large_dict)

>>>Output

True

True
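The difference can be measured with the standard timeit module. The sketch below uses a smaller collection (100,000 items) so it finishes quickly; absolute timings depend on the machine, so no specific numbers are claimed.

```python
import timeit

n = 100_000
large_list = list(range(n))
large_dict = {i: True for i in range(n)}
target = n - 1  # worst case for the list: the last element

# Time 100 membership checks against each container.
list_seconds = timeit.timeit(lambda: target in large_list, number=100)
dict_seconds = timeit.timeit(lambda: target in large_dict, number=100)

print(f"list: {list_seconds:.6f} s")
print(f"dict: {dict_seconds:.6f} s")
```

On typical hardware the dict time is smaller by several orders of magnitude, and the gap widens as n grows.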

•item in list

O(n) - slower as list grows
Must check every element
1M items = up to 1M checks
Avoid for large datasets

•key in dict

O(1) - constant speed
Hash calculation + jump
1M items = still instant
Ideal for lookups

This performance gap has real consequences in production systems.

Memory Considerations

Dictionaries trade memory for speed. They preallocate extra space to maintain fast operations as items are added. Understanding this helps you make informed decisions about data structure choices.

Dictionary Overhead

A dictionary uses more memory than an equivalent list or tuple because it stores both keys and values, plus additional hash table overhead:

	import sys

	# Compare memory usage
	list_data = [1, 2, 3, 4, 5]
	dict_data = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}

	print("List size:", sys.getsizeof(list_data), "bytes")
	print("Dict size:", sys.getsizeof(dict_data), "bytes")

>>>Output

List size: 104 bytes

Dict size: 232 bytes

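For small amounts of data, this overhead is negligible. For millions of items, it can matter. However, the speed benefit usually outweighs the memory cost. The exact byte counts depend on the Python version and platform, and sys.getsizeof() measures only the container itself, not the objects it refers to.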

TIP

If you have millions of items with integer keys from 0 to n, a list might be more memory-efficient. If you have sparse keys or string keys, a dictionary is the right choice.

Advanced Dictionary Patterns

Solve two-sum and cache with dicts

These patterns appear frequently in technical interviews and production code. Each demonstrates a clever way to use dictionaries to solve problems that would otherwise require slower, more complex approaches.

Two-Sum Pattern

One of the most famous interview problems is finding two numbers in a list that add up to a target. The optimal solution uses a dictionary to achieve O(n) time:

	def two_sum(nums, target):
	    seen = {}
	    for i, num in enumerate(nums):
	        complement = target - num
	        if complement in seen:
	            return [seen[complement], i]
	        seen[num] = i
	    return None

	numbers = [2, 7, 11, 15]
	result = two_sum(numbers, 9)
	print(result)

>>>Output

[0, 1]

This pattern appears in many variations. The key insight is using the dictionary to "remember" what you've seen, turning a potential O(n²) nested loop into O(n).
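One such variation counts every pair that sums to the target instead of returning the first. The sketch below is an illustrative variant (count_pairs is a hypothetical helper name) that stores how many times each value has been seen.

```python
def count_pairs(nums, target):
    """Count index pairs (i, j) with i < j and nums[i] + nums[j] == target."""
    seen = {}  # value -> how many times it has appeared so far
    pairs = 0
    for num in nums:
        pairs += seen.get(target - num, 0)
        seen[num] = seen.get(num, 0) + 1
    return pairs

print(count_pairs([1, 5, 3, 3, 5, 1], 6))
```

This prints 5: four pairs combine a 1 with a 5, and one pair combines the two 3s. The loop still makes a single pass, so the running time stays O(n).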

Caching with Dictionaries

Memoization is a technique where you cache function results to avoid redundant computation. Dictionaries are perfect for this:

	# Naive Fibonacci is very slow for large n
	# Memoized version is fast

	cache = {}

	def fib(n):
	    if n in cache:
	        return cache[n]
	    if n <= 1:
	        return n
	    result = fib(n-1) + fib(n-2)
	    cache[n] = result
	    return result

	print(fib(30))

>>>Output

832040

TIP

Python's functools module provides the @lru_cache decorator that does this automatically. But understanding the dictionary-based approach helps you implement custom caching strategies.
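For comparison, the same function with functools.lru_cache looks like this. It is a minimal sketch; maxsize=None makes the cache unbounded.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache, keyed by the function arguments
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))
print(fib.cache_info())
```

The first line prints 832040, matching the manual version. cache_info() reports hits and misses, which helps confirm the cache is actually being used.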

Graph Representation

Dictionaries are the standard way to represent graphs in Python. Each key is a node, and the value is a list of connected nodes:

	# Adjacency list representation of a graph
	graph = {
	    "A": ["B", "C"],
	    "B": ["A", "D"],
	    "C": ["A", "D"],
	    "D": ["B", "C"]
	}

	# Find all neighbors of node "A"
	print("Neighbors of A:", graph["A"])

	# Check if B and D are connected
	print("B connects to D:", "D" in graph["B"])

>>>Output

Neighbors of A: ['B', 'C']

B connects to D: True
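An adjacency list also makes traversal straightforward. As a sketch, the breadth-first search below finds a shortest path in the same graph; shortest_path is an illustrative helper, and a second dictionary records how each node was reached.

```python
from collections import deque

graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns a shortest list of nodes or None."""
    previous = {start: None}  # node -> node we arrived from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

print(shortest_path(graph, "A", "D"))
```

This prints ['A', 'B', 'D']. The previous dictionary doubles as the visited set, so each node is processed at most once.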

State Machines

Dictionaries can elegantly represent state transitions. The key is the current state, and the value is another dictionary mapping inputs to next states:

	# Simple traffic light state machine
	transitions = {
	    "green": {"timer": "yellow"},
	    "yellow": {"timer": "red"},
	    "red": {"timer": "green"}
	}

	current = "green"
	for _ in range(4):
	    print(f"Light is {current}")
	    current = transitions[current]["timer"]

>>>Output

Light is green

Light is yellow

Light is red

Light is green
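The same structure handles several inputs per state. The sketch below uses a hypothetical order workflow and a small next_state helper that rejects transitions missing from the table.

```python
# Hypothetical order workflow: state -> {event -> next state}
transitions = {
    "pending": {"pay": "paid", "cancel": "cancelled"},
    "paid": {"ship": "shipped", "refund": "cancelled"},
    "shipped": {"deliver": "delivered"},
}

def next_state(current, event):
    """Return the next state, or raise ValueError for an invalid transition."""
    try:
        return transitions[current][event]
    except KeyError:
        raise ValueError(f"cannot apply {event!r} in state {current!r}") from None

state = "pending"
for event in ["pay", "ship", "deliver"]:
    state = next_state(state, event)

print(state)
```

This prints delivered. Supporting a new event means adding one entry to the table; next_state itself does not change.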

Dictionaries and JSON

Convert between dicts and JSON

JSON (JavaScript Object Notation) is the most common data format for APIs and configuration files. Python dictionaries map directly to JSON objects, making conversion seamless:

	import json

	# Dictionary to JSON string
	user = {"name": "Alice", "age": 28, "active": True}
	json_str = json.dumps(user)
	print("JSON:", json_str)

	# JSON string back to dictionary
	parsed = json.loads(json_str)
	print("Dict:", parsed)
	print("Type:", type(parsed))

>>>Output

JSON: {"name": "Alice", "age": 28, "active": true}

Dict: {'name': 'Alice', 'age': 28, 'active': True}

Type: <class 'dict'>

Function	Direction	What it does
dumps()	Dict to string	Serialize a dict to a JSON string
loads()	String to dict	Parse a JSON string into a dict
dump()	Dict to file	Write a dict as JSON to an open file
load()	File to dict	Read JSON from an open file into a dict

In data engineering, you'll constantly convert between dictionaries and JSON when working with APIs, configuration files, and data pipelines.
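The file-based pair works the same way as the string-based pair, except that it takes an open file object. A sketch using a temporary directory so nothing is left on disk; the file name and settings dict are made up for the example.

```python
import json
import os
import tempfile

settings = {"name": "Alice", "tags": ["admin", "dev"], "active": True}

# Write to a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "settings.json")

    with open(path, "w", encoding="utf-8") as f:
        json.dump(settings, f)

    with open(path, encoding="utf-8") as f:
        restored = json.load(f)

print(restored == settings)
```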

Python Quiz

> Convert a dictionary to a JSON string and parse it back. The "s" suffix distinguishes string operations from file operations.

import json
user = {"name": "Alice", "age": 28}
text = json.___(user)
back = json.___(text)
print(type(back))

Options: dumps, loads, dump, load

JSON and Python dictionaries map almost perfectly to each other. JSON objects become Python dicts, JSON arrays become Python lists, JSON booleans become Python True/False, and JSON null becomes Python None.

One key difference: JSON keys must always be strings. When you use json.dumps() on a dict with integer keys, Python will convert them to strings automatically, which may change the structure when you parse the result back.
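A quick round trip shows the effect. The scores dict is a made-up example; the final comprehension is one way to restore integer keys when they are needed.

```python
import json

scores = {1: "gold", 2: "silver"}

round_tripped = json.loads(json.dumps(scores))
print(round_tripped)

# Convert keys back explicitly if integers are required.
restored = {int(key): value for key, value in round_tripped.items()}
print(restored == scores)
```

The first print shows string keys ('1' and '2'); the second prints True after the explicit conversion.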

Use json.dumps(data, indent=2) to produce human-readable JSON with indentation. This is invaluable for debugging API responses and configuration files during development.

Dictionary Gotchas

Avoid mutation and iteration traps

Even experienced developers make these mistakes. Being aware of these gotchas helps you avoid subtle bugs.

Modifying During Iteration

Adding or removing keys while iterating causes a RuntimeError. If you need to modify, iterate over a copy:

	data = {"a": 1, "b": 2, "c": 3}

	# WRONG: raises RuntimeError
	# for key in data:
	#     if data[key] < 2:
	#         del data[key]

	# CORRECT: Iterate over a copy
	for key in list(data.keys()):
	    if data[key] < 2:
	        del data[key]

	print(data)

>>>Output

{'b': 2, 'c': 3}
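When in-place modification is not required, another option is to build a new dictionary with a comprehension. A minimal sketch on the same data:

```python
data = {"a": 1, "b": 2, "c": 3}

# Build a new dict with only the entries to keep; the original is untouched.
filtered = {key: value for key, value in data.items() if value >= 2}

print(filtered)
print(data)
```

This avoids the iteration problem entirely because the dictionary being looped over is never changed.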

Mutable Default Arguments

Using a dictionary as a default argument is a famous Python pitfall. The default is created once and shared across all calls:

	# WRONG: Mutable default is shared!
	def add_item_wrong(item, bag={}):
	    bag[item] = True
	    return bag

	print(add_item_wrong("apple"))
	# Both items appear due to shared mutable default!
	print(add_item_wrong("banana"))

	# CORRECT: Use None as default
	def add_item_right(item, bag=None):
	    if bag is None:
	        bag = {}
	    bag[item] = True
	    return bag

	print(add_item_right("apple"))
	print(add_item_right("banana"))

>>>Output

{'apple': True}

{'apple': True, 'banana': True}

{'apple': True}

{'banana': True}

✓Do

Use None as default for dict/list arguments
Create a new dict inside the function body
Document when a function mutates its input

✗Don't

Use {} or [] as default argument values
Assume each call gets a fresh default
Share mutable state between function calls

Integer Key Confusion

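In Python, the boolean True and the integer 1 compare equal (True == 1) and hash to the same value, so a dictionary treats them as the same key. The same holds for False and 0. This can lead to unexpected overwrites, where the first key object is kept but its value is replaced: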

	d = {True: "boolean", 1: "integer"}
	print(d)
	print(len(d))

	# Same with False and 0
	d2 = {False: "boolean", 0: "integer"}
	print(d2)

>>>Output

{True: 'integer'}

1

{False: 'integer'}

TIP

This rarely comes up in practice because you wouldn't normally mix booleans and integers as keys. But it's a fun fact that explains some confusing behavior.

This loop tries to remove small values from a dictionary, but it crashes. Fix the bug by removing the tile that causes the runtime error.

Debug Challenge

> This loop deletes keys from a dictionary while iterating over it, which causes a RuntimeError because the dictionary size changes mid-iteration.

RuntimeError: dictionary changed size during iteration

data = {"a": 1, "b": 5, "c": 2}
for key in data:
  if data[key] < 3:
    del data[key]
print(data)

Dictionaries power many architectural decisions in real systems. Practice choosing the right dictionary-based pattern for a data engineering scenario.

Iterating over a copy of the keys with list(data.keys()) or list(data) is the safest pattern when you need to modify a dictionary during a loop. It is explicit, readable, and avoids the RuntimeError entirely.

Mutable default arguments are one of Python's most notorious gotchas. The rule is simple: never use a dict, list, or set as a default parameter value. Always use None and create the mutable object inside the function.

Lookup Table Architecture: Step 1

You are building a Python service that enriches incoming event records with user profile data. Each event contains a user_id, and you need to look up the user's name, plan tier, and region before forwarding the enriched record downstream. The system processes 50,000 events per minute.

incoming_events

event_id	user_id	action	timestamp
e_001	u_42	page_view	2024-06-01T10:00:00
e_002	u_99	purchase	2024-06-01T10:00:01
e_003	u_42	click	2024-06-01T10:00:02

Jul 2026

You need to look up user profiles for each event. The user table has 500,000 rows. How do you structure the lookup?
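One plausible structure, sketched under assumptions: load the user table once into a dictionary keyed by user_id, then enrich each event with an O(1) average lookup. The profile rows, field names, and the enrich helper below are hypothetical.

```python
# Hypothetical profile rows, as they might arrive from a database query.
user_rows = [
    {"user_id": "u_42", "name": "Alice", "plan": "pro", "region": "EU"},
    {"user_id": "u_99", "name": "Bob", "plan": "free", "region": "US"},
]

events = [
    {"event_id": "e_001", "user_id": "u_42", "action": "page_view"},
    {"event_id": "e_002", "user_id": "u_99", "action": "purchase"},
    {"event_id": "e_003", "user_id": "u_42", "action": "click"},
]

# Build the index once: O(n) work up front, O(1) average per lookup after.
profiles_by_id = {row["user_id"]: row for row in user_rows}

def enrich(event):
    """Merge profile fields into the event; unknown users get None fields."""
    profile = profiles_by_id.get(event["user_id"], {})
    return {
        **event,
        "name": profile.get("name"),
        "plan": profile.get("plan"),
        "region": profile.get("region"),
    }

enriched = [enrich(event) for event in events]
print(enriched[0])
```

Scanning a 500,000-row list for every event would cost O(n) per lookup; the dictionary index pays that cost once when it is built.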

Advanced dictionary techniques unlock powerful patterns for data transformation and configuration management. Put your skills to the test with hands-on challenges in the Python Builder.

The two-sum pattern - using a dictionary to remember previously seen values - is one of the most broadly applicable interview algorithms. Any problem that asks "have I seen this complement before?" can be solved with a dictionary in O(n) instead of O(n²) with nested loops.

State machines represented as nested dictionaries are easy to extend and test. Adding a new state or transition only requires adding one entry to the dictionary, with no changes to the traversal logic.

PUTTING IT ALL TOGETHER

> You are a senior data engineer at Cloudflare building an in-memory lookup cache for user segment data: you deep-copy nested baseline configs to prevent mutation, use defaultdict to accumulate request counts per segment, apply Counter to rank the top segments, and rely on O(1) dictionary access to serve thousands of model scoring requests per second.

copy.deepcopy() creates a truly independent copy of the nested baseline config so cache warmup never corrupts the source template.

defaultdict(int) auto-initializes missing segment keys to zero, eliminating guard checks when incrementing request counters per segment.

Counter ranks segments by request frequency with .most_common(), directly driving which segments receive priority cache slots.

O(1) dictionary lookups allow the scoring service to retrieve any segment's feature data in constant time regardless of cache size.
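The four steps can be sketched together in a few lines. Everything here (the BASELINE config, segment names, and request log) is hypothetical and only illustrates how the pieces fit.

```python
import copy
from collections import Counter, defaultdict

# Hypothetical baseline config and request log.
BASELINE = {"segments": {"free": {"ttl": 60}, "pro": {"ttl": 300}}}
requests = ["pro", "free", "pro", "enterprise", "pro", "free"]

# 1. Deep copy so cache warmup cannot corrupt the template.
cache_config = copy.deepcopy(BASELINE)
cache_config["segments"]["enterprise"] = {"ttl": 900}

# 2. defaultdict(int) accumulates counts without guard checks.
request_counts = defaultdict(int)
for segment in requests:
    request_counts[segment] += 1

# 3. Counter ranks segments by frequency.
top_segments = Counter(request_counts).most_common(2)

# 4. Plain dict access retrieves a segment's settings in O(1) on average.
top_name = top_segments[0][0]
top_settings = cache_config["segments"][top_name]

print(top_segments)
print(top_settings)
print("enterprise" in BASELINE["segments"])
```

The last line prints False: the enterprise entry was added to the deep copy only, so the baseline template is unchanged.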

KEY TAKEAWAYS

Use copy.deepcopy() for truly independent copies of nested data

defaultdict auto-creates missing keys with a default value

Counter is optimized for counting with .most_common() and arithmetic

Dictionary operations (access, add, delete) are O(1) on average

Convert lists to dicts when you need fast membership testing

json.dumps() and json.loads() convert between dicts and JSON

Never add or remove keys while iterating over a dict; loop over list(d.keys()) instead

Use None as default for mutable function arguments

Dictionary mastery for production code

Category: Python
Difficulty: advanced
Duration: 30 minutes
Challenges: 3 hands-on challenges

Topics covered: Deep Copying Dictionaries, defaultdict: Auto-Initializing Keys, Counter: Purpose-Built for Counting, Dictionary Performance, Advanced Dictionary Patterns, Dictionaries and JSON, Dictionary Gotchas
