Dictionaries: Advanced

Ansible, the infrastructure automation tool used by many engineering teams, layers playbook variables, inventory variables, and command-line overrides so that the most specific setting always wins. Python's collections.ChainMap expresses the same idea for dictionaries: a key is looked up through several layers without any copy-and-merge boilerplate. The advanced dictionary techniques in this lesson, including deep copying, defaultdict, Counter, and OrderedDict, are the foundation for that kind of layered configuration.

Deep Copying Dictionaries


Create fully independent dict copies

In the intermediate lesson, we saw that .copy() creates a shallow copy: the top-level dictionary is new, but nested objects are still shared. This can lead to subtle bugs when you modify what you think is a copy.

original = {
    "user": "Alice",
    "scores": [85, 90, 78]
}

shallow = original.copy()
shallow["scores"].append(95)

# Both dicts see the change!
print("Original:", original)
print("Shallow:", shallow)
>>>Output
Original: {'user': 'Alice', 'scores': [85, 90, 78, 95]}
Shallow: {'user': 'Alice', 'scores': [85, 90, 78, 95]}
The problem is that both dictionaries share the same list object. Appending to the list affects both because they're pointing to the same list in memory.

The copy Module

Python's copy module provides deepcopy(), which recursively copies all nested objects. After a deep copy, the original and copy are completely independent:

import copy

original = {
    "user": "Alice",
    "scores": [85, 90, 78]
}

deep = copy.deepcopy(original)
deep["scores"].append(95)

print("Original:", original)
print("Deep copy:", deep)
>>>Output
Original: {'user': 'Alice', 'scores': [85, 90, 78]}
Deep copy: {'user': 'Alice', 'scores': [85, 90, 78, 95]}
  • Preserve original state: Deep copy before modifying nested data so you can roll back if needed.
  • Protect from mutation: Pass a deep copy to functions that might change nested lists or dicts.

Performance Considerations

Deep copying is slower and uses more memory than shallow copying because it must traverse and duplicate every nested object. For deeply nested or large data structures, this cost can be significant:
  • = (assignment): Instant, no new memory; both variables point to the same data
  • .copy() (shallow): Fast; copies the top level only, nested objects are still shared
  • deepcopy() (full): Slow; copies everything recursively, fully independent result
TIP
Don't default to deepcopy() "just in case." Think about whether you actually need independent nested data. Often a shallow copy or even no copy is the right choice.
Python Quiz

> After creating a copy of nested data, you append to the copy's list. Pick the copy function that keeps the original independent, and the variable whose list length is still 2.

import copy
data = {"users": ["Alice", "Bob"]}
snapshot = copy.___(data)
snapshot["users"].append("Carol")
print(len(___["users"]))

Options: copy, snapshot, deepcopy, data, clone

A practical rule of thumb: use assignment when you want shared state, .copy() when you want an independent top-level dict but shared nested objects, and deepcopy() when you need fully independent data at every level.
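
The rule of thumb can be checked directly with the `is` operator, which tests whether two names refer to the same object. This is a minimal sketch; the variable names are illustrative.

```python
import copy

original = {"user": "Alice", "scores": [85, 90, 78]}

alias = original                 # assignment: same dict object
shallow = original.copy()        # new top-level dict, nested list still shared
deep = copy.deepcopy(original)   # new dict and new nested list

print(alias is original)                          # True
print(shallow is original)                        # False
print(shallow["scores"] is original["scores"])    # True
print(deep["scores"] is original["scores"])       # False
```

Only the deep copy has its own nested list, which is why appending to it leaves the original untouched.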

Deep copying is slower for large nested structures because it traverses and duplicates every object recursively. Profile before defaulting to deepcopy() on performance-critical paths.

When working with configuration objects or templates that get modified per-request, deepcopy() is the right tool. It ensures each caller receives an independent copy that they can customize without affecting others.
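
As a sketch of the per-request template idea, the snippet below deep-copies a shared template before customizing it. The names BASE_CONFIG and config_for_request, and the keys inside the template, are hypothetical and chosen only for illustration.

```python
import copy

# Hypothetical shared template; the keys are illustrative only.
BASE_CONFIG = {"retries": 3, "headers": {"Accept": "application/json"}}

def config_for_request(token):
    """Return an independent config that the caller may modify freely."""
    cfg = copy.deepcopy(BASE_CONFIG)
    cfg["headers"]["Authorization"] = f"Bearer {token}"
    return cfg

first = config_for_request("abc")
second = config_for_request("xyz")

print(first["headers"]["Authorization"])          # Bearer abc
print(second["headers"]["Authorization"])         # Bearer xyz
print("Authorization" in BASE_CONFIG["headers"])  # False
```

With a shallow copy instead, every request would write into the same nested headers dict and the template itself would be modified.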

defaultdict: Auto-Initializing Keys


Eliminate missing-key boilerplate

Remember how we used .setdefault() or .get() to handle missing keys? The collections module provides defaultdict, which automatically creates missing keys with a default value. This eliminates the need for manual checking.

from collections import defaultdict

# defaultdict auto-creates lists
groups = defaultdict(list)

groups["fruits"].append("apple")
groups["fruits"].append("banana")
groups["vegetables"].append("carrot")

print(dict(groups))
>>>Output
{'fruits': ['apple', 'banana'], 'vegetables': ['carrot']}

The argument to defaultdict is a "factory function" that gets called to create the default value. Common factories include list (for grouping), int (for counting), and set (for unique collections).

Counting with defaultdict

When you use int as the factory, missing keys automatically get the value 0 (since int() returns 0). This makes counting trivially simple:

from collections import defaultdict

word_counts = defaultdict(int)
text = "the quick brown fox jumps over the lazy dog"

for word in text.split():
    # Auto-creates with 0, then adds 1
    word_counts[word] += 1

print(dict(word_counts))
>>>Output
{'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1}

Compare this to the manual approach with .get(). The defaultdict version is cleaner and less error-prone because you don't have to remember the initialization logic.
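
For reference, here is a side-by-side sketch of the manual .get() approach and the defaultdict version on the same text. Both produce the same counts.

```python
from collections import defaultdict

text = "the quick brown fox jumps over the lazy dog"

# Manual approach: supply the starting value on every update
manual = {}
for word in text.split():
    manual[word] = manual.get(word, 0) + 1

# defaultdict approach: the factory supplies the starting value
auto = defaultdict(int)
for word in text.split():
    auto[word] += 1

print(manual == dict(auto))  # True
```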

Nested defaultdicts

You can create multi-level defaultdicts for building complex nested structures automatically. This is useful when processing hierarchical data:
from collections import defaultdict

# Two-level: dept -> role -> employees
org = defaultdict(lambda: defaultdict(list))

org["Engineering"]["Senior"].append("Alice")
org["Engineering"]["Junior"].append("Bob")
org["Marketing"]["Manager"].append("Carol")

print(dict(org["Engineering"]))
>>>Output
{'Senior': ['Alice'], 'Junior': ['Bob']}
TIP
The lambda: defaultdict(list) pattern creates a function that returns a new defaultdict. This is needed because defaultdict requires a callable (function), not a value.
Fill in the Blank

> You want to tally letter frequencies using defaultdict with += 1. Pick the factory function whose default value supports addition with integers.

from collections import defaultdict
counts = defaultdict(___)
counts["a"] += 1
counts["b"] += 1
counts["a"] += 1
print(dict(counts))

defaultdict eliminates the most common boilerplate in Python data processing: the pattern of checking if a key exists before updating it. With the right factory, you can write grouping and counting code that is both shorter and more readable.

The factory function is called with no arguments every time a missing key is accessed. This means you can use any callable that returns your desired default - not just built-ins like int or list, but also lambda functions or custom classes.
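
A short sketch of non-built-in factories: a lambda that returns a fixed string, and a small class. The Stats class and the key names are hypothetical, made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Stats:  # hypothetical class used only for this example
    count: int = 0
    values: list = field(default_factory=list)

# A lambda works as a factory because it is callable with no arguments
status = defaultdict(lambda: "unknown")
status["db"] = "up"
print(status["cache"])  # unknown

# A class works too: calling Stats() builds a fresh instance per missing key
metrics = defaultdict(Stats)
metrics["latency"].count += 1
metrics["latency"].values.append(12.5)
print(metrics["latency"])  # Stats(count=1, values=[12.5])
```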

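Be aware that reading a missing key with d[key] inserts it, even if you only meant to check it. Use key in d or d.get(key) for read-only checks, or convert the finished defaultdict to a regular dict with dict(d) so that later lookups cannot create empty entries by accident.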
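
A small sketch of which operations insert a key into a defaultdict and which do not:

```python
from collections import defaultdict

counts = defaultdict(int)
counts["a"] += 1

print("b" in counts)    # False: a membership test does not insert
print(counts.get("b"))  # None: .get() does not insert
print(counts["b"])      # 0: subscript access calls the factory and inserts "b"
print(sorted(counts))   # ['a', 'b']

# A regular dict no longer auto-creates keys
frozen = dict(counts)
try:
    frozen["z"]
except KeyError:
    print("KeyError for missing key")
```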

Counter: Purpose-Built for Counting


Rank items by frequency instantly

While defaultdict(int) works for counting, Python provides a specialized Counter class that's optimized for this exact use case. Counter has extra features that make counting tasks even easier.

from collections import Counter

# Count characters in a string
letter_counts = Counter("mississippi")
print(letter_counts)

# Count items in a list
colors = ["red", "blue", "red", "green", "blue", "red"]
color_counts = Counter(colors)
print(color_counts)
>>>Output
Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
Counter({'red': 3, 'blue': 2, 'green': 1})

Counter provides several useful methods that you'd have to implement yourself with a regular dictionary.

The .most_common() Method

Get the N most frequent items with .most_common(), already sorted by count:

from collections import Counter

words = "the quick brown fox jumps over the lazy dog the fox".split()
counts = Counter(words)

# Get the 3 most common words
print(counts.most_common(3))

# Get all items sorted by frequency
print(counts.most_common())
>>>Output
[('the', 3), ('fox', 2), ('quick', 1)]
[('the', 3), ('fox', 2), ('quick', 1), ('brown', 1), ('jumps', 1), ('over', 1), ('lazy', 1), ('dog', 1)]

Counter Arithmetic

Counters support arithmetic operations. You can add, subtract, or find the intersection/union of counts:
from collections import Counter

morning = Counter({"coffee": 2, "tea": 1})
afternoon = Counter({"coffee": 1, "water": 3})

# Add counts together
total = morning + afternoon
print("Total:", total)

# Subtract counts (zero and negative results are dropped)
diff = morning - afternoon
print("Difference:", diff)
>>>Output
Total: Counter({'coffee': 3, 'water': 3, 'tea': 1})
Difference: Counter({'coffee': 1, 'tea': 1})
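
The intersection and union operators are not shown above, so here is a brief sketch using the same two counters. `&` keeps the minimum count per key and `|` keeps the maximum.

```python
from collections import Counter

morning = Counter({"coffee": 2, "tea": 1})
afternoon = Counter({"coffee": 1, "water": 3})

common = morning & afternoon  # intersection: minimum of each count
either = morning | afternoon  # union: maximum of each count

print(common == Counter({"coffee": 1}))                         # True
print(either == Counter({"coffee": 2, "tea": 1, "water": 3}))   # True
```
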
Python offers two main tools for counting. Here is how they compare.
defaultdict(int)
  • General-purpose counting
  • Manual iteration needed
  • No special methods
  • Part of collections
Counter
  • Specialized for counting
  • Counts items automatically
  • .most_common(), arithmetic
  • Part of collections

Counter shines in production because of its convenience methods and arithmetic support.

OrderedDict: Stable Order

In Python 3.7 and later, regular dictionaries maintain insertion order. Before that, order was not guaranteed. OrderedDict explicitly guarantees order and provides additional methods for reordering.
from collections import OrderedDict

od = OrderedDict()
od["first"] = 1
od["second"] = 2
od["third"] = 3

print(list(od.keys()))
>>>Output
['first', 'second', 'third']

While modern regular dicts also maintain order, OrderedDict has a method that regular dicts lack: move_to_end().

move_to_end() Reordering

The move_to_end() method moves a key to either end of the dictionary. This is useful for implementing LRU (Least Recently Used) caches:

from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
print("Before:", list(od.keys()))

# Move "a" to the end
od.move_to_end("a")
print("After move_to_end:", list(od.keys()))

# Move "c" to the beginning (last=False)
od.move_to_end("c", last=False)
print("After move to start:", list(od.keys()))
>>>Output
Before: ['a', 'b', 'c']
After move_to_end: ['b', 'c', 'a']
After move to start: ['c', 'b', 'a']
TIP
In most modern code, regular dicts are sufficient since they preserve order. Use OrderedDict when you need move_to_end() or when writing code that must run on Python 3.6 or earlier.
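
To make the LRU cache use case concrete, here is a minimal sketch built on move_to_end() and popitem(last=False). The LRUCache class and its method names are hypothetical; it is not thread-safe and is meant only to illustrate the idea.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache sketch: the least recently used key is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used key

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used key
cache.put("c", 3)  # over capacity, so "b" is evicted
print(list(cache._data))  # ['a', 'c']
```

The oldest entry always sits at the front of the OrderedDict, so eviction is a single popitem(last=False) call.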

Dictionary Performance


Choose dicts over lists for speed

Understanding dictionary performance is crucial for writing efficient code, especially in data engineering where you process large datasets. Dictionaries use a technique called hashing that makes most operations extremely fast.

How Hashing Works

When you add a key to a dictionary, Python computes a hash value from the key. This hash determines where in memory the value is stored. When you look up a key, Python computes the same hash and jumps directly to that location.
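
The idea can be sketched with a toy hash table. This is a deliberately simplified model with assumed names (NUM_BUCKETS, toy_put, toy_get); CPython's real dict uses a different collision strategy and resizes its table as it grows.

```python
# Toy model only: illustrates "hash the key, jump to a slot".
NUM_BUCKETS = 8
buckets = [[] for _ in range(NUM_BUCKETS)]

def toy_put(key, value):
    index = hash(key) % NUM_BUCKETS  # the hash picks the bucket
    bucket = buckets[index]
    for i, (existing_key, _) in enumerate(bucket):
        if existing_key == key:      # key already present: overwrite
            bucket[i] = (key, value)
            return
    bucket.append((key, value))

def toy_get(key):
    index = hash(key) % NUM_BUCKETS  # same hash, same bucket
    for existing_key, value in buckets[index]:
        if existing_key == key:
            return value
    raise KeyError(key)

toy_put("alice", 30)
toy_put(42, "answer")
toy_put("alice", 31)
print(toy_get("alice"))  # 31
print(toy_get(42))       # answer
```

A lookup only inspects one bucket, no matter how many keys are stored in total, which is where the constant-time behavior comes from.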
  • ACCESS: O(1) on average. Key lookup stays fast regardless of size
  • ADD: O(1) on average. Inserting a new pair does not depend on size
  • DELETE: O(1) on average. Removing a key does not depend on size
  • CHECK: O(1) on average. "in" tests a key without scanning
  • ITERATE: O(n) linear. Must visit every element

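O(1) means the operation takes roughly the same amount of time, on average, whether your dictionary has 10 items or 10 million items. Hash collisions and occasional table resizes can make an individual operation slower, but the average cost stays constant. This is incredibly powerful. A list requires O(n) time to find an item by value, meaning it gets slower as the list grows.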

dict vs list Lookups

When you need to check if an item exists in a collection, dictionaries vastly outperform lists for large datasets:
# Checking membership: list vs dict
large_list = list(range(1000000))
large_dict = {i: True for i in range(1000000)}

# Both lines check if 999999 exists
# List: must scan up to 1 million items (slow)
# Dict: computes hash and jumps directly (fast)
print(999999 in large_list)
print(999999 in large_dict)
>>>Output
True
True
item in list
  • O(n) - slower as list grows
  • Must check every element
  • 1M items = up to 1M checks
  • Avoid for large datasets
key in dict
  • O(1) - constant speed
  • Hash calculation + jump
  • 1M items = still instant
  • Ideal for lookups
This performance gap has real consequences in production systems.
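
The gap can be measured with the standard timeit module. This is a rough sketch; the collection size and repeat count are arbitrary choices, and the absolute timings depend on the machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_dict = dict.fromkeys(as_list, True)
target = n - 1  # worst case for the list: the last element

# Run each membership test 200 times and compare the totals
list_time = timeit.timeit(lambda: target in as_list, number=200)
dict_time = timeit.timeit(lambda: target in as_dict, number=200)

print(f"list: {list_time:.4f}s  dict: {dict_time:.6f}s")
```

On typical hardware the dict lookup is faster by several orders of magnitude, and the gap widens as n grows.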

Memory Considerations

Dictionaries trade memory for speed. They preallocate extra space to maintain fast operations as items are added. Understanding this helps you make informed decisions about data structure choices.

Dictionary Overhead

A dictionary uses more memory than an equivalent list or tuple because it stores both keys and values, plus additional hash table overhead:
import sys

# Compare memory usage
list_data = [1, 2, 3, 4, 5]
dict_data = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}

print("List size:", sys.getsizeof(list_data), "bytes")
print("Dict size:", sys.getsizeof(dict_data), "bytes")
>>>Output
List size: 104 bytes
Dict size: 232 bytes
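The exact numbers vary with the Python version and platform, and sys.getsizeof() measures only the container itself, not the objects it references. For small amounts of data, this overhead is negligible. For millions of items, it can matter. However, the speed benefit usually outweighs the memory cost.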
TIP
If you have millions of items with integer keys from 0 to n, a list might be more memory-efficient. If you have sparse keys or string keys, a dictionary is the right choice.

Advanced Dictionary Patterns


Solve two-sum and cache with dicts

These patterns appear frequently in technical interviews and production code. Each demonstrates a clever way to use dictionaries to solve problems that would otherwise require slower, more complex approaches.

Two-Sum Pattern

One of the most famous interview problems is finding two numbers in a list that add up to a target. The optimal solution uses a dictionary to achieve O(n) time:

def two_sum(nums, target):
    seen = {}  # value -> index where it was seen
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return None  # no pair adds up to target

numbers = [2, 7, 11, 15]
result = two_sum(numbers, 9)
print(result)
>>>Output
[0, 1]

This pattern appears in many variations. The key insight is using the dictionary to "remember" what you've seen, turning a potential O(n²) nested loop into O(n).
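
One such variation, sketched here as an illustration rather than a canonical solution, counts how many index pairs add up to the target. The function name count_pairs_with_sum is hypothetical.

```python
from collections import defaultdict

def count_pairs_with_sum(nums, target):
    """Count index pairs (i, j) with i < j and nums[i] + nums[j] == target."""
    seen = defaultdict(int)  # value -> how many times it has appeared so far
    pairs = 0
    for num in nums:
        # Every earlier occurrence of the complement forms one pair with num.
        pairs += seen.get(target - num, 0)
        seen[num] += 1
    return pairs

print(count_pairs_with_sum([1, 5, 7, -1, 5], 6))  # 3
```

The dictionary still "remembers" earlier values, but it stores a count instead of an index, so the whole pass remains O(n).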

Caching with Dictionaries

Memoization is a technique where you cache function results to avoid redundant computation. Dictionaries are perfect for this:
# Naive recursive Fibonacci recomputes the same values many times.
# Caching each result in a dict makes it fast.

cache = {}

def fib(n):
    if n in cache:  # already computed
        return cache[n]
    if n <= 1:  # base cases
        return n
    result = fib(n - 1) + fib(n - 2)
    cache[n] = result
    return result

print(fib(30))
>>>Output
832040
TIP
Python's functools module provides the @lru_cache decorator that does this automatically. But understanding the dictionary-based approach helps you implement custom caching strategies.
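
For comparison, the same function written with functools.lru_cache looks like this. Passing maxsize=None makes the cache unbounded, which mirrors the plain dictionary approach.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache, like the plain dict version
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))                    # 832040
print(fib.cache_info().currsize)  # 31 cached results, one per n from 0 to 30
```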

Graph Representation

Dictionaries are the standard way to represent graphs in Python. Each key is a node, and the value is a list of connected nodes:
# Adjacency list representation of a graph
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"]
}

# Find all neighbors of node "A"
print("Neighbors of A:", graph["A"])

# Check if B and D are connected
print("B connects to D:", "D" in graph["B"])
>>>Output
Neighbors of A: ['B', 'C']
B connects to D: True
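
Once a graph is stored this way, traversals become short loops over the dictionary. As an illustrative sketch, here is a breadth-first traversal of the same graph; the function name bfs_order is an assumption made for this example.

```python
from collections import deque

graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}

def bfs_order(graph, start):
    """Return nodes in breadth-first order starting from start."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:  # O(1) lookup of the neighbor list
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

print(bfs_order(graph, "A"))  # ['A', 'B', 'C', 'D']
```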

State Machines

Dictionaries can elegantly represent state transitions. The key is the current state, and the value is another dictionary mapping inputs to next states:
# Simple traffic light state machine
transitions = {
    "green": {"timer": "yellow"},
    "yellow": {"timer": "red"},
    "red": {"timer": "green"}
}

current = "green"
for _ in range(4):
    print(f"Light is {current}")
    current = transitions[current]["timer"]
>>>Output
Light is green
Light is yellow
Light is red
Light is green
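
The traffic light table can be extended by adding entries rather than changing the loop. In this sketch the "emergency" and "reset" inputs, the "flashing" state, and the step helper are all hypothetical additions for illustration.

```python
transitions = {
    "green": {"timer": "yellow"},
    "yellow": {"timer": "red"},
    "red": {"timer": "green", "emergency": "flashing"},  # hypothetical extra input
    "flashing": {"reset": "red"},                        # hypothetical extra state
}

def step(state, event):
    """Return the next state, or stay put if the event is not defined for this state."""
    return transitions[state].get(event, state)

state = "red"
for event in ["emergency", "timer", "reset", "timer"]:
    state = step(state, event)
    print(event, "->", state)
# emergency -> flashing
# timer -> flashing
# reset -> red
# timer -> green
```

Using .get() with the current state as the fallback means undefined inputs are ignored instead of raising a KeyError.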

Dictionaries and JSON


Convert between dicts and JSON

JSON (JavaScript Object Notation) is the most common data format for APIs and configuration files. Python dictionaries map directly to JSON objects, making conversion seamless:
import json

# Dictionary to JSON string
user = {"name": "Alice", "age": 28, "active": True}
json_str = json.dumps(user)
print("JSON:", json_str)

# JSON string back to dictionary
parsed = json.loads(json_str)
print("Dict:", parsed)
print("Type:", type(parsed))
>>>Output
JSON: {"name": "Alice", "age": 28, "active": true}
Dict: {'name': 'Alice', 'age': 28, 'active': True}
Type: <class 'dict'>
  • dumps(): dict to string. Serialize a dict to a JSON string
  • loads(): string to dict. Parse a JSON string into a dict
  • dump(): dict to file. Write a dict as JSON to a file
  • load(): file to dict. Read a JSON file into a dict
In data engineering, you'll constantly convert between dictionaries and JSON when working with APIs, configuration files, and data pipelines.
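
The file-based pair, dump() and load(), works the same way with an open file object. A minimal sketch follows; the file name user.json is arbitrary, and a temporary directory is used so the example cleans up after itself.

```python
import json
import os
import tempfile

user = {"name": "Alice", "age": 28, "active": True}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "user.json")  # arbitrary file name

    # dump() writes JSON to an open file
    with open(path, "w", encoding="utf-8") as f:
        json.dump(user, f)

    # load() reads JSON back from an open file
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == user)  # True
```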
Python Quiz

> Convert a dictionary to a JSON string and parse it back. The "s" suffix distinguishes string operations from file operations.

import json
user = {"name": "Alice", "age": 28}
text = json.___(user)
back = json.___(text)
print(type(back))
Options: load, loads, dumps, dump

JSON and Python dictionaries map almost perfectly to each other. JSON objects become Python dicts, JSON arrays become Python lists, JSON booleans become Python True/False, and JSON null becomes Python None.

One key difference: JSON keys must always be strings. When you use json.dumps() on a dict with integer keys, Python converts them to strings automatically, and json.loads() does not convert them back. A dict such as {1: "a"} therefore comes back as {"1": "a"}, so the round trip does not reproduce the original.
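
A quick sketch of the integer-key round trip, including one way to restore the keys when you know they were integers:

```python
import json

scores = {1: "gold", 2: "silver"}
round_tripped = json.loads(json.dumps(scores))

print(round_tripped)            # {'1': 'gold', '2': 'silver'}
print(round_tripped == scores)  # False: the keys are now strings

# Convert the keys back explicitly if the original type matters
restored = {int(key): value for key, value in round_tripped.items()}
print(restored == scores)       # True
```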

Use json.dumps(data, indent=2) to produce human-readable JSON with indentation. This is invaluable for debugging API responses and configuration files during development.

Dictionary Gotchas


Avoid mutation and iteration traps

Even experienced developers make these mistakes. Being aware of these gotchas helps you avoid subtle bugs.

Modifying During Iteration

Adding or removing keys while iterating causes a RuntimeError. If you need to modify, iterate over a copy:
data = {"a": 1, "b": 2, "c": 3}

# WRONG: raises RuntimeError
# for key in data:
#     if data[key] < 2:
#         del data[key]

# CORRECT: Iterate over a copy
for key in list(data.keys()):
    if data[key] < 2:
        del data[key]

print(data)
>>>Output
{'b': 2, 'c': 3}

Mutable Default Arguments

Using a dictionary as a default argument is a famous Python pitfall. The default is created once and shared across all calls:
# WRONG: Mutable default is shared!
def add_item_wrong(item, bag={}):
    bag[item] = True
    return bag

print(add_item_wrong("apple"))
# Both items appear due to shared mutable default!
print(add_item_wrong("banana"))

# CORRECT: Use None as default
def add_item_right(item, bag=None):
    if bag is None:
        bag = {}
    bag[item] = True
    return bag

print(add_item_right("apple"))
print(add_item_right("banana"))
>>>Output
{'apple': True}
{'apple': True, 'banana': True}
{'apple': True}
{'banana': True}
Do
  • Use None as default for dict/list arguments
  • Create a new dict inside the function body
  • Document when a function mutates its input
Don't
  • Use {} or [] as default argument values
  • Assume each call gets a fresh default
  • Share mutable state between function calls

Integer Key Confusion

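In Python, the integer 1 and the boolean True hash to the same value and also compare equal (True == 1), so a dictionary treats them as the same key. The first key is kept and the later value overwrites the earlier one, which can lead to unexpected overwrites: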
d = {True: "boolean", 1: "integer"}
print(d)
print(len(d))

# Same with False and 0
d2 = {False: "boolean", 0: "integer"}
print(d2)
>>>Output
{True: 'integer'}
1
{False: 'integer'}
TIP
This rarely comes up in practice because you wouldn't normally mix booleans and integers as keys. But it's a fun fact that explains some confusing behavior.
Debug Challenge

> This loop tries to remove small values from a dictionary, but it crashes. It deletes keys while iterating over the dictionary, which causes a RuntimeError because the dictionary size changes mid-iteration.

RuntimeError: dictionary changed size during iteration

Dictionaries power many architectural decisions in real systems. Practice choosing the right dictionary-based pattern for a data engineering scenario.

Iterating over a copy of the keys with list(data.keys()) or list(data) is the safest pattern when you need to modify a dictionary during a loop. It is explicit, readable, and avoids the RuntimeError entirely.
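
When the original dictionary does not need to be modified in place, a dict comprehension that builds a new dictionary is a common alternative. A minimal sketch:

```python
data = {"a": 1, "b": 2, "c": 3}

# Build a new dict instead of deleting from the one being iterated
filtered = {key: value for key, value in data.items() if value >= 2}

print(filtered)  # {'b': 2, 'c': 3}
print(data)      # {'a': 1, 'b': 2, 'c': 3}
```

This never changes the size of the dictionary being iterated, so the RuntimeError cannot occur.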

Mutable default arguments are one of Python's most notorious gotchas. The rule is simple: never use a dict, list, or set as a default parameter value. Always use None and create the mutable object inside the function.

Lookup Table Architecture: Step 1

You are building a Python service that enriches incoming event records with user profile data. Each event contains a user_id, and you need to look up the user's name, plan tier, and region before forwarding the enriched record downstream. The system processes 50,000 events per minute.

incoming_events

event_id | user_id | action    | timestamp
e_001    | u_42    | page_view | 2024-06-01T10:00:00
e_002    | u_99    | purchase  | 2024-06-01T10:00:01
e_003    | u_42    | click     | 2024-06-01T10:00:02
May 2026

You need to look up user profiles for each event. The user table has 500,000 rows. How do you structure the lookup?
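
One possible structure, sketched under assumptions, is to index the user table once in a dictionary keyed by user_id and then enrich each event with an O(1) lookup. The profile values, the extra u_7 event, and the enrich function are hypothetical; in the scenario the rows would come from the real 500,000-row user table.

```python
# Hypothetical profile rows standing in for the user table.
user_rows = [
    {"user_id": "u_42", "name": "Alice", "plan": "pro", "region": "EU"},
    {"user_id": "u_99", "name": "Bob", "plan": "free", "region": "US"},
]

# Build the index once: user_id -> profile row
profiles = {row["user_id"]: row for row in user_rows}

events = [
    {"event_id": "e_001", "user_id": "u_42", "action": "page_view"},
    {"event_id": "e_002", "user_id": "u_99", "action": "purchase"},
    {"event_id": "e_003", "user_id": "u_7", "action": "click"},  # unknown user
]

def enrich(event):
    """Return a new event dict with profile fields added (None if the user is unknown)."""
    profile = profiles.get(event["user_id"], {})
    return {
        **event,
        "name": profile.get("name"),
        "plan": profile.get("plan"),
        "region": profile.get("region"),
    }

enriched = [enrich(event) for event in events]
print(enriched[0]["name"], enriched[0]["plan"])  # Alice pro
print(enriched[2]["name"])                       # None
```

Scanning a list of 500,000 rows for every event would cost O(n) per lookup; the one-time index makes each lookup constant time on average.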

Advanced dictionary techniques unlock powerful patterns for data transformation and configuration management. Put your skills to the test with hands-on challenges in the Python Builder.

The two-sum pattern - using a dictionary to remember previously seen values - is one of the most broadly applicable interview algorithms. Any problem that asks "have I seen this complement before?" can be solved with a dictionary in O(n) instead of O(n²) with nested loops.

State machines represented as nested dictionaries are easy to extend and test. Adding a new state or transition only requires adding one entry to the dictionary, with no changes to the traversal logic.
PUTTING IT ALL TOGETHER

> You are a senior data engineer at Cloudflare building an in-memory lookup cache for user segment data: you deep-copy nested baseline configs to prevent mutation, use defaultdict to accumulate request counts per segment, apply Counter to rank the top segments, and rely on O(1) dictionary access to serve thousands of model scoring requests per second.

copy.deepcopy() creates a truly independent copy of the nested baseline config so cache warmup never corrupts the source template.
defaultdict(int) auto-initializes missing segment keys to zero, eliminating guard checks when incrementing request counters per segment.
Counter ranks segments by request frequency with .most_common(), directly driving which segments receive priority cache slots.
O(1) dictionary lookups allow the scoring service to retrieve any segment's feature data in constant time regardless of cache size.
KEY TAKEAWAYS
  • Use copy.deepcopy() for truly independent copies of nested data
  • defaultdict auto-creates missing keys with a default value
  • Counter is optimized for counting with .most_common() and arithmetic
  • Dictionary operations (access, add, delete) are O(1) on average
  • Convert lists to dicts when you need fast membership testing
  • json.dumps() and json.loads() convert between dicts and JSON
  • Never add or remove keys while iterating over a dict; iterate over list(data.keys()) instead
  • Use None as default for mutable function arguments

Dictionary mastery for production code

Category: Python
Difficulty: advanced
Duration: 30 minutes
Challenges: 3 hands-on challenges

Topics covered: Deep Copying Dictionaries, defaultdict: Auto-Initializing Keys, Counter: Purpose-Built for Counting, Dictionary Performance, Advanced Dictionary Patterns, Dictionaries and JSON, Dictionary Gotchas
