Amazon DynamoDB and Apache Cassandra use Bloom filters to avoid expensive disk reads when a key almost certainly does not exist, cutting database lookup costs by orders of magnitude at petabyte scale. A Bloom filter is a probabilistic set structure: it can tell you with certainty that a key is absent, or with high probability that it is present, using a tiny fraction of the memory a full set would require. This lesson covers the internal mechanics of Python sets, including hashing and load factors, and builds up to Bloom filters and other probabilistic data structures that systems engineers use when exact set membership is too costly at scale.
Frozensets: Immutable Sets
Daily Life
Interviews
Use frozensets as keys and elements
A frozenset is an immutable version of a set. Once created, a frozenset cannot be modified: you cannot add or remove elements. This immutability might seem like a limitation, but it has an important consequence that unlocks powerful patterns: frozensets are hashable, meaning they can be used as dictionary keys or as elements of other sets.
To understand why immutability enables hashing, recall that hash tables (the underlying data structure for both dictionaries and sets) rely on objects having stable hash values. If you could change a set after using it as a dictionary key, its hash value would change, and the dictionary would no longer be able to find it. Python prevents this by making regular sets unhashable. Frozensets solve this by guaranteeing that the content never changes.
01
Hash-based storage
Hash tables store objects by their hash value for fast retrieval
02
Silent failures
If a hash changes after storage, lookups silently fail to find it
03
Frozen = constant
Frozensets guarantee immutability, so hash values never change
04
Immutable = hashable
Strings, numbers, and tuples are hashable because they cannot change
05
Mutable = rejected
Lists and regular sets are not hashable because they can be modified
1
# Create a frozenset from a regular set or any iterable
2
colors=frozenset({"red","green","blue"})
3
print("Frozenset:",colors)
4
print("Type:",type(colors))
5
6
# Create from a list
7
numbers=frozenset([1,2,3,2,1])
8
print("From list:",numbers)
9
10
# Frozensets support all read operations
11
print("Contains red?","red"incolors)
12
print("Length:",len(colors))
>>>Output
Frozenset: frozenset({'red', 'green', 'blue'})
Type: <class 'frozenset'>
From list: frozenset({1, 2, 3})
Contains red? True
Length: 3
Frozensets support all the same read operations as regular sets: membership testing with the in operator, getting the length with len(), iteration with for loops, and all set operations like union, intersection, difference, and symmetric difference. The only difference is that you cannot modify a frozenset after creation.
Frozensets Are Immutable
Attempting to add or remove elements from a frozenset raises an AttributeError. This is because frozensets simply do not have the add(), remove(), discard(), pop(), or clear() methods that regular sets have. This is by design, not a bug.
1
colors=frozenset({"red","green","blue"})
2
3
# These would all raise AttributeError:
4
# colors.add("yellow") # AttributeError
5
# colors.remove("red") # AttributeError
6
# colors.discard("green") # AttributeError
7
# colors.clear() # AttributeError
8
# colors |= {"yellow"} # TypeError
9
10
# Set operations work, but return NEW frozensets
11
more_colors=colors|frozenset({"yellow","purple"})
12
print("New frozenset:",more_colors)
13
print("Original unchanged:",colors)
Set operations like union, intersection, and difference work on frozensets, but they return new frozensets rather than modifying the original. This follows the same pattern as string operations: strings are immutable, so string methods return new strings rather than modifying in place.
Set Ops Return Same Type
When performing set operations, the type of the result depends on the first operand. If the first operand is a frozenset, the result is a frozenset. If it is a regular set, the result is a regular set. This behavior is consistent and predictable.
1
frozen=frozenset({1,2,3})
2
regular={3,4,5}
3
4
# First operand determines result type
5
# frozenset as first operand -> result is frozenset
Because frozensets are hashable, they can serve as dictionary keys. This is incredibly powerful when you need to map combinations of items to values. The order of elements does not matter (since sets are unordered), and duplicate elements in the key are automatically handled.
This pattern is powerful for many applications: mapping feature combinations to outcomes in machine learning, caching results of functions that take sets as arguments, or implementing graph algorithms where nodes are identified by sets of attributes.
Frozensets as Set Elements
You cannot put a regular set inside another set because regular sets are not hashable. But you can put frozensets inside sets, enabling you to represent "sets of sets". This is useful for grouping, clustering, partitioning, or representing complex mathematical relationships.
1
# A set of groups (each group is a frozenset)
2
project_teams={
3
frozenset({"alice","bob"}),
4
frozenset({"charlie","diana","eve"}),
5
frozenset({"alice","bob"}),
6
frozenset({"frank"})
7
}
8
9
print("Number of unique teams:",len(project_teams))
The duplicate team (alice, bob) is automatically eliminated because sets only keep unique elements. This makes frozensets ideal for representing partitions, equivalence classes, or any situation where you need to track unique groups of items.
•Regular Set
Mutable: can add/remove elements
Not hashable
Cannot be a dictionary key
Cannot be an element of another set
Use when you need to modify
•Frozenset
Immutable: content fixed at creation
Hashable
Can be a dictionary key
Can be a set element
Use for composite keys/identities
Python Quiz
> A frozenset is created from a list and used as a dictionary key. Pick the correct constructor to make the set immutable and hashable, and the correct method to check how many elements remain after deduplication.
You can freely convert between sets and frozensets. Converting a set to a frozenset "freezes" its current contents. Converting a frozenset to a set creates a mutable copy that you can then modify.
Original still frozen: frozenset({'item1', 'item2', 'item3'})
TIP
A common pattern is to build up a set incrementally, then freeze it when complete. This gives you the flexibility of mutable sets during construction and the safety and hashability of frozensets for storage.
Set Comprehensions
Daily Life
Interviews
Build sets with comprehension syntax
Set comprehensions provide a concise, expressive way to create sets from iterables with optional filtering and transformation. If you are familiar with list comprehensions, set comprehensions follow the exact same syntax but use curly braces instead of square brackets. The result is a set with all duplicates automatically removed.
The general syntax is {expression for item in iterable if condition}. The expression is evaluated for each item that passes the optional condition, and all unique results are collected into a set. This is often more readable and efficient than building a set with explicit loops.
1
squares={x*xforxinrange(1,6)}
2
print("Squares:",squares)
3
4
# Compare to the equivalent loop
5
squares_loop=set()
6
forxinrange(1,6):
7
squares_loop.add(x*x)
8
print("From loop:",squares_loop)
9
10
>>>Output
Squares: {1, 4, 9, 16, 25}
From loop: {1, 4, 9, 16, 25}
The comprehension accomplishes the same result in one readable line. Both produce identical sets of squares from 1 to 25. The comprehension form is generally preferred in Python because it is more concise and clearly expresses intent.
Filtering with Conditions
Add an if clause to filter which elements are included in the resulting set. Only items for which the condition evaluates to True contribute to the result. This is equivalent to wrapping the loop body in an if statement.
1
# Only include even numbers
2
evens={xforxinrange(1,21)ifx%2==0}
3
print("Evens:",evens)
4
5
# Only include positive values
6
values=[-3,-1,0,1,2,3,-2,1,3]
7
positives={xforxinvaluesifx>0}
8
print("Positive values:",positives)
9
10
# Extract unique words that are at least 5 characters
Notice in the second example how duplicates (1 and 3 appear twice in the input) are automatically handled. The third example filters the word list and naturally deduplicates, though in this case there were no duplicate long words.
Transforming Elements
The expression before the for keyword can be any transformation of the loop variable. This allows you to extract, convert, or compute derived values in a single expression.
1
names=["Alice","BOB","charlie","DIANA","alice"]
2
3
# Normalize to lowercase (automatically handles duplicates)
In the first example, "Alice" and "alice" both normalize to "alice", so only four unique names remain. The domain extraction example naturally deduplicates: even though gmail.com appears twice in the input, it appears only once in the result.
Transforming and Filtering
You can combine transformation expressions with filter conditions to create powerful one-liners. The condition is evaluated before the transformation, so you can filter based on the original values while transforming the output.
1
# Get uppercase versions of names longer than 4 characters
You can use multiple for clauses to iterate over nested structures. This is useful for flattening lists of lists, processing matrices, or generating combinations. The for clauses are processed left to right, like nested loops.
1
# Flatten a matrix and get unique values
2
matrix=[
3
[1,2,3],
4
[2,3,4],
5
[3,4,5]
6
]
7
8
unique_values={valueforrowinmatrixforvalueinrow}
9
print("Unique values:",unique_values)
10
11
# Generate all unique sums from two lists
12
list1=[1,2,3]
13
list2=[10,20]
14
sums={a+bforainlist1forbinlist2}
15
print("Unique sums:",sums)
>>>Output
Unique values: {1, 2, 3, 4, 5}
Unique sums: {11, 12, 13, 21, 22, 23}
The matrix has nine total elements but only five unique values. The nested comprehension flattens the structure and eliminates duplicates in one expression. The sums example shows how to generate all combinations from two sequences.
Using Set Comprehensions
Set comprehensions are ideal when you need unique values from a transformation. If duplicates might exist in your input or result, and you want each value only once, a set comprehension is cleaner than a list comprehension followed by a set() conversion.
1
# Scenario: Extract unique tags from blog posts
2
posts=[
3
{"title":"Intro to Python","tags":["python","beginner"]},
Same result: {'python', 'beginner', 'advanced', 'data', 'science'}
TIP
If you do not need uniqueness or if preserving order matters, use a list comprehension instead. Set comprehensions automatically deduplicate and do not preserve insertion order (though iteration order is stable in Python 3.7+).
Try switching between set and list comprehension syntax to see how the bracket type changes the result.
Fill in the Blank
> A list [1, 2, 2, 3, 3, 3] has duplicates and you want to double each value. Pick opening and closing brackets to create either a set or list comprehension and see how duplicates are handled.
nums = [1, 2, 2, 3, 3, 3]
result = x * 2 for x in nums
print(type(result), result)
The bracket type is the only syntactic difference between a set comprehension and a list comprehension. Curly braces {} produce a set with automatic deduplication; square brackets [] produce a list that preserves every element including duplicates.
Set comprehensions are especially useful when collecting unique tags, categories, or identifiers from a nested data structure, since any duplicate values are silently discarded during construction.
TIP
If you need both uniqueness and ordering, build a list comprehension first, then convert: list(dict.fromkeys(items)). This preserves insertion order while removing duplicates, unlike a set which does not guarantee order.
Performance: When to Use Sets
Daily Life
Interviews
Choose sets vs lists for performance
Choosing between sets and lists affects both correctness and performance. Understanding the time complexity of common operations helps you make informed decisions. The wrong choice can turn an efficient algorithm into a painfully slow one.
x in coll.add(x).remove(x)for x incoll[i]
x in coll
Membership
Set O(1) vs List O(n)
.add(x)
Add element
Set O(1) vs List O(1)
.remove(x)
Remove element
Set O(1) vs List O(n)
for x in
Iteration
Both are O(n) linear
coll[i]
Index access
Set N/A vs List O(1)
The critical difference is membership testing. For a list with 1 million elements, checking if an item exists requires scanning up to 1 million elements in the worst case. For a set, the same check is nearly instant regardless of size because sets use hash tables internally.
Membership Testing Compared
The in operator behaves very differently for sets versus lists. This difference becomes dramatic as collection size grows.
1
# Demonstrate the difference conceptually
2
# (actual timing would require larger data)
3
4
small_list=[1,2,3,4,5]
5
small_set={1,2,3,4,5}
6
7
# Both are fast for small collections
8
print("5 in list:",5insmall_list)
9
print("5 in set:",5insmall_set)
10
11
# For large collections, the difference is huge:
12
# List with 1 million elements: up to 1 million comparisons
13
# Set with 1 million elements: ~1-3 hash comparisons
14
15
# This matters when checking many items:
16
items_to_check=[1,999999,500000,3,7]
17
# Against a list: 5 * 500000 avg = 2.5 million ops
18
# Against a set: 5 * 1 = 5 ops
>>>Output
5 in list: True
5 in set: True
When to Prefer Lists
Sets are not always the right choice. Lists have advantages that make them better for certain use cases. Understanding when to use each data structure is essential for writing good Python code.
Order matters
Lists preserve insertion order; sets do not guarantee it
Duplicates are meaningful
Lists keep all occurrences; sets eliminate duplicates
Index access needed
list[0], list[-1], list[5:10] all require ordered access
Unhashable elements
Lists can hold other lists; sets cannot hold mutable types
Append/pop from ends
Lists are optimized for stack and queue style operations
When to Prefer Sets
Sets excel in scenarios where their unique properties provide clear benefits over lists.
Uniqueness required
Duplicates should be automatically eliminated on insertion
Frequent membership tests
Checking "is X in the collection?" happens often in your code
Set operations needed
Union, intersection, difference operations are central to the task
Order does not matter
No need for indexing or position-based access to elements
Large collection lookups
Performance gains from O(1) lookup are significant at scale
Hybrid Approach: List + Set
A common optimization is to maintain both a list (for ordering and indexed access) and a set (for fast lookup). This gives you the best of both worlds at the cost of extra memory.
Final ordered items: ['apple', 'banana', 'cherry']
Contains 'banana'? True
TIP
Python 3.7+ dicts preserve insertion order and have O(1) lookup. For ordered unique collections, dict.fromkeys(items) is often simpler than maintaining parallel list/set structures.
Python Quiz
> You have a list of allowed IDs and need to validate incoming requests efficiently. Pick the correct conversion to enable O(1) lookups, and the right operator to check membership.
Even experienced developers encounter subtle issues with sets. Understanding these pitfalls helps you write robust code and debug problems quickly.
Pitfall 1: Loop Mutation
Never modify a set while iterating over it. Adding or removing elements during iteration causes undefined behavior, typically raising a RuntimeError. This applies to all mutable collections in Python.
1
# WRONG: This raises RuntimeError
2
numbers={1,2,3,4,5}
3
forninnumbers:
4
ifn%2==0:
5
# RuntimeError: Set changed size during iteration
6
numbers.remove(n)
7
8
# CORRECT: Iterate over a copy
9
numbers={1,2,3,4,5}
10
# .copy() creates a snapshot to iterate safely
11
forninnumbers.copy():
12
ifn%2==0:
13
numbers.remove(n)
14
print(numbers)
15
16
# ALTERNATIVE: Use set comprehension to create new set
17
numbers={1,2,3,4,5}
18
odds={nforninnumbersifn%2!=0}
19
print(odds)
The comprehension approach is often cleaner because it creates a new set with the desired elements rather than modifying in place. Use .copy() when you specifically need to modify the original set.
Pitfall 2: Sets vs Lists
A set and a list with identical elements are never equal in Python. They are different types, and equality comparison returns False. This catches many developers by surprise.
1
my_list=[1,2,3]
2
my_set={1,2,3}
3
4
# Direct comparison: always False (different types)
Note that sets and frozensets CAN be equal because they are both set types. Only comparisons between different collection families (list/set, tuple/set, etc.) return False.
Pitfall 3: Set Order
Although CPython 3.7+ maintains insertion order internally for dict and has stable iteration order for sets, sets are formally unordered collections. You should never write code that depends on set iteration order, as it is an implementation detail that could change.
1
# Order appears consistent but is NOT guaranteed
2
colors={"red","green","blue"}
3
4
# This works but is bad practice
5
first=next(iter(colors))
6
print(f"Got: {first}")
7
8
# If you need consistent ordering, use a list
9
colors_ordered=["red","green","blue"]
10
print(f"First: {colors_ordered[0]}")
11
12
# Or use sorted() for alphabetical order
13
print(f"Sorted first: {sorted(colors)[0]}")
>>>Output
Got: red
First: red
Sorted first: blue
Pitfall 4: set() vs {}
The curly brace syntax {} creates an empty dictionary, not an empty set. To create an empty set, you must use set(). This is a common source of confusion and bugs.
1
# This is a DICTIONARY, not a set!
2
empty_braces={}
3
print("type({}):",type(empty_braces))
4
5
# This is how you create an empty set
6
empty_set=set()
7
print("type(set()):",type(empty_set))
8
9
# Non-empty sets use curly braces
10
non_empty_set={1,2,3}
11
print("type({1,2,3}):",type(non_empty_set))
12
13
>>>Output
type({}): <class 'dict'>
type(set()): <class 'set'>
type({1,2,3}): <class 'set'>
The code below tries to build a frozenset for use as a dictionary key, but has a subtle bug. Can you fix it?
Debug Challenge
> This code wraps a list in set() and then frozenset(), but the intermediate set() call is redundant since frozenset() already accepts a list.
Redundant double-wrapping: set() inside frozenset() is unnecessary
When creating custom classes for use in sets, you must properly implement both __hash__ and __eq__. If two objects are equal (according to __eq__), they must have the same hash value. Violating this contract causes unpredictable behavior.
# Same user ID = same user, even with different names
19
users={
20
User(1,"Alice"),
21
User(2,"Bob"),
22
User(1,"Alicia")
23
}
24
25
print("Number of users:",len(users))
26
print("Users:",users)
The set contains only 2 users because User(1, "Alice") and User(1, "Alicia") are considered equal (same user_id). The one added later replaces the earlier one in the set.
Advanced set concepts connect to deeper ideas about hashing, equality, and data integrity. Put your skills to the test with hands-on challenges in the Python Builder.
❯❯❯PUTTING IT ALL TOGETHER
> You are a senior data engineer at The Trade Desk building a high-throughput audience membership system for real-time bidding: you use frozensets as dictionary keys to represent immutable audience segment combos, set comprehensions to filter qualifying user IDs, and O(1) set lookups to check segment membership in under a millisecond.
frozenset makes each audience segment combination hashable so it can serve as a dictionary key mapping to its bid price.
{x for x in ...} set comprehension filters the raw user ID stream down to qualifying bidder IDs in a single concise expression.
O(1) set membership testing checks whether a user belongs to a target segment in constant time regardless of total audience size.
Choosing sets over lists for the membership index avoids the O(n) linear scan that would exceed the sub-millisecond bid deadline.
KEY TAKEAWAYS
frozenset is immutable and hashable; can be dict keys or set elements
Set operations on frozensets return the type of the first operand
Set comprehensions: {expression for item in iterable if condition}
Nested comprehensions flatten structures: {x for row in matrix for x in row}
Deduplication: set(items) or list(dict.fromkeys(items)) (preserves order)
Sets have O(1) membership test; lists have O(n)
Use sets for validation, cohort analysis, tag matching, and filtering
Never modify a set while iterating; use .copy() or comprehensions
Sets and lists are never equal ([1,2] != {1,2})
{} creates an empty dict, not a set; use set()
Custom classes need consistent __hash__ and __eq__
Frozensets, comprehensions, patterns
Category
Python
Difficulty
advanced
Duration
34 minutes
Challenges
3 hands-on challenges
Topics covered: Frozensets: Immutable Sets, Set Comprehensions, Performance: When to Use Sets
A frozenset is an immutable version of a set. Once created, a frozenset cannot be modified: you cannot add or remove elements. This immutability might seem like a limitation, but it has an important consequence that unlocks powerful patterns: frozensets are hashable, meaning they can be used as dictionary keys or as elements of other sets. To understand why immutability enables hashing, recall that hash tables (the underlying data structure for both dictionaries and sets) rely on objects having
Set comprehensions provide a concise, expressive way to create sets from iterables with optional filtering and transformation. If you are familiar with list comprehensions, set comprehensions follow the exact same syntax but use curly braces instead of square brackets. The result is a set with all duplicates automatically removed. The comprehension accomplishes the same result in one readable line. Both produce identical sets of squares from 1 to 25. The comprehension form is generally preferred
Choosing between sets and lists affects both correctness and performance. Understanding the time complexity of common operations helps you make informed decisions. The wrong choice can turn an efficient algorithm into a painfully slow one. The critical difference is membership testing. For a list with 1 million elements, checking if an item exists requires scanning up to 1 million elements in the worst case. For a set, the same check is nearly instant regardless of size because sets use hash tab