Sets: Advanced

Amazon DynamoDB and Apache Cassandra use Bloom filters to avoid expensive disk reads when a key almost certainly does not exist, cutting database lookup costs by orders of magnitude at petabyte scale. A Bloom filter is a probabilistic set structure: it can tell you with certainty that a key is absent, or with high probability that it is present, using a tiny fraction of the memory a full set would require. This lesson covers the internal mechanics of Python sets, including hashing and load factors, and builds up to Bloom filters and other probabilistic data structures that systems engineers use when exact set membership is too costly at scale.

Frozensets: Immutable Sets

Daily Life

Interviews

Use frozensets as keys and elements

A frozenset is an immutable version of a set. Once created, a frozenset cannot be modified: you cannot add or remove elements. This immutability might seem like a limitation, but it has an important consequence that unlocks powerful patterns: frozensets are hashable, meaning they can be used as dictionary keys or as elements of other sets.

To understand why immutability enables hashing, recall that hash tables (the underlying data structure for both dictionaries and sets) rely on objects having stable hash values. If you could change a set after using it as a dictionary key, its hash value would change, and the dictionary would no longer be able to find it. Python prevents this by making regular sets unhashable. Frozensets solve this by guaranteeing that the content never changes.

Hash-based storage

Hash tables store objects by their hash value for fast retrieval

Silent failures

If a hash changes after storage, lookups silently fail to find it

Frozen = constant

Frozensets guarantee immutability, so hash values never change

Immutable = hashable

Strings, numbers, and tuples are hashable because they cannot change

Mutable = rejected

Lists and regular sets are not hashable because they can be modified

1	# Create a frozenset from a regular set or any iterable
2	colors = frozenset({"red", "green", "blue"})
3	print("Frozenset:", colors)
4	print("Type:", type(colors))
5
6	# Create from a list
7	numbers = frozenset([1, 2, 3, 2, 1])
8	print("From list:", numbers)
9
10	# Frozensets support all read operations
11	print("Contains red?", "red" in colors)
12	print("Length:", len(colors))

>>>Output

Frozenset: frozenset({'red', 'green', 'blue'})

Type: <class 'frozenset'>

From list: frozenset({1, 2, 3})

Contains red? True

Length: 3

Frozensets support all the same read operations as regular sets: membership testing with the in operator, getting the length with len(), iteration with for loops, and all set operations like union, intersection, difference, and symmetric difference. The only difference is that you cannot modify a frozenset after creation.

Frozensets Are Immutable

Attempting to add or remove elements from a frozenset raises an AttributeError. This is because frozensets simply do not have the add(), remove(), discard(), pop(), or clear() methods that regular sets have. This is by design, not a bug.

1	colors = frozenset({"red", "green", "blue"})
2
3	# These would all raise AttributeError:
4	# colors.add("yellow") # AttributeError
5	# colors.remove("red") # AttributeError
6	# colors.discard("green") # AttributeError
7	# colors.clear() # AttributeError
8	# colors \|= {"yellow"} # TypeError
9
10	# Set operations work, but return NEW frozensets
11	more_colors = colors \| frozenset({"yellow", "purple"})
12	print("New frozenset:", more_colors)
13	print("Original unchanged:", colors)

Set operations like union, intersection, and difference work on frozensets, but they return new frozensets rather than modifying the original. This follows the same pattern as string operations: strings are immutable, so string methods return new strings rather than modifying in place.

Set Ops Return Same Type

When performing set operations, the type of the result depends on the first operand. If the first operand is a frozenset, the result is a frozenset. If it is a regular set, the result is a regular set. This behavior is consistent and predictable.

1	frozen = frozenset({1, 2, 3})
2	regular = {3, 4, 5}
3
4	# First operand determines result type
5	# frozenset as first operand -> result is frozenset
6	result1 = frozen \| regular
7	# set as first operand -> result is set
8	result2 = regular \| frozen
9
10	print("frozen \| regular:", type(result1).__name__, result1)
11	print("regular \| frozen:", type(result2).__name__, result2)

>>>Output

frozen | regular: frozenset frozenset({1, 2, 3, 4, 5})

regular | frozen: set {1, 2, 3, 4, 5}

Frozensets as Dict Keys

Because frozensets are hashable, they can serve as dictionary keys. This is incredibly powerful when you need to map combinations of items to values. The order of elements does not matter (since sets are unordered), and duplicate elements in the key are automatically handled.

1	# Map ingredient combinations to recipe names
2	recipes = {
3	frozenset({"eggs", "milk", "flour"}): "pancakes",
4	frozenset({"eggs", "cheese"}): "omelette",
5	frozenset({"bread", "butter", "cheese"}): "grilled cheese",
6	frozenset({"eggs", "milk", "flour", "sugar"}): "cake"
7	}
8
9	# Look up a recipe by its ingredients
10	my_ingredients = frozenset({"flour", "eggs", "milk"})
11	print("Recipe:", recipes[my_ingredients])
12
13	# Check if we have a recipe for these ingredients
14	test_ingredients = frozenset({"eggs", "bacon"})
15	if test_ingredients in recipes:
16	print("Found recipe:", recipes[test_ingredients])
17	else:
18	print("No recipe found for", test_ingredients)

>>>Output

Recipe: pancakes

No recipe found for frozenset({'eggs', 'bacon'})

This pattern is powerful for many applications: mapping feature combinations to outcomes in machine learning, caching results of functions that take sets as arguments, or implementing graph algorithms where nodes are identified by sets of attributes.

Frozensets as Set Elements

You cannot put a regular set inside another set because regular sets are not hashable. But you can put frozensets inside sets, enabling you to represent "sets of sets". This is useful for grouping, clustering, partitioning, or representing complex mathematical relationships.

1	# A set of groups (each group is a frozenset)
2	project_teams = {
3	frozenset({"alice", "bob"}),
4	frozenset({"charlie", "diana", "eve"}),
5	frozenset({"alice", "bob"}),
6	frozenset({"frank"})
7	}
8
9	print("Number of unique teams:", len(project_teams))
10
11	# Check if a specific team exists
12	target_team = frozenset({"charlie", "diana", "eve"})
13	print("Team exists?", target_team in project_teams)
14
15	# Find all teams that include alice
16	alice_teams = [team for team in project_teams if "alice" in team]
17	print("Alice's teams:", alice_teams)

>>>Output

Number of unique teams: 3

Team exists? True

Alice's teams: [frozenset({'alice', 'bob'})]

The duplicate team (alice, bob) is automatically eliminated because sets only keep unique elements. This makes frozensets ideal for representing partitions, equivalence classes, or any situation where you need to track unique groups of items.

•Regular Set

Mutable: can add/remove elements
Not hashable
Cannot be a dictionary key
Cannot be an element of another set
Use when you need to modify

•Frozenset

Immutable: content fixed at creation
Hashable
Can be a dictionary key
Can be a set element
Use for composite keys/identities

Python Quiz

> A frozenset is created from a list and used as a dictionary key. Pick the correct constructor to make the set immutable and hashable, and the correct method to check how many elements remain after deduplication.

key = ___([1, 2, 2, 3])
data = {key: "found"}
print(data[key])
print(___(key))

frozenset

tuple

set

type

len

Converting Set Types

You can freely convert between sets and frozensets. Converting a set to a frozenset "freezes" its current contents. Converting a frozenset to a set creates a mutable copy that you can then modify.

1	# Start with a mutable set and build it up
2	temp_set = set()
3	temp_set.add("item1")
4	temp_set.add("item2")
5	temp_set.add("item3")
6
7	# Freeze it when done building
8	final_key = frozenset(temp_set)
9	print("Frozen key:", final_key)
10
11	# To modify, create a mutable copy
12	mutable_copy = set(final_key)
13	mutable_copy.add("item4")
14	print("Modified copy:", mutable_copy)
15	print("Original still frozen:", final_key)

>>>Output

Frozen key: frozenset({'item1', 'item2', 'item3'})

Modified copy: {'item1', 'item2', 'item3', 'item4'}

Original still frozen: frozenset({'item1', 'item2', 'item3'})

TIP

A common pattern is to build up a set incrementally, then freeze it when complete. This gives you the flexibility of mutable sets during construction and the safety and hashability of frozensets for storage.

Set Comprehensions

Daily Life

Interviews

Build sets with comprehension syntax

Set comprehensions provide a concise, expressive way to create sets from iterables with optional filtering and transformation. If you are familiar with list comprehensions, set comprehensions follow the exact same syntax but use curly braces instead of square brackets. The result is a set with all duplicates automatically removed.

The general syntax is {expression for item in iterable if condition}. The expression is evaluated for each item that passes the optional condition, and all unique results are collected into a set. This is often more readable and efficient than building a set with explicit loops.

1	squares = {x * x for x in range(1, 6)}
2	print("Squares:", squares)
3
4	# Compare to the equivalent loop
5	squares_loop = set()
6	for x in range(1, 6):
7	squares_loop.add(x * x)
8	print("From loop:", squares_loop)
9
10

>>>Output

Squares: {1, 4, 9, 16, 25}

From loop: {1, 4, 9, 16, 25}

The comprehension accomplishes the same result in one readable line. Both produce identical sets of squares from 1 to 25. The comprehension form is generally preferred in Python because it is more concise and clearly expresses intent.

Filtering with Conditions

Add an if clause to filter which elements are included in the resulting set. Only items for which the condition evaluates to True contribute to the result. This is equivalent to wrapping the loop body in an if statement.

1	# Only include even numbers
2	evens = {x for x in range(1, 21) if x % 2 == 0}
3	print("Evens:", evens)
4
5	# Only include positive values
6	values = [-3, -1, 0, 1, 2, 3, -2, 1, 3]
7	positives = {x for x in values if x > 0}
8	print("Positive values:", positives)
9
10	# Extract unique words that are at least 5 characters
11	words = ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
12	long_words = {word for word in words if len(word) >= 5}
13	print("Long words:", long_words)

>>>Output

Evens: {2, 4, 6, 8, 10, 12, 14, 16, 18, 20}

Positive values: {1, 2, 3}

Long words: {'quick', 'brown', 'jumps'}

Notice in the second example how duplicates (1 and 3 appear twice in the input) are automatically handled. The third example filters the word list and naturally deduplicates, though in this case there were no duplicate long words.

Transforming Elements

The expression before the for keyword can be any transformation of the loop variable. This allows you to extract, convert, or compute derived values in a single expression.

1	names = ["Alice", "BOB", "charlie", "DIANA", "alice"]
2
3	# Normalize to lowercase (automatically handles duplicates)
4	normalized = {name.lower() for name in names}
5	print("Normalized names:", normalized)
6
7	# Get unique string lengths
8	lengths = {len(name) for name in names}
9	print("Unique lengths:", lengths)
10
11	# Extract domain names from email addresses
12	emails = ["alice@gmail.com", "bob@yahoo.com", "charlie@gmail.com"]
13	domains = {email.split("@")[1] for email in emails}
14	print("Unique domains:", domains)

>>>Output

Normalized names: {'alice', 'bob', 'charlie', 'diana'}

Unique lengths: {3, 5, 7}

Unique domains: {'gmail.com', 'yahoo.com'}

In the first example, "Alice" and "alice" both normalize to "alice", so only four unique names remain. The domain extraction example naturally deduplicates: even though gmail.com appears twice in the input, it appears only once in the result.

Transforming and Filtering

You can combine transformation expressions with filter conditions to create powerful one-liners. The condition is evaluated before the transformation, so you can filter based on the original values while transforming the output.

1	# Get uppercase versions of names longer than 4 characters
2	names = ["alice", "bob", "charlie", "diana", "eve"]
3	long_upper = {name.upper() for name in names if len(name) > 4}
4	print("Long names (upper):", long_upper)
5
6	# Extract unique first characters from words starting with vowels
7	words = ["apple", "banana", "orange", "umbrella", "ice", "egg"]
8	vowel_initials = {word[0].upper() for word in words if word[0] in "aeiou"}
9	print("Vowel initials:", vowel_initials)

>>>Output

Long names (upper): {'ALICE', 'CHARLIE', 'DIANA'}

Vowel initials: {'A', 'O', 'U', 'I', 'E'}

Nested Comprehensions

You can use multiple for clauses to iterate over nested structures. This is useful for flattening lists of lists, processing matrices, or generating combinations. The for clauses are processed left to right, like nested loops.

1	# Flatten a matrix and get unique values
2	matrix = [
3	[1, 2, 3],
4	[2, 3, 4],
5	[3, 4, 5]
6	]
7
8	unique_values = {value for row in matrix for value in row}
9	print("Unique values:", unique_values)
10
11	# Generate all unique sums from two lists
12	list1 = [1, 2, 3]
13	list2 = [10, 20]
14	sums = {a + b for a in list1 for b in list2}
15	print("Unique sums:", sums)

>>>Output

Unique values: {1, 2, 3, 4, 5}

Unique sums: {11, 12, 13, 21, 22, 23}

The matrix has nine total elements but only five unique values. The nested comprehension flattens the structure and eliminates duplicates in one expression. The sums example shows how to generate all combinations from two sequences.

Using Set Comprehensions

Set comprehensions are ideal when you need unique values from a transformation. If duplicates might exist in your input or result, and you want each value only once, a set comprehension is cleaner than a list comprehension followed by a set() conversion.

1	# Scenario: Extract unique tags from blog posts
2	posts = [
3	{"title": "Intro to Python", "tags": ["python", "beginner"]},
4	{"title": "Advanced Python", "tags": ["python", "advanced"]},
5	{"title": "Data Science", "tags": ["python", "data", "science"]},
6	]
7
8	# Set comprehension: direct and efficient
9	all_tags = {tag for post in posts for tag in post["tags"]}
10	print("Unique tags:", all_tags)
11
12	# Alternative (less efficient): list comprehension + set
13	all_tags_alt = set([tag for post in posts for tag in post["tags"]])
14	print("Same result:", all_tags_alt)

>>>Output

Unique tags: {'python', 'beginner', 'advanced', 'data', 'science'}

Same result: {'python', 'beginner', 'advanced', 'data', 'science'}

TIP

If you do not need uniqueness or if preserving order matters, use a list comprehension instead. Set comprehensions automatically deduplicate and do not preserve insertion order (though iteration order is stable in Python 3.7+).

Try switching between set and list comprehension syntax to see how the bracket type changes the result.

Fill in the Blank

> A list [1, 2, 2, 3, 3, 3] has duplicates and you want to double each value. Pick opening and closing brackets to create either a set or list comprehension and see how duplicates are handled.

nums = [1, 2, 2, 3, 3, 3]
result = x * 2 for x in nums
print(type(result), result)

The bracket type is the only syntactic difference between a set comprehension and a list comprehension. Curly braces {} produce a set with automatic deduplication; square brackets [] produce a list that preserves every element including duplicates.

Set comprehensions are especially useful when collecting unique tags, categories, or identifiers from a nested data structure, since any duplicate values are silently discarded during construction.

TIP

If you need both uniqueness and ordering, build a list comprehension first, then convert: list(dict.fromkeys(items)). This preserves insertion order while removing duplicates, unlike a set which does not guarantee order.

Performance: When to Use Sets

Daily Life

Interviews

Choose sets vs lists for performance

Choosing between sets and lists affects both correctness and performance. Understanding the time complexity of common operations helps you make informed decisions. The wrong choice can turn an efficient algorithm into a painfully slow one.

x in coll.add(x).remove(x)for x incoll[i]

x in coll

Membership

Set O(1) vs List O(n)

.add(x)

Add element

Set O(1) vs List O(1)

.remove(x)

Remove element

Set O(1) vs List O(n)

for x in

Iteration

Both are O(n) linear

coll[i]

Index access

Set N/A vs List O(1)

The critical difference is membership testing. For a list with 1 million elements, checking if an item exists requires scanning up to 1 million elements in the worst case. For a set, the same check is nearly instant regardless of size because sets use hash tables internally.

Membership Testing Compared

The in operator behaves very differently for sets versus lists. This difference becomes dramatic as collection size grows.

1	# Demonstrate the difference conceptually
2	# (actual timing would require larger data)
3
4	small_list = [1, 2, 3, 4, 5]
5	small_set = {1, 2, 3, 4, 5}
6
7	# Both are fast for small collections
8	print("5 in list:", 5 in small_list)
9	print("5 in set:", 5 in small_set)
10
11	# For large collections, the difference is huge:
12	# List with 1 million elements: up to 1 million comparisons
13	# Set with 1 million elements: ~1-3 hash comparisons
14
15	# This matters when checking many items:
16	items_to_check = [1, 999999, 500000, 3, 7]
17	# Against a list: 5 * 500000 avg = 2.5 million ops
18	# Against a set: 5 * 1 = 5 ops

>>>Output

5 in list: True

5 in set: True

When to Prefer Lists

Sets are not always the right choice. Lists have advantages that make them better for certain use cases. Understanding when to use each data structure is essential for writing good Python code.

Order matters

Lists preserve insertion order; sets do not guarantee it

Duplicates are meaningful

Lists keep all occurrences; sets eliminate duplicates

Index access needed

list[0], list[-1], list[5:10] all require ordered access

Unhashable elements

Lists can hold other lists; sets cannot hold mutable types

Append/pop from ends

Lists are optimized for stack and queue style operations

When to Prefer Sets

Sets excel in scenarios where their unique properties provide clear benefits over lists.

Uniqueness required

Duplicates should be automatically eliminated on insertion

Frequent membership tests

Checking "is X in the collection?" happens often in your code

Set operations needed

Union, intersection, difference operations are central to the task

Order does not matter

No need for indexing or position-based access to elements

Large collection lookups

Performance gains from O(1) lookup are significant at scale

Hybrid Approach: List + Set

A common optimization is to maintain both a list (for ordering and indexed access) and a set (for fast lookup). This gives you the best of both worlds at the cost of extra memory.

1	class OrderedUniqueCollection:
2	def __init__(self):
3	self._list = []
4	self._set = set()
5
6	def add(self, item):
7	if item not in self._set:
8	self._list.append(item)
9	self._set.add(item)
10	return True
11	return False
12
13	def contains(self, item):
14	return item in self._set
15
16	def items(self):
17	return self._list
18
19	# Usage
20	collection = OrderedUniqueCollection()
21	for item in ["apple", "banana", "apple", "cherry", "banana"]:
22	added = collection.add(item)
23	print(f"Adding {item}: {'added' if added else 'duplicate'}")
24
25	print("Final ordered items:", collection.items())
26	print("Contains 'banana'?", collection.contains("banana"))

>>>Output

Adding apple: added

Adding banana: added

Adding apple: duplicate

Adding cherry: added

Adding banana: duplicate

Final ordered items: ['apple', 'banana', 'cherry']

Contains 'banana'? True

TIP

Python 3.7+ dicts preserve insertion order and have O(1) lookup. For ordered unique collections, dict.fromkeys(items) is often simpler than maintaining parallel list/set structures.

Python Quiz

> You have a list of allowed IDs and need to validate incoming requests efficiently. Pick the correct conversion to enable O(1) lookups, and the right operator to check membership.

allowed = ___(["A01", "A02", "A03"])
request = "A02"
print(request ___ allowed)
print(len(allowed))

dict

set

list

Advanced Pitfalls

Even experienced developers encounter subtle issues with sets. Understanding these pitfalls helps you write robust code and debug problems quickly.

Pitfall 1: Loop Mutation

Never modify a set while iterating over it. Adding or removing elements during iteration causes undefined behavior, typically raising a RuntimeError. This applies to all mutable collections in Python.

1	# WRONG: This raises RuntimeError
2	numbers = {1, 2, 3, 4, 5}
3	for n in numbers:
4	if n % 2 == 0:
5	# RuntimeError: Set changed size during iteration
6	numbers.remove(n)
7
8	# CORRECT: Iterate over a copy
9	numbers = {1, 2, 3, 4, 5}
10	# .copy() creates a snapshot to iterate safely
11	for n in numbers.copy():
12	if n % 2 == 0:
13	numbers.remove(n)
14	print(numbers)
15
16	# ALTERNATIVE: Use set comprehension to create new set
17	numbers = {1, 2, 3, 4, 5}
18	odds = {n for n in numbers if n % 2 != 0}
19	print(odds)

The comprehension approach is often cleaner because it creates a new set with the desired elements rather than modifying in place. Use .copy() when you specifically need to modify the original set.

Pitfall 2: Sets vs Lists

A set and a list with identical elements are never equal in Python. They are different types, and equality comparison returns False. This catches many developers by surprise.

1	my_list = [1, 2, 3]
2	my_set = {1, 2, 3}
3
4	# Direct comparison: always False (different types)
5	print("list == set:", my_list == my_set)
6
7	# To compare contents, convert to same type
8	print("set(list) == set:", set(my_list) == my_set)
9	print("list(set) == list:", sorted(list(my_set)) == sorted(my_list))
10
11	# This also applies to frozensets
12	my_frozen = frozenset([1, 2, 3])
13	print("set == frozenset:", my_set == my_frozen)

>>>Output

list == set: False

set(list) == set: True

list(set) == list: True

set == frozenset: True

Note that sets and frozensets CAN be equal because they are both set types. Only comparisons between different collection families (list/set, tuple/set, etc.) return False.

Pitfall 3: Set Order

Although CPython 3.7+ maintains insertion order internally for dict and has stable iteration order for sets, sets are formally unordered collections. You should never write code that depends on set iteration order, as it is an implementation detail that could change.

1	# Order appears consistent but is NOT guaranteed
2	colors = {"red", "green", "blue"}
3
4	# This works but is bad practice
5	first = next(iter(colors))
6	print(f"Got: {first}")
7
8	# If you need consistent ordering, use a list
9	colors_ordered = ["red", "green", "blue"]
10	print(f"First: {colors_ordered[0]}")
11
12	# Or use sorted() for alphabetical order
13	print(f"Sorted first: {sorted(colors)[0]}")

>>>Output

Got: red

First: red

Sorted first: blue

Pitfall 4: set() vs {}

The curly brace syntax {} creates an empty dictionary, not an empty set. To create an empty set, you must use set(). This is a common source of confusion and bugs.

1	# This is a DICTIONARY, not a set!
2	empty_braces = {}
3	print("type({}):", type(empty_braces))
4
5	# This is how you create an empty set
6	empty_set = set()
7	print("type(set()):", type(empty_set))
8
9	# Non-empty sets use curly braces
10	non_empty_set = {1, 2, 3}
11	print("type({1,2,3}):", type(non_empty_set))
12
13

>>>Output

type({}): <class 'dict'>

type(set()): <class 'set'>

type({1,2,3}): <class 'set'>

The code below tries to build a frozenset for use as a dictionary key, but has a subtle bug. Can you fix it?

Debug Challenge

> This code wraps a list in set() and then frozenset(), but the intermediate set() call is redundant since frozenset() already accepts a list.

Redundant double-wrapping: set() inside frozenset() is unnecessary

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99



key = frozenset(set(["a", "b"]))
data = {key: 42}
print(data[key])
key = frozenset(set(["a", "b"]))
data = {key: 42}
print(data[key])

Pitfall 5: Custom Objects

When creating custom classes for use in sets, you must properly implement both __hash__ and __eq__. If two objects are equal (according to __eq__), they must have the same hash value. Violating this contract causes unpredictable behavior.

1	class User:
2	def __init__(self, user_id, name):
3	self.user_id = user_id
4	self.name = name
5
6	def __eq__(self, other):
7	# Users are equal if they have the same ID
8	return isinstance(other, User) and self.user_id == other.user_id
9
10	def __hash__(self):
11	# Hash must be consistent with __eq__
12	# Only include fields used in __eq__
13	return hash(self.user_id)
14
15	def __repr__(self):
16	return f"User({self.user_id}, '{self.name}')"
17
18	# Same user ID = same user, even with different names
19	users = {
20	User(1, "Alice"),
21	User(2, "Bob"),
22	User(1, "Alicia")
23	}
24
25	print("Number of users:", len(users))
26	print("Users:", users)

The set contains only 2 users because User(1, "Alice") and User(1, "Alicia") are considered equal (same user_id). The one added later replaces the earlier one in the set.

Advanced set concepts connect to deeper ideas about hashing, equality, and data integrity. Put your skills to the test with hands-on challenges in the Python Builder.

❯❯❯PUTTING IT ALL TOGETHER

> You are a senior data engineer at The Trade Desk building a high-throughput audience membership system for real-time bidding: you use frozensets as dictionary keys to represent immutable audience segment combos, set comprehensions to filter qualifying user IDs, and O(1) set lookups to check segment membership in under a millisecond.

frozenset makes each audience segment combination hashable so it can serve as a dictionary key mapping to its bid price.

{x for x in ...} set comprehension filters the raw user ID stream down to qualifying bidder IDs in a single concise expression.

O(1) set membership testing checks whether a user belongs to a target segment in constant time regardless of total audience size.

Choosing sets over lists for the membership index avoids the O(n) linear scan that would exceed the sub-millisecond bid deadline.

KEY TAKEAWAYS

frozenset is immutable and hashable; can be dict keys or set elements

Set operations on frozensets return the type of the first operand

Set comprehensions: {expression for item in iterable if condition}

Nested comprehensions flatten structures: {x for row in matrix for x in row}

Deduplication: set(items) or list(dict.fromkeys(items)) (preserves order)

Sets have O(1) membership test; lists have O(n)

Use sets for validation, cohort analysis, tag matching, and filtering

Never modify a set while iterating; use .copy() or comprehensions

Sets and lists are never equal ([1,2] != {1,2})

{} creates an empty dict, not a set; use set()

Custom classes need consistent __hash__ and __eq__

Sets: Advanced

Frozensets: Immutable Sets

Frozensets Are Immutable

Set Ops Return Same Type

Frozensets as Dict Keys

Frozensets as Set Elements

Converting Set Types

Set Comprehensions

Filtering with Conditions

Transforming Elements

Transforming and Filtering

Nested Comprehensions

Using Set Comprehensions

Performance: When to Use Sets

Membership Testing Compared

When to Prefer Lists

When to Prefer Sets

Hybrid Approach: List + Set

Advanced Pitfalls

Pitfall 1: Loop Mutation

Pitfall 2: Sets vs Lists

Pitfall 3: Set Order

Pitfall 4: set() vs {}

Pitfall 5: Custom Objects

Lesson Sections