Sets: Intermediate

Dropbox syncs billions of files across hundreds of millions of devices, and at the core of deciding what needs to be uploaded is a set operation: comparing the set of file hashes already stored in the cloud against the set of file hashes on your local machine. The difference of those two sets tells Dropbox exactly which files are new and need uploading, without comparing file contents byte by byte or re-transferring anything that is already synced. Set comprehensions, frozensets, and the symmetric difference operator you will learn in this lesson are the same algebraic tools that content deduplication systems use at petabyte scale to move only the data that actually changed.

Union: Combining Sets

Merge sets with union and pipe

A union combines all elements from two or more sets into a single set. If an element appears in any of the input sets, it appears in the union exactly once. The union operation automatically handles duplicates because the result is still a set, which by definition contains only unique elements. This makes union perfect for merging data from multiple sources.

The mathematical notation for union is A ∪ B, read as "A union B". The union of sets A and B contains every element that is in A, in B, or in both. The key insight is that union is an inclusive operation: if you are in either set, you are in the result. This is analogous to the logical OR operation: an element is in the union if it is in A OR in B OR in both.

Inclusive

Contains every element that is in A, in B, or in both sets

Notation: A ∪ B

Read as "A union B" in mathematical set notation

Lower bound

Result is always at least as large as the largest input set

Upper bound

At most the sum of both set sizes, reached only when the sets share no elements

	frontend_skills = {"html", "css", "javascript", "react"}
	backend_skills = {"python", "sql", "javascript", "django"}

	# Union: all skills from both sets
	all_skills = frontend_skills.union(backend_skills)
	print("All skills:", all_skills)
	print("Count:", len(all_skills))

>>>Output

All skills: {'html', 'css', 'javascript', 'react', 'python', 'sql', 'django'}

Count: 7

Notice that "javascript" appears in both the frontend and backend sets but only once in the result. The union contains seven elements, not eight, because duplicates are automatically eliminated. This is exactly what you want when combining data sources: a complete list without repetition. Sets are unordered, so the elements may print in a different order on your machine.
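The size bounds listed at the start of this section follow from a counting identity (inclusion-exclusion). The size of the union equals the two sizes added together, minus the number of shared elements that would otherwise be counted twice. The following quick check uses the same skill sets, plus `.intersection()`, which is covered later in this lesson:

```python
frontend_skills = {"html", "css", "javascript", "react"}
backend_skills = {"python", "sql", "javascript", "django"}

combined = frontend_skills.union(backend_skills)
shared = frontend_skills.intersection(backend_skills)  # covered later in this lesson

# Inclusion-exclusion: shared elements would otherwise be counted twice
assert len(combined) == len(frontend_skills) + len(backend_skills) - len(shared)

# Lower bound: at least as large as the largest input
assert len(combined) >= max(len(frontend_skills), len(backend_skills))

# Upper bound: never more than the sum of the input sizes
assert len(combined) <= len(frontend_skills) + len(backend_skills)

print(len(combined), len(shared))  # 7 1
```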

The Pipe Operator |

Python provides the | operator as a shorthand for union. This pipe symbol is often preferred for its concise syntax and resemblance to mathematical notation. In many programming contexts, the pipe symbol represents "or", which aligns with the inclusive nature of union: an element is in the result if it is in set A | (or) in set B.

	frontend_skills = {"html", "css", "javascript"}
	backend_skills = {"python", "sql", "javascript"}

	# These produce identical results:
	method_result = frontend_skills.union(backend_skills)
	operator_result = frontend_skills | backend_skills

	print("Method result:", method_result)
	print("Operator result:", operator_result)
	print("Equal?", method_result == operator_result)

>>>Output

Method result: {'html', 'css', 'javascript', 'python', 'sql'}

Operator result: {'html', 'css', 'javascript', 'python', 'sql'}

Equal? True

Both approaches produce identical results. The operator syntax is more concise and often preferred when both operands are already sets. However, there is an important difference between the method and operator forms that affects how you use them with other data types.

•.union() Method

Method syntax with parentheses
Works with any iterable (list, tuple)
a.union([1, 2, 3]) works directly
More flexible for mixed types

•| Operator

Operator syntax, more concise
Requires sets on both sides
a | [1, 2, 3] raises TypeError
Must convert to set first

The method form is more flexible because it accepts any iterable as an argument. If you have a list, tuple, or generator, you can pass it directly to the .union() method without first converting it to a set. The operator form requires both operands to be sets, so you must explicitly convert other types before using the | operator.
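The next sketch contrasts the two forms using a tuple and a generator expression. The variable names are illustrative:

```python
current_users = {"alice", "bob"}
pending = ("carol", "bob")  # a tuple, not a set

# The method accepts any iterable, even a generator expression
merged = current_users.union(name.lower() for name in ["DAVE", "Alice"])
print(sorted(merged))  # ['alice', 'bob', 'dave']

# The operator insists on set operands and raises TypeError otherwise
operator_failed = False
try:
    current_users | pending
except TypeError:
    operator_failed = True
print("Operator with a tuple raised TypeError:", operator_failed)  # True

# Converting first makes the operator form work
print(sorted(current_users | set(pending)))  # ['alice', 'bob', 'carol']
```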

	current_users = {"alice", "bob", "charlie"}
	new_signups = ["diana", "eve", "bob"]

	# Method works with the list directly
	all_users = current_users.union(new_signups)
	print("With method:", all_users)

	# Operator requires converting to set first
	all_users2 = current_users | set(new_signups)
	print("With operator:", all_users2)

>>>Output

With method: {'alice', 'bob', 'charlie', 'diana', 'eve'}

With operator: {'alice', 'bob', 'charlie', 'diana', 'eve'}

Chaining Multiple Unions

You can union more than two sets at once by chaining the operator or passing multiple arguments to the method. Both approaches combine all unique elements from all input sets into a single result. This is essential when you need to merge data from three or more sources.

	team_a = {"python", "sql"}
	team_b = {"javascript", "python"}
	team_c = {"rust", "sql", "go"}
	team_d = {"java", "python"}

	# Chain operators to union four sets
	all_skills = team_a | team_b | team_c | team_d
	print("All skills:", all_skills)

	# Or use method with multiple arguments
	all_skills2 = team_a.union(team_b, team_c, team_d)
	print("Same result:", all_skills2)

>>>Output

All skills: {'python', 'sql', 'javascript', 'rust', 'go', 'java'}

Same result: {'python', 'sql', 'javascript', 'rust', 'go', 'java'}

Python appears in three of the four sets but only once in the result. SQL appears in two sets but only once in the result. The union correctly consolidates all unique skills across all teams, making it trivial to answer "what skills does our organization have?"

Merging Permissions

Union is commonly used in permission systems where a user belongs to multiple groups. Each group grants certain permissions, and the user should have the combined permissions from all their groups. This is a classic use case that appears in operating systems, web applications, and databases.

	# Permission sets for different roles
	viewer_perms = {"read", "list", "search"}
	editor_perms = {"read", "write", "edit", "list"}
	admin_perms = {"read", "write", "delete", "admin", "list"}

	# A user who is both an editor and has some admin rights
	user_groups = [editor_perms, {"delete"}]

	# Calculate effective permissions
	effective_perms = set()
	for group in user_groups:
	    effective_perms = effective_perms | group

	print("User can:", effective_perms)

>>>Output

User can: {'read', 'write', 'edit', 'list', 'delete'}

The user gets all permissions from their editor role plus the delete permission. Union ensures no duplicate permissions and provides a clear, complete set of what the user can do. This pattern scales to any number of groups or roles without changing the logic.
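In a real system, the mapping from groups to permissions usually lives in a lookup table. The sketch below assumes a hypothetical `ROLE_PERMISSIONS` dictionary and an `effective_permissions` helper; neither comes from the example above. It unions however many roles a user holds in a single call with `set().union(*...)`:

```python
# Hypothetical role table; in practice this might come from a database
ROLE_PERMISSIONS = {
    "viewer": {"read", "list", "search"},
    "editor": {"read", "write", "edit", "list"},
    "admin": {"read", "write", "delete", "admin", "list"},
}

def effective_permissions(roles):
    """Union the permission sets of every role the user holds."""
    # Starting from set() means a user with no roles gets an empty set
    return set().union(*(ROLE_PERMISSIONS[role] for role in roles))

print(sorted(effective_permissions(["viewer", "editor"])))
# ['edit', 'list', 'read', 'search', 'write']
```

Because `.union()` accepts any number of arguments, the helper works the same way for zero, one, or many roles.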

Union with Empty Sets

The empty set is the identity element for union: unioning any set with an empty set returns the original set unchanged. This may seem obvious, but it is an important property that makes your code robust when handling edge cases where one of your data sources might be empty.

	users = {"alice", "bob", "charlie"}
	empty = set()

	print("Users | empty:", users | empty)
	print("Empty | users:", empty | users)
	print("Empty | empty:", empty | empty)

>>>Output

Users | empty: {'alice', 'bob', 'charlie'}

Empty | users: {'alice', 'bob', 'charlie'}

Empty | empty: set()

This identity property means you can safely union sets without checking if they are empty first. Your code works correctly regardless of whether any input set happens to be empty.
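This property has two practical consequences. First, a union over zero inputs is simply the empty set. Second, an accumulator for a loop of unions must start from `set()`, not `{}`, because empty braces create an empty dictionary:

```python
# Empty braces make a dict; only set() makes an empty set
print(type({}).__name__, type(set()).__name__)  # dict set

# Identity element: unioning with an empty set changes nothing
users = {"alice", "bob"}
assert users | set() == users

# With no inputs at all, a union over a list returns the empty set
sources = []
print(set().union(*sources))  # set()

# Pitfall: starting an accumulator from {} fails on the first union
accumulator = {}
dict_start_failed = False
try:
    accumulator = accumulator | {"alice"}
except TypeError:
    dict_start_failed = True
print(dict_start_failed)  # True
```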

TIP

Try choosing the right method below to combine two overlapping teams into a single staff list.

Fill in the Blank

> Two teams have overlapping members: admins are {"alice", "bob"} and editors are {"bob", "charlie"}. Pick a set operation to produce the combined staff list.

admins = {"alice", "bob"}
editors = {"bob", "charlie"}
all_staff = admins.____(editors)
print(all_staff)

Union is the most inclusive set operation: every element from every input set appears exactly once in the result. It is the right operation when you need a complete combined view without worrying about overlap.

The core set operations each answer a different question about two collections: union asks "what is in either?", intersection asks "what is in both?", and difference asks "what is in one but not the other?"

TIP

Use .union() or the | operator interchangeably. The method form accepts non-set iterables directly: a.union(my_list) works without converting the list first, which can make code cleaner when combining collections of mixed types.

Intersection: Finding Common Elements

Find shared elements with intersection

An intersection finds elements that exist in all specified sets. If an element is in set A AND in set B, it appears in the intersection. Elements that are in only one set are excluded. The intersection operation answers the question "what do these sets have in common?" This is fundamental for finding overlaps, shared characteristics, or common attributes.

The mathematical notation for intersection is A ∩ B, read as "A intersect B". The intersection of sets A and B contains only elements that are in both A and B simultaneously. This is analogous to the logical AND operation: an element is in the intersection only if it is in A AND in B. The intersection is always smaller than or equal to the smallest input set.

Both sets required

Contains only elements present in A and B simultaneously

Notation: A ∩ B

Read as "A intersect B" in mathematical set notation

Can be empty

Returns empty set when A and B share no common elements

Size limited

Result is at most as large as the smaller input set

	frontend_skills = {"html", "css", "javascript", "react"}
	backend_skills = {"python", "sql", "javascript", "node"}

	# Intersection: skills in BOTH sets
	common_skills = frontend_skills.intersection(backend_skills)
	print("Common skills:", common_skills)

	# What percentage overlap?
	overlap_pct = len(common_skills) / len(frontend_skills | backend_skills) * 100
	print(f"Overlap: {overlap_pct:.1f}%")

>>>Output

Common skills: {'javascript'}

Overlap: 14.3%

Only "javascript" appears in both the frontend and backend skill sets, so the intersection contains just that one element. The overlap percentage shows how much the two sets have in common relative to their combined unique elements.
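The ratio computed above, intersection size divided by union size, is widely known as the Jaccard similarity. Here is a small reusable sketch. The `jaccard` function name is our own. Treating two empty sets as identical (similarity 1.0) is an assumption, because the ratio is otherwise undefined when both sets are empty.

```python
def jaccard(a, b):
    """Return |a & b| / |a | b|, a similarity score between 0.0 and 1.0."""
    union_size = len(a | b)
    if union_size == 0:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / union_size

frontend_skills = {"html", "css", "javascript", "react"}
backend_skills = {"python", "sql", "javascript", "node"}
print(f"{jaccard(frontend_skills, backend_skills):.3f}")  # 0.143
print(jaccard({1, 2}, {1, 2}), jaccard({1}, {2}))          # 1.0 0.0
```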

The Ampersand Operator &

Python provides the & operator as a shorthand for intersection. The ampersand symbol is borrowed from the logical AND operation, which is fitting because intersection returns elements that are in A AND in B. Just like the and keyword requires both conditions to be true, the intersection requires an element to be in both sets.

	a = {1, 2, 3, 4, 5, 6}
	b = {4, 5, 6, 7, 8, 9}

	# These produce identical results:
	method_result = a.intersection(b)
	operator_result = a & b

	print("Method:", method_result)
	print("Operator:", operator_result)
	print("Equal?", method_result == operator_result)

>>>Output

Method: {4, 5, 6}

Operator: {4, 5, 6}

Equal? True

As with union, the .intersection() method form accepts any iterable while the & operator requires both sides to be sets. Choose the method when working with lists or other iterables, and the operator for concise code when both operands are already sets.

Overlap Across Sets

When intersecting multiple sets, only elements present in ALL sets are included in the result. This becomes increasingly restrictive as you add more sets: an element must pass through every filter to appear in the final intersection.

	# Skills across three different teams
	team_a = {"python", "sql", "aws", "docker", "linux"}
	team_b = {"python", "java", "aws", "kubernetes", "linux"}
	team_c = {"python", "aws", "terraform", "docker", "linux"}

	# Skills that ALL three teams have
	shared_skills = team_a & team_b & team_c
	print("All teams know:", shared_skills)

	# Pairwise overlaps between teams
	print("A & B:", team_a & team_b)
	print("A & C:", team_a & team_c)
	print("B & C:", team_b & team_c)

>>>Output

All teams know: {'python', 'aws', 'linux'}

A & B: {'python', 'aws', 'linux'}

A & C: {'python', 'aws', 'docker', 'linux'}

B & C: {'python', 'aws', 'linux'}

Only "python", "aws", and "linux" appear in all three sets. "docker" appears in teams A and C but not B, so it is excluded from the three-way intersection. Notice how the intersection shrinks or stays the same as you add more sets to intersect.
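Sometimes the number of sets is only known at run time, for example one set per team loaded from a file. Chaining `&` by hand is not an option then. `set.intersection(*sets)` unpacks a list into a single call, but it needs at least one set, so this sketch guards the empty case. The `common_to_all` helper is illustrative, and so is the choice to return an empty set when there is no input.

```python
def common_to_all(sets):
    """Return the elements shared by every set in `sets`."""
    if not sets:
        return set()  # assumption: no teams means no shared skills
    return set.intersection(*sets)

teams = [
    {"python", "sql", "aws", "docker", "linux"},
    {"python", "java", "aws", "kubernetes", "linux"},
    {"python", "aws", "terraform", "docker", "linux"},
]
print(sorted(common_to_all(teams)))  # ['aws', 'linux', 'python']
```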

Finding Common Customers

Intersection is invaluable for identifying overlap between customer segments. This helps with targeting campaigns to engaged users, finding cross-sell opportunities, or analyzing customer behavior across different touchpoints.

	newsletter_subscribers = {"alice", "bob", "charlie", "diana", "eve"}
	recent_purchasers = {"bob", "diana", "frank", "grace"}
	mobile_app_users = {"bob", "diana", "eve", "henry"}

	# Customers who are engaged across all three channels
	highly_engaged = newsletter_subscribers & recent_purchasers & mobile_app_users
	print("Highly engaged:", highly_engaged)

	# Customers who subscribe AND purchased (two conditions)
	converted_subscribers = newsletter_subscribers & recent_purchasers
	print("Converted subscribers:", converted_subscribers)

>>>Output

Highly engaged: {'bob', 'diana'}

Converted subscribers: {'bob', 'diana'}

Bob and Diana are the most engaged customers, appearing in all three segments. These highly engaged customers might receive special offers or be candidates for a loyalty program. The intersection makes this analysis trivial.
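Intersection only answers "in all of them". Suppose you want customers who appear in at least two of the three channels. One option, our own sketch and not the only approach, is to count how many sets each customer belongs to with `collections.Counter`:

```python
from collections import Counter

newsletter_subscribers = {"alice", "bob", "charlie", "diana", "eve"}
recent_purchasers = {"bob", "diana", "frank", "grace"}
mobile_app_users = {"bob", "diana", "eve", "henry"}

channels = [newsletter_subscribers, recent_purchasers, mobile_app_users]

# Each set contributes at most one count per customer
membership = Counter(name for channel in channels for name in channel)

at_least_two = {name for name, count in membership.items() if count >= 2}
print(sorted(at_least_two))  # ['bob', 'diana', 'eve']
```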

Intersection: Empty Sets

The empty set is the annihilator for intersection: intersecting any set with an empty set always returns an empty set. This makes sense logically: if one set has no elements, there can be no elements that are in both sets.

	users = {"alice", "bob", "charlie"}
	empty = set()

	print("Users & empty:", users & empty)
	print("Empty & users:", empty & users)

	# Empty set in chain = empty result
	a = {1, 2, 3}
	b = set()
	c = {2, 3, 4}
	print("a & b & c:", a & b & c)

>>>Output

Users & empty: set()

Empty & users: set()

a & b & c: set()

This property means that if any set in a multi-set intersection is empty, the entire result is empty. Be aware of this when debugging: if you expect results but get an empty set, check if any of your input sets might be empty.

Difference: Elements Unique to One Set

Isolate unique elements per set

The difference of two sets returns elements that are in the first set but not in the second. This operation answers the question "what is in A that is not in B?" Unlike union and intersection, difference is not symmetric: A - B gives different results than B - A. The order matters because you are asking a directional question.

Think of difference as starting with all elements of the first set, then removing any element that also appears in the second set. What remains are elements unique to the first set. This is extremely useful for finding what is new, what is missing, what was added, or what was removed.

A - B

Subtract B

Elements in A but not B

A ⊆ B

Empty result

Empty if A is subset of B

Disjoint

Full result

Returns all of A if no overlap

	all_employees = {"alice", "bob", "charlie", "diana", "eve", "frank"}
	on_vacation = {"bob", "diana"}
	remote_today = {"charlie", "eve"}

	# Who is in the office today?
	in_office = all_employees - on_vacation - remote_today
	print("In office:", in_office)

	# Alternative: use .difference() method
	in_office2 = all_employees.difference(on_vacation, remote_today)
	print("Same result:", in_office2)

>>>Output

In office: {'alice', 'frank'}

Same result: {'alice', 'frank'}

Starting with all employees, we first remove those on vacation, then remove those working remotely. The result shows who is physically in the office. This kind of filtering is natural with set difference and would be more complex with lists.
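The "start with A, remove anything in B" description maps directly onto a set comprehension. The sketch below is only for understanding; the built-in `-` operator is shorter and typically faster.

```python
all_employees = {"alice", "bob", "charlie", "diana", "eve", "frank"}
away = {"bob", "diana", "charlie", "eve"}

# Keep each employee who is NOT in the away set
by_hand = {name for name in all_employees if name not in away}

assert by_hand == all_employees - away
print(sorted(by_hand))  # ['alice', 'frank']
```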

Order: A - B vs B - A

Unlike union and intersection, which are commutative (A op B equals B op A), difference is NOT commutative. The order of operands changes the result completely. This asymmetry is intentional: you are asking "what is in the first set that is not in the second?", which is inherently directional.

	a = {1, 2, 3, 4, 5}
	b = {4, 5, 6, 7, 8}

	# These give completely different results
	a_minus_b = a - b
	b_minus_a = b - a

	print("a - b:", a_minus_b)
	print("b - a:", b_minus_a)
	print("Equal?", a_minus_b == b_minus_a)

>>>Output

a - b: {1, 2, 3}

b - a: {6, 7, 8}

Equal? False

a - b gives elements in a but not in b: 1, 2, and 3. These are what make set a unique. b - a gives elements in b but not in a: 6, 7, and 8. These are what make set b unique. The shared elements (4 and 5) appear in neither result.

•A - B

Elements unique to A
What A has that B lacks
What to add to B to include A
Order: first minus second

•B - A

Elements unique to B
What B has that A lacks
What to add to A to include B
Order: first minus second
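You can check the "what to add" rows directly. Adding A - B to B produces a set that contains every element of A, which is the same as the union of the two:

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}

missing_from_b = a - b            # elements b would need to gain
patched_b = b | missing_from_b    # b plus whatever it lacked from a

print(sorted(missing_from_b))              # [1, 2, 3]
print(all(x in patched_b for x in a))      # True: patched_b now includes all of a
print(patched_b == a | b)                  # True: the patch is equivalent to a union
```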

Chaining Set Differences

You can chain multiple difference operations to remove elements from several sets. Each difference operation removes another layer of elements. This is useful when you have multiple exclusion criteria.

	all_tasks = {"design", "code", "test", "deploy", "document", "review"}
	completed = {"design", "code"}
	blocked = {"deploy"}
	assigned_to_others = {"review"}

	# Tasks I can work on right now
	my_actionable = all_tasks - completed - blocked - assigned_to_others
	print("I can work on:", my_actionable)

	# Same with method (accepts multiple arguments)
	my_actionable2 = all_tasks.difference(completed, blocked, assigned_to_others)
	print("Same result:", my_actionable2)

>>>Output

I can work on: {'test', 'document'}

Same result: {'test', 'document'}

Starting with all tasks, we progressively filter out completed tasks, blocked tasks, and tasks assigned to others. What remains are tasks that are neither done, blocked, nor owned by someone else: the tasks you can actually work on.
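Subtracting several sets one after another gives the same result as subtracting their union once, because an element is removed if it appears in any of the excluded sets. Parentheses still matter, though, since difference is not associative:

```python
all_tasks = {"design", "code", "test", "deploy", "document", "review"}
completed = {"design", "code"}
blocked = {"deploy"}
assigned_to_others = {"review"}

chained = all_tasks - completed - blocked - assigned_to_others
combined_exclusions = all_tasks - (completed | blocked | assigned_to_others)

print(chained == combined_exclusions)  # True
print(sorted(chained))                 # ['document', 'test']

# Grouping differently asks a different question
print(all_tasks - (completed - blocked) == chained)  # False
```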

New vs Churned Users

Difference is perfect for comparing snapshots over time. By comparing user sets from different time periods, you can identify new acquisitions, retained users, and churned users. This is fundamental to cohort analysis and understanding user lifecycle.

	users_last_month = {"alice", "bob", "charlie", "diana"}
	users_this_month = {"bob", "charlie", "eve", "frank", "grace"}

	# New users: this month but not last month
	new_users = users_this_month - users_last_month
	print("New users:", new_users)

	# Churned users: last month but not this month
	churned_users = users_last_month - users_this_month
	print("Churned users:", churned_users)

	# Retained users: both months (this is intersection)
	retained = users_this_month & users_last_month
	print("Retained users:", retained)

	# Metrics
	print(f"Retention rate: {len(retained)/len(users_last_month)*100:.0f}%")
	print(f"Churn rate: {len(churned_users)/len(users_last_month)*100:.0f}%")

>>>Output

New users: {'eve', 'frank', 'grace'}

Churned users: {'alice', 'diana'}

Retained users: {'bob', 'charlie'}

Retention rate: 50%

Churn rate: 50%

This pattern reveals the complete user lifecycle: Eve, Frank, and Grace are new acquisitions. Alice and Diana churned. Bob and Charlie were retained. With just three set operations, you have comprehensive user lifecycle metrics.
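Packaged as a function, the same three operations can run on any pair of snapshots. This is a sketch with several assumptions of our own: the `compare_snapshots` name, the layout of the returned dictionary, and reporting rates of 0.0 when the earlier snapshot is empty.

```python
def compare_snapshots(previous, current):
    """Classify users as new, churned, or retained between two periods."""
    previous, current = set(previous), set(current)
    retained = previous & current
    churned = previous - current
    # Guard against dividing by zero when the first period had no users
    base = len(previous) or 1
    return {
        "new": current - previous,
        "churned": churned,
        "retained": retained,
        "retention_rate": len(retained) / base,
        "churn_rate": len(churned) / base,
    }

report = compare_snapshots(
    ["alice", "bob", "charlie", "diana"],
    ["bob", "charlie", "eve", "frank", "grace"],
)
print(sorted(report["new"]), report["retention_rate"])
# ['eve', 'frank', 'grace'] 0.5
```

Converting both inputs with `set()` lets callers pass lists straight from a query result.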

Symmetric Difference

The symmetric difference contains elements that are in either set but NOT in both. Think of it as the opposite of intersection: instead of finding what sets share, you find what makes each set unique. If an element appears in both sets, it is excluded from the symmetric difference.

Mathematically, symmetric difference is equivalent to two other expressions: it equals (A - B) union (B - A), which is the elements unique to A combined with elements unique to B. It also equals (A union B) - (A intersection B), which is everything in either set minus what they share. All three formulations give the same result.

XOR logic

A ^ B contains elements in A or B, but not in both sets

Equivalent form 1

Same as (A - B) union (B - A): unique elements from each side

Equivalent form 2

Same as (A | B) - (A & B): everything minus the overlap

Commutative

Unlike difference, A ^ B always equals B ^ A

	a = {1, 2, 3, 4, 5}
	b = {4, 5, 6, 7, 8}

	sym_diff = a.symmetric_difference(b)
	print("Symmetric difference:", sym_diff)

	# Using the ^ operator
	sym_diff2 = a ^ b
	print("Using ^ operator:", sym_diff2)

	# Two equivalent formulations
	alt1 = (a - b) | (b - a)
	alt2 = (a | b) - (a & b)
	print("(a-b)|(b-a):", alt1)
	print("(a|b)-(a&b):", alt2)

>>>Output

Symmetric difference: {1, 2, 3, 6, 7, 8}

Using ^ operator: {1, 2, 3, 6, 7, 8}

(a-b)|(b-a): {1, 2, 3, 6, 7, 8}

(a|b)-(a&b): {1, 2, 3, 6, 7, 8}

4 and 5 are excluded because they appear in both sets (they are the intersection). 1, 2, and 3 are unique to set a. 6, 7, and 8 are unique to set b. The symmetric difference contains all six of these unique elements. All four formulations produce the same result.

The Caret Operator ^

Python uses ^ (caret) for symmetric difference. This operator is borrowed from the bitwise XOR (exclusive or) operation. In boolean logic, XOR returns true when exactly one of two inputs is true, but not when both are true. This perfectly matches symmetric difference: an element is included when it is in exactly one set, but not when it is in both.

•Intersection (AND)

Elements in BOTH sets
Operator: &
Like logical AND
Finds shared elements

•Symmetric Diff (XOR)

Elements in EITHER but not BOTH
Operator: ^
Like logical XOR
Finds unique elements
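The connection to bitwise XOR is more than an analogy. Encode a set of small non-negative integers as an integer bitmask, where bit i is set when i is in the set. Then `^` on the masks computes exactly the symmetric difference, and `&` and `|` line up with intersection and union. The `to_mask` and `from_mask` helpers are illustrative:

```python
def to_mask(s):
    """Encode a set of non-negative ints as an int with bit i set for each i."""
    mask = 0
    for i in s:
        mask |= 1 << i
    return mask

def from_mask(mask):
    """Decode a bitmask back into a set of ints."""
    return {i for i in range(mask.bit_length()) if mask >> i & 1}

a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}

assert from_mask(to_mask(a) ^ to_mask(b)) == a ^ b   # XOR <-> symmetric difference
assert from_mask(to_mask(a) & to_mask(b)) == a & b   # AND <-> intersection
assert from_mask(to_mask(a) | to_mask(b)) == a | b   # OR  <-> union
print(sorted(from_mask(to_mask(a) ^ to_mask(b))))    # [1, 2, 3, 6, 7, 8]
```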

Intersection and symmetric difference are complementary: together they partition the union. Every element of (A | B) is in either (A & B) or (A ^ B), but never both. As a result, knowing any two of the union, intersection, and symmetric difference is enough to compute the third.
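You can verify the partition property in a few lines. The two pieces never overlap, their sizes add up to the size of the union, and each piece is the union minus the other:

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}

shared = a & b
exclusive = a ^ b

assert not (shared & exclusive)               # no element is in both pieces
assert shared | exclusive == a | b            # together they cover the union
assert len(shared) + len(exclusive) == len(a | b)
assert (a | b) - shared == exclusive          # union minus intersection
assert (a | b) - exclusive == shared          # union minus symmetric difference
```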

Detecting Changes Example

Symmetric difference excels at detecting what changed between two states. Since it excludes elements that stayed the same (elements in both sets), it highlights only the additions and removals. This is invaluable for configuration management, version comparison, and change detection.

	# Configuration at two different times
	config_v1 = {"debug_mode", "cache_enabled", "log_level_info", "feature_a"}
	config_v2 = {"cache_enabled", "log_level_debug", "feature_a", "feature_b"}

	# What settings changed?
	changes = config_v1 ^ config_v2
	print("Changed settings:", changes)

	# Breaking down the changes
	removed = config_v1 - config_v2
	added = config_v2 - config_v1
	print("Removed:", removed)
	print("Added:", added)

>>>Output

Changed settings: {'debug_mode', 'log_level_info', 'log_level_debug', 'feature_b'}

Removed: {'debug_mode', 'log_level_info'}

Added: {'log_level_debug', 'feature_b'}

The symmetric difference immediately shows what changed. Settings that remained the same (cache_enabled, feature_a) are excluded. If you need to know specifically what was added versus removed, use regular difference in both directions.
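Because x ^ x is empty and ^ is associative, the set of changes also works as a patch. XOR-ing it onto the old configuration yields the new one, and XOR-ing it onto the new one takes you back. Here is a short sketch using the same two configurations:

```python
config_v1 = {"debug_mode", "cache_enabled", "log_level_info", "feature_a"}
config_v2 = {"cache_enabled", "log_level_debug", "feature_a", "feature_b"}

changes = config_v1 ^ config_v2

# Applying the change set toggles each changed setting on or off
assert config_v1 ^ changes == config_v2   # forward: v1 -> v2
assert config_v2 ^ changes == config_v1   # backward: v2 -> v1

# The explicit form using the two one-way differences gives the same result
removed = config_v1 - config_v2
added = config_v2 - config_v1
assert (config_v1 - removed) | added == config_v2
```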

Symmetric Diff: Commutative

Unlike regular difference, symmetric difference is commutative: A ^ B always equals B ^ A. This makes sense because we are finding elements unique to either side, which is the same regardless of which set we consider "first".

	a = {1, 2, 3}
	b = {3, 4, 5}

	print("a ^ b:", a ^ b)
	print("b ^ a:", b ^ a)
	print("Equal?", (a ^ b) == (b ^ a))

>>>Output

a ^ b: {1, 2, 4, 5}

b ^ a: {1, 2, 4, 5}

Equal? True

Try each set operator below to see how the same two sets produce completely different results depending on the operation.

Fill in the Blank

> Two sets a = {1, 2, 3, 4} and b = {3, 4, 5, 6} overlap on some elements. Pick a set operator to see how union, intersection, difference, and symmetric difference each produce a different result.

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
print(a ___ b)

Operation Summary

Understanding when to use each operation is essential for effective data processing. Here is a comprehensive reference guide that summarizes all four operations with their methods, operators, and typical use cases.

Union: | or .union()

Combine everything from all sets into one collection

Intersection: & or .intersection()

Find elements that all sets share in common

Difference: - or .difference()

Find elements in the first set but not in the second

Symmetric Diff: ^ or .symmetric_difference()

Find elements that are unique to each set, not shared

	a = {1, 2, 3, 4}
	b = {3, 4, 5, 6}

	print("Union (a | b):", a | b)
	print("Intersection (a & b):", a & b)
	print("Difference (a - b):", a - b)
	print("Difference (b - a):", b - a)
	print("Symmetric Diff (a ^ b):", a ^ b)

>>>Output

Union (a | b): {1, 2, 3, 4, 5, 6}

Intersection (a & b): {3, 4}

Difference (a - b): {1, 2}

Difference (b - a): {5, 6}

Symmetric Diff (a ^ b): {1, 2, 5, 6}

Methods vs Operators

Each operation has both a method form and an operator form. The key difference is that methods accept any iterable (lists, tuples, generators), while operators require set operands on both sides. Choose based on your data types and readability preferences.

	current_set = {1, 2, 3}
	new_items = [3, 4, 5]

	# Method works directly with lists
	result = current_set.union(new_items)
	print("Method with list:", result)

	# Operator requires conversion to set
	result2 = current_set | set(new_items)
	print("Operator with set:", result2)

	a = {1, 2}
	result3 = a.union([3, 4], (5, 6), {7, 8})
	print("Multiple iterables:", result3)

>>>Output

Method with list: {1, 2, 3, 4, 5}

Operator with set: {1, 2, 3, 4, 5}

Multiple iterables: {1, 2, 3, 4, 5, 6, 7, 8}

Use operators for concise, readable code when both operands are already sets. Use methods when working with lists, tuples, or other iterables, or when you need to combine more than two collections in a single call.
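Operators also accept `frozenset`, the immutable set type, on either side. In that case the result takes the type of the left-hand operand, which matters if you later try to modify it. The role names below are illustrative:

```python
fixed_roles = frozenset({"viewer", "editor"})
extra_roles = {"admin"}

left_frozen = fixed_roles | extra_roles
left_plain = extra_roles | fixed_roles

print(type(left_frozen).__name__)  # frozenset
print(type(left_plain).__name__)   # set
print(left_frozen == left_plain)   # True: equality ignores set vs frozenset

left_plain.add("auditor")          # fine: a regular set is mutable
# left_frozen.add("auditor")       # would raise AttributeError: frozenset has no add()
```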

Choosing Method vs Operator

Use the operator (|, &, -, ^) when both sides are already sets
Use the method (.union(), .intersection()) when one side is a list or tuple
Methods accept multiple arguments: a.union(b, c, d) works in one call
Operators chain naturally: a | b | c reads like mathematical notation
Methods are more explicit; operators are more concise

The code below tries to find new users but has the operands reversed. Fix the direction of the difference operation.

Debug Challenge

> This code computes last_month - this_month, which finds churned users instead of new users. The set difference operands are reversed.

Logic error: shows churned users {'alice'} instead of new users {'charlie', 'diana'}

last_month = {"alice", "bob"}
this_month = {"bob", "charlie", "diana"}
new_users = last_month - this_month
print(new_users)

Set difference is directional: A - B and B - A produce different results. Always read it as "what is in the first set that is NOT in the second set." Getting the operand order right is the most common source of set difference bugs.
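To check your answer, here is one possible fix: swap the operands so the current month comes first.

```python
last_month = {"alice", "bob"}
this_month = {"bob", "charlie", "diana"}

# New users: present this month but not last month
new_users = this_month - last_month
print(sorted(new_users))  # ['charlie', 'diana']

# The original order answers a different question: who churned
churned_users = last_month - this_month
print(sorted(churned_users))  # ['alice']
```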

The |, &, -, and ^ operators map directly to union, intersection, difference, and symmetric difference. Using these single-character operators makes set algebra in code read closely to the mathematical notation you would write on paper.

TIP

When you use the - operator to find items in one set but not another, label your variables clearly so the direction is obvious. Names like new_users = this_month - last_month make the intent readable without needing a comment.

Subset and Superset

Validate containment and modify in place

Beyond combining sets, you often need to check whether one set is contained within another. These containment relationships are called subset and superset. A is a subset of B when every element of A also exists in B; B may be larger than A or exactly equal to it. A superset is the reverse view: B is a superset of A when it contains every element of A, plus possibly more.

Subset and superset checks are fundamental for validation, permission checking, and hierarchical data. For example, checking if a user has required permissions (user permissions should be a superset of required permissions), or validating that input is within allowed values (input should be a subset of allowed values).

A ⊆ B

Subset

Every element of A is in B

B ⊇ A

Superset

B contains all of A, plus possibly more

A ⊂ B

Proper subset

A is in B but not equal to B

B ⊃ A

Proper superset

B has extra elements beyond A

	basic_permissions = {"read"}
	editor_permissions = {"read", "write"}
	admin_permissions = {"read", "write", "delete", "admin"}

	# Check subset relationships
	print("basic ⊆ editor?", basic_permissions.issubset(editor_permissions))
	print("editor ⊆ admin?", editor_permissions.issubset(admin_permissions))
	print("admin ⊆ basic?", admin_permissions.issubset(basic_permissions))

	# Check superset relationships
	print("admin ⊇ editor?", admin_permissions.issuperset(editor_permissions))
	print("editor ⊇ basic?", editor_permissions.issuperset(basic_permissions))

>>>Output

basic ⊆ editor? True

editor ⊆ admin? True

admin ⊆ basic? False

admin ⊇ editor? True

editor ⊇ basic? True

The permission sets form a hierarchy: basic is a subset of editor, which is a subset of admin. This reflects the real permission structure where higher roles include all lower permissions plus additional ones.
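Earlier, this section described a check where a user's permissions must be a superset of what an action requires. The sketch below combines `.issuperset()` with difference so it can also report what is missing. The `can_perform` helper and its `(allowed, missing)` return shape are hypothetical.

```python
def can_perform(user_perms, required):
    """Return (allowed, missing) for a user attempting an action."""
    allowed = user_perms.issuperset(required)   # accepts any iterable
    missing = set(required) - user_perms
    return allowed, missing

editor_permissions = {"read", "write"}
print(can_perform(editor_permissions, {"read"}))   # (True, set())

allowed, missing = can_perform(editor_permissions, ["write", "delete"])
print(allowed, missing)                             # False {'delete'}
```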

The Comparison Operators

Python provides comparison operators for subset and superset checks: <= for subset (less than or equal to) and >= for superset (greater than or equal to). The intuition is that a "smaller" set is one contained within a "larger" set.

	a = {1, 2}
	b = {1, 2, 3, 4}
	c = {1, 2}

	# Subset checks
	print("a <= b (subset):", a <= b)
	print("a < b (proper subset):", a < b)

	# Note: a and c are equal, so...
	print("a <= c:", a <= c)
	print("a < c:", a < c)

	# Superset checks
	print("b >= a:", b >= a)
	print("b > a:", b > a)

>>>Output

a <= b (subset): True

a < b (proper subset): True

a <= c: True

a < c: False

b >= a: True

b > a: True

A proper subset or superset means the sets are not equal. Set a is a subset of c (since they are equal), but not a proper subset. The strict operators (< and >) exclude the case where sets are equal, while the non-strict operators (<= and >=) include equality.
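Unlike `<` on numbers, these operators define only a partial order. Two sets can overlap without either containing the other, and in that case every comparison is False. The operators also test containment, not size, so sorting a list of sets with them does not produce a meaningful order:

```python
x = {1, 2}
y = {2, 3}

print(x < y, x > y, x <= y, x >= y, x == y)  # False False False False False

# Comparison is about containment, not size
print({9} < {1, 2, 3})  # False, even though the left set is smaller
```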

Validation with Subsets

Subset checking is perfect for validating that user input falls within allowed values, or that a required set of items is present in a larger collection.

	allowed_columns = {"id", "name", "email", "age", "city", "country"}

	# User requests certain columns
	user_request = {"name", "email", "city"}

	# Validate the request
	if user_request <= allowed_columns:
	    print("Valid request:", user_request)
	else:
	    invalid = user_request - allowed_columns
	    print("Invalid columns:", invalid)

	# Another request with invalid columns
	bad_request = {"name", "password", "ssn"}
	if bad_request <= allowed_columns:
	    print("Valid request")
	else:
	    invalid = bad_request - allowed_columns
	    print("Invalid columns:", invalid)

>>>Output

Valid request: {'name', 'email', 'city'}

Invalid columns: {'password', 'ssn'}
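In an API, you would usually turn this check into a reusable function that rejects bad input with an error. The sketch below assumes a hypothetical `select_columns` helper that accepts any iterable of names and raises `ValueError`. It stores the allowed names in a `frozenset`, so the list cannot be modified by accident.

```python
ALLOWED_COLUMNS = frozenset({"id", "name", "email", "age", "city", "country"})

def select_columns(requested):
    """Return the requested columns as a set, or raise ValueError if any are not allowed."""
    requested = set(requested)
    if not requested <= ALLOWED_COLUMNS:
        invalid = sorted(requested - ALLOWED_COLUMNS)
        raise ValueError("Invalid columns: " + ", ".join(invalid))
    return requested

print(sorted(select_columns(["name", "email"])))  # ['email', 'name']

error_message = None
try:
    select_columns(["name", "ssn", "password"])
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # Invalid columns: password, ssn
```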

Checking for Disjoint Sets

Two sets are disjoint if they have no elements in common. The .isdisjoint() method returns True if the sets share no elements. This is equivalent to checking if the intersection is empty, but .isdisjoint() is more efficient because it can stop early as soon as it finds any common element.

	odds = {1, 3, 5, 7, 9}
	evens = {2, 4, 6, 8, 10}
	primes = {2, 3, 5, 7}
	composites = {4, 6, 8, 9, 10}

	print("odds and evens disjoint?", odds.isdisjoint(evens))
	print("odds and primes disjoint?", odds.isdisjoint(primes))
	print("primes and composites disjoint?", primes.isdisjoint(composites))

	# Equivalent to checking empty intersection
	print("Same as intersection check:", len(odds & evens) == 0)

>>>Output

odds and evens disjoint? True

odds and primes disjoint? False

primes and composites disjoint? True

Same as intersection check: True

Odd and even numbers are disjoint by definition. Odd numbers and primes share 3, 5, and 7. Primes and composites are disjoint because no number can be both prime and composite. The isdisjoint() method efficiently tells you whether any overlap exists.
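Like the other relationship methods, .isdisjoint() accepts any iterable, which makes it convenient for conflict checks. A small sketch with hypothetical meeting slots, where a booking is accepted only if none of the requested slots is already taken:

```python
booked_slots = {"09:00", "10:00", "14:00"}

def is_available(requested):
    """Hypothetical helper: True if none of the requested slots is already booked."""
    return booked_slots.isdisjoint(requested)

print(is_available(["11:00", "12:00"]))  # no overlap with booked_slots
print(is_available(("13:00", "14:00")))  # 14:00 is already booked
```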

Here is a quick reference for the relationship-checking methods and their operator equivalents.

Set Relationship Methods

.issubset() or <= -- every element of A is also in B
.issuperset() or >= -- A contains every element of B
< proper subset -- A is inside B and they are not equal
> proper superset -- A contains B and has extra elements
.isdisjoint() -- A and B share zero common elements
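One practical difference between the methods and the operators in this list: the methods accept any iterable, while the comparison operators need a set (or frozenset) on both sides. A quick check:

```python
small = {1, 2}

# Methods accept any iterable, so a list works
print(small.issubset([1, 2, 3]))

# Operators require both sides to be sets (or frozensets)
try:
    small <= [1, 2, 3]
except TypeError as err:
    print("TypeError:", err)

# Converting the list to a set first makes the operator work
print(small <= set([1, 2, 3]))
```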

In-Place Operations

All set operations covered so far create new sets, leaving the originals unchanged. This is often what you want, but sometimes you need to modify a set in place for efficiency or because you want to accumulate changes. Python provides in-place versions of all four fundamental operations using update methods or augmented assignment operators.

In-place operations are more memory efficient because they do not create a new set object. For very large sets, this can be significant. However, they modify the original data, which means you lose the original state. Choose in-place operations when you no longer need the original data and want to save memory.

.update() or |=

Add all elements from another set, performing union in place

.intersection_update() or &=

Keep only elements common to both sets, discard the rest

.difference_update() or -=

Remove from this set any elements found in the other set

.symmetric_difference_update() or ^=

Keep only elements unique to each set, drop shared ones
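Symmetric difference has an in-place form too. A short sketch of ^= and its method equivalent, using made-up tag sets:

```python
tags = {"python", "sets", "tutorial"}
incoming = {"sets", "beginner"}

# ^= keeps elements found in exactly one of the two sets
tags ^= incoming
print(tags)  # 'sets' is dropped, 'beginner' is added

# The method form does the same and also accepts any iterable
tags.symmetric_difference_update(["beginner", "advanced"])
print(tags)
```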

	inventory = {"apple", "banana", "cherry"}
	print("Original:", inventory)

	# Add new items in place (union)
	inventory |= {"date", "elderberry"}
	print("After |=:", inventory)

	# Keep items in stock (intersection)
	in_stock = {"banana", "date", "fig"}
	inventory &= in_stock
	print("After &= in_stock:", inventory)

	# Remove recalled items (difference)
	inventory -= {"banana"}
	print("After -= recalled:", inventory)

>>>Output

Original: {'apple', 'banana', 'cherry'}

After |=: {'apple', 'banana', 'cherry', 'date', 'elderberry'}

After &= in_stock: {'banana', 'date'}

After -= recalled: {'date'}

Each operation modifies the inventory set directly. After all operations, only "date" remains. The original set is progressively transformed rather than replaced. This is efficient but means the original data is lost.
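Because in-place operations mutate the existing object, every variable that refers to that set sees the change. Writing inventory = inventory | other instead builds a new set and rebinds the name, so other references keep the old contents. A short sketch of the difference:

```python
inventory = {"apple", "banana"}
alias = inventory  # a second name for the same set object

# In-place: the shared object is modified, so alias sees the new item
inventory |= {"cherry"}
print(alias is inventory, alias)

# Regular operator plus assignment: a new set is bound to inventory
inventory = inventory | {"date"}
print(alias is inventory, alias)
```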

Accumulating Data: Update

The .update() method (or |= operator) is particularly useful for accumulating data from multiple sources into a single set. This is common when processing files, API responses, or database queries where data arrives in batches.

	all_users = set()

	source1_users = ["alice", "bob", "charlie"]
	source2_users = ("bob", "diana", "eve")
	source3_users = {"eve", "frank"}

	# Accumulate all users into one set
	all_users.update(source1_users)
	all_users.update(source2_users)
	all_users.update(source3_users)

	print("All unique users:", all_users)
	print("Total unique:", len(all_users))

>>>Output

All unique users: {'alice', 'bob', 'charlie', 'diana', 'eve', 'frank'}

Total unique: 6

Starting with an empty set, we add users from three different sources. The update method accepts any iterable (list, tuple, or set) and deduplicates automatically: bob and eve each appear in two sources but only once in the result.
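.update() (and likewise .union()) also accepts several iterables in a single call, so the three separate calls can be collapsed into one. A sketch using the same sources:

```python
source1_users = ["alice", "bob", "charlie"]
source2_users = ("bob", "diana", "eve")
source3_users = {"eve", "frank"}

# One in-place call with three iterables of different types
all_users = set()
all_users.update(source1_users, source2_users, source3_users)
print(len(all_users))

# union() with several arguments returns a new set instead of mutating
combined = set().union(source1_users, source2_users, source3_users)
print(combined == all_users)
```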

Intersection Update Filter

Intersection update (&=) keeps only elements that are in both sets. This is useful for progressively narrowing down a set based on multiple criteria.

	# Start with all products
	products = {"laptop", "phone", "tablet", "watch", "headphones", "camera"}

	# Filter to electronics under 500 dollars
	under_500 = {"phone", "watch", "headphones"}
	products &= under_500
	print("Under 500:", products)

	# Filter to items in stock
	in_stock = {"phone", "headphones", "cable"}
	products &= in_stock
	print("Under 500 AND in stock:", products)

>>>Output

Under 500: {'phone', 'watch', 'headphones'}

Under 500 AND in stock: {'phone', 'headphones'}
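When the filters are collected in a list, .intersection_update() can apply all of them in one call, with the same result as repeating &= once per filter. A sketch using the sets above:

```python
products = {"laptop", "phone", "tablet", "watch", "headphones", "camera"}
filters = [
    {"phone", "watch", "headphones"},  # under 500 dollars
    {"phone", "headphones", "cable"},  # in stock
]

# Unpack the list so each filter becomes a separate argument
products.intersection_update(*filters)
print(products)
```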

TIP

In-place operations modify the original set. If you need to preserve the original, either make a copy first with .copy(), or use the regular (non-in-place) operations which return new sets.
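A sketch of the copy-first pattern: take a snapshot with .copy(), narrow the working set in place, and the snapshot keeps the original contents:

```python
catalog = {"laptop", "phone", "tablet"}

# .copy() returns a new set object with the same elements
working = catalog.copy()
working &= {"phone", "tablet", "cable"}

print("working:", working)
print("catalog unchanged:", catalog)
```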

In-Place vs Regular Ops

Understanding the difference between in-place and regular operations is crucial. Regular operations leave the originals unchanged and return a new set. In-place methods such as .update() modify the original and return None.

	original = {1, 2, 3}
	addition = {3, 4, 5}

	# Returns new set, original unchanged
	new_set = original.union(addition)
	print("New set:", new_set)
	print("Original after union():", original)

	# In-place: modifies original
	result = original.update(addition)
	print("update() returns:", result)
	print("Original after update():", original)

>>>Output

New set: {1, 2, 3, 4, 5}

Original after union(): {1, 2, 3}

update() returns: None

Original after update(): {1, 2, 3, 4, 5}

After the union() call, original is still {1, 2, 3}. After the update() call, original has been modified to {1, 2, 3, 4, 5}. Note that update() returns None, not the modified set: writing new = original.update(addition) leaves new set to None, and you cannot chain further set methods onto the call.

The code below has a bug caused by using the wrong operator. Can you spot and fix the error?

Debug Challenge

> This code uses the ^= augmented assignment operator inside an expression, which is a syntax error. The regular ^ operator should be used instead.

SyntaxError: invalid syntax with ^= in expression

	a = {1, 2, 3, 4}
	b = {3, 4, 5}
	result = a ^= b
	print(result)
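One way to fix it: use the regular ^ operator when you want the result as a new value, or use ^= as a statement on its own line when you want to modify a in place.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

# Regular operator: returns a new set, a and b are unchanged
result = a ^ b
print(result)

# In-place alternative: ^= is a statement on its own line
a ^= b
print(a)
```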

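In-place set operators (|=, &=, -=, ^=) modify the set on their left-hand side. In Python, augmented assignment is a statement rather than an expression, so it produces no value and cannot appear inside a larger expression or on the right-hand side of an assignment. The method equivalents, such as .symmetric_difference_update(), can be called inside an expression, but they return None rather than a new set.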

Regular set operators (|, &, -, ^) always return a new set and leave both operands unchanged. Use them whenever you need the result as a value or want to preserve the originals for further comparisons.

Set operations offer elegant solutions to common data comparison and deduplication problems. Put these techniques to the test with hands-on challenges in the Python Builder.

PUTTING IT ALL TOGETHER

> You are a data engineer at Spotify comparing listener sets across three regional platforms to find shared audiences for cross-promotion, identify platform-exclusive subscribers, and efficiently update running audience sets in place as new subscriber data streams in.

union() combines all three platform listener sets into one deduplicated master audience for broad cross-promotion targeting.

intersection() finds the subset of listeners present on all three platforms simultaneously, the highest-value cross-promotion targets.

difference() isolates subscribers unique to one platform: the exclusive audience that does not appear on either of the other two platforms.

In-place update with |= adds new arriving subscriber IDs directly into the running platform set without creating a new object.
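A minimal sketch of that workflow. The platform sets, listener IDs, and the incoming batch are all made up for illustration; real data would come from a subscriber feed:

```python
# Hypothetical listener IDs for three regional platforms
platform_a = {"u1", "u2", "u3", "u4"}
platform_b = {"u2", "u3", "u5"}
platform_c = {"u3", "u4", "u5", "u6"}

# Every listener on any platform, deduplicated
master_audience = platform_a.union(platform_b, platform_c)

# Listeners present on all three platforms
on_all_three = platform_a.intersection(platform_b, platform_c)

# Listeners found only on platform A
exclusive_to_a = platform_a.difference(platform_b, platform_c)

print("Master audience size:", len(master_audience))
print("On all three:", on_all_three)
print("Exclusive to A:", exclusive_to_a)

# A hypothetical batch of new subscriber IDs, merged in place
new_batch = ["u7", "u2"]
platform_a |= set(new_batch)  # |= needs a set on the right-hand side
print("Platform A now:", sorted(platform_a))
```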

KEY TAKEAWAYS

Union (| or .union()): Combines all elements from all sets

Intersection (& or .intersection()): Elements present in ALL sets

Difference (- or .difference()): Elements in first set but not in second

Symmetric Difference (^ or .symmetric_difference()): Elements in either set but not both

Operators require sets (or frozensets) on both sides; methods accept any iterable

.issubset() / <=: Check if all elements are contained in another set

.issuperset() / >=: Check if set contains all elements of another

.isdisjoint(): Check if sets have no elements in common

Use |=, &=, -=, ^= for in-place modifications

Union, intersection, and symmetric difference are commutative; difference is not
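A quick check of commutativity with two overlapping sets:

```python
a = {1, 2, 3}
b = {3, 4}

print(a | b == b | a)  # union: operand order does not matter
print(a & b == b & a)  # intersection: operand order does not matter
print(a ^ b == b ^ a)  # symmetric difference: operand order does not matter
print(a - b, b - a)    # difference: the two orders give different sets
```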

Combining and comparing collections

Category: Python
Difficulty: intermediate
Duration: 44 minutes
Challenges: 3 hands-on challenges

Topics covered: Union: Combining Sets, Intersection: Finding Common Elements, Difference: Elements Unique to One Set, Subset and Superset


	users = {"alice", "bob", "charlie"}
	empty = set()

	# The empty set is the identity for union: combining with it changes nothing
	print("Users | empty:", users | empty)
	print("Empty | users:", empty | users)
	print("Empty | empty:", empty | empty)