Sets: Intermediate

Dropbox syncs billions of files across hundreds of millions of devices, and at the core of deciding what needs to be uploaded is a set operation: comparing the set of file hashes already stored in the cloud against the set of file hashes on your local machine. The difference of those two sets tells Dropbox exactly which files are new and need uploading, without comparing file contents byte by byte or re-transferring anything that is already synced. Set comprehensions, frozensets, and the symmetric difference operator you will learn in this lesson are the same algebraic tools that content deduplication systems use at petabyte scale to move only the data that actually changed.

Union: Combining Sets

Merge sets with union and pipe

A union combines all elements from two or more sets into a single set. If an element appears in any of the input sets, it appears in the union exactly once. The union operation automatically handles duplicates because the result is still a set, which by definition contains only unique elements. This makes union perfect for merging data from multiple sources.
The mathematical notation for union is A ∪ B, read as "A union B". The union of sets A and B contains every element that is in A, in B, or in both. The key insight is that union is an inclusive operation: if you are in either set, you are in the result. This is analogous to the logical OR operation: an element is in the union if it is in A OR in B OR in both.
  • Inclusive: contains every element that is in A, in B, or in both sets
  • Notation: A ∪ B, written as "A union B" in mathematical set theory notation
  • Lower bound: the result is always at least as large as the largest input set
  • Upper bound: at most the sum of both set sizes, reached when there is zero overlap
frontend_skills = {"html", "css", "javascript", "react"}
backend_skills = {"python", "sql", "javascript", "django"}

# Union: all skills from both sets
all_skills = frontend_skills.union(backend_skills)
print("All skills:", all_skills)
print("Count:", len(all_skills))
>>>Output
All skills: {'html', 'css', 'javascript', 'react', 'python', 'sql', 'django'}
Count: 7
Notice that "javascript" appears in both the frontend and backend sets but only once in the result. The union contains seven elements, not eight, because duplicates are automatically eliminated. This is exactly what you want when combining data sources: a complete list without repetition. Because sets are unordered, the elements may print in a different order on your machine than in the outputs shown in this lesson.
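The size bounds on a union can be checked directly. The sketch below reuses the skill sets from the example above; the names `lower` and `upper` are introduced here only for illustration.

```python
frontend_skills = {"html", "css", "javascript", "react"}
backend_skills = {"python", "sql", "javascript", "django"}

all_skills = frontend_skills.union(backend_skills)

# Lower bound: the union is at least as large as the larger input.
lower = max(len(frontend_skills), len(backend_skills))
# Upper bound: the union is at most the sum of both sizes,
# which is reached only when the inputs share no elements.
upper = len(frontend_skills) + len(backend_skills)

print(lower <= len(all_skills) <= upper)  # True
print(upper - len(all_skills))            # 1, the one shared element
```

The gap between the upper bound and the actual size equals the number of shared elements, which here is the single overlapping skill.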

The Pipe Operator |

Python provides the | operator as a shorthand for union. This pipe symbol is often preferred for its concise syntax. In many programming contexts, the pipe symbol represents "or", which aligns with the inclusive nature of union: an element is in the result if it is in set A or in set B.

frontend_skills = {"html", "css", "javascript"}
backend_skills = {"python", "sql", "javascript"}

# These produce identical results:
method_result = frontend_skills.union(backend_skills)
operator_result = frontend_skills | backend_skills

print("Method result:", method_result)
print("Operator result:", operator_result)
print("Equal?", method_result == operator_result)
>>>Output
Method result: {'html', 'css', 'javascript', 'python', 'sql'}
Operator result: {'html', 'css', 'javascript', 'python', 'sql'}
Equal? True
Both approaches produce identical results. The operator syntax is more concise and often preferred when both operands are already sets. However, there is an important difference between the method and operator forms that affects how you use them with other data types.
.union() Method
  • Method syntax with parentheses
  • Works with any iterable (list, tuple)
  • a.union([1, 2, 3]) works directly
  • More flexible for mixed types
| Operator
  • Operator syntax, more concise
  • Requires sets on both sides
  • a | [1, 2, 3] raises TypeError
  • Must convert to set first

The method form is more flexible because it accepts any iterable as an argument. If you have a list, tuple, or generator, you can pass it directly to the .union() method without first converting it to a set. The operator form requires both operands to be sets, so you must explicitly convert other types before using the | operator.

current_users = {"alice", "bob", "charlie"}
new_signups = ["diana", "eve", "bob"]

# Method works with the list directly
all_users = current_users.union(new_signups)
print("With method:", all_users)

# Operator requires converting to set first
all_users2 = current_users | set(new_signups)
print("With operator:", all_users2)
>>>Output
With method: {'alice', 'bob', 'charlie', 'diana', 'eve'}
With operator: {'alice', 'bob', 'charlie', 'diana', 'eve'}
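To see what happens without the conversion, the following sketch applies | directly to a set and a list and catches the resulting TypeError. It reuses the variables from the example above and is only meant to illustrate the failure mode.

```python
current_users = {"alice", "bob", "charlie"}
new_signups = ["diana", "eve", "bob"]

try:
    all_users = current_users | new_signups  # set | list is not supported
except TypeError as exc:
    print("TypeError:", exc)
    # Recover by converting the list to a set, then taking the union.
    all_users = current_users | set(new_signups)

print(sorted(all_users))  # ['alice', 'bob', 'charlie', 'diana', 'eve']
```

Sorting before printing gives a stable display order, since the set itself has none.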

Chaining Multiple Unions

You can union more than two sets at once by chaining the operator or passing multiple arguments to the method. Both approaches combine all unique elements from all input sets into a single result. This is essential when you need to merge data from three or more sources.
team_a = {"python", "sql"}
team_b = {"javascript", "python"}
team_c = {"rust", "sql", "go"}
team_d = {"java", "python"}

# Chain operators to union four sets
all_skills = team_a | team_b | team_c | team_d
print("All skills:", all_skills)

# Or use method with multiple arguments
all_skills2 = team_a.union(team_b, team_c, team_d)
print("Same result:", all_skills2)
>>>Output
All skills: {'python', 'sql', 'javascript', 'rust', 'go', 'java'}
Same result: {'python', 'sql', 'javascript', 'rust', 'go', 'java'}
Python appears in three of the four sets but only once in the result. SQL appears in two sets but only once in the result. The union correctly consolidates all unique skills across all teams, making it trivial to answer "what skills does our organization have?"

Merging Permissions

Union is commonly used in permission systems where a user belongs to multiple groups. Each group grants certain permissions, and the user should have the combined permissions from all their groups. This is a classic use case that appears in operating systems, web applications, and databases.
# Permission sets for different roles
viewer_perms = {"read", "list", "search"}
editor_perms = {"read", "write", "edit", "list"}
admin_perms = {"read", "write", "delete", "admin", "list"}

# A user who is both an editor and has some admin rights
user_groups = [editor_perms, {"delete"}]

# Calculate effective permissions
effective_perms = set()
for group in user_groups:
    effective_perms = effective_perms | group

print("User can:", effective_perms)
>>>Output
User can: {'read', 'write', 'edit', 'list', 'delete'}
The user gets all permissions from their editor role plus the delete permission. Union ensures no duplicate permissions and provides a clear, complete set of what the user can do. This pattern scales to any number of groups or roles without changing the logic.
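The accumulation loop can also be written as a single call, since .union() accepts any number of arguments. This is one possible alternative rather than a required style; it reuses `editor_perms` and `user_groups` from the example above.

```python
editor_perms = {"read", "write", "edit", "list"}
user_groups = [editor_perms, {"delete"}]

# set().union(*groups) unions every group in one call.
# Starting from an empty set means an empty list of groups yields set().
effective_perms = set().union(*user_groups)
print(sorted(effective_perms))  # ['delete', 'edit', 'list', 'read', 'write']

no_groups = set().union(*[])
print(no_groups)  # set()
```

Starting from an empty set keeps the edge case of a user with no groups from raising an error.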

Union with Empty Sets

The empty set is the identity element for union: unioning any set with an empty set returns the original set unchanged. This may seem obvious, but it is an important property that makes your code robust when handling edge cases where one of your data sources might be empty.
users = {"alice", "bob", "charlie"}
empty = set()

print("Users | empty:", users | empty)
print("Empty | users:", empty | users)
print("Empty | empty:", empty | empty)
>>>Output
Users | empty: {'alice', 'bob', 'charlie'}
Empty | users: {'alice', 'bob', 'charlie'}
Empty | empty: set()
This identity property means you can safely union sets without checking if they are empty first. Your code works correctly regardless of whether any input set happens to be empty.
TIP
Union is commutative (A | B equals B | A) and associative ((A | B) | C equals A | (B | C)). This means you can reorder union operations freely without changing the result.
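Both properties are easy to confirm on small sets. The sets `a`, `b`, and `c` below are arbitrary examples chosen for this check.

```python
a = {1, 2}
b = {2, 3}
c = {3, 4}

print((a | b) == (b | a))              # True: commutative
print(((a | b) | c) == (a | (b | c)))  # True: associative
```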
Try choosing the right method below to combine two overlapping teams into a single staff set.
Fill in the Blank

> Two teams have overlapping members: admins are {"alice", "bob"} and editors are {"bob", "charlie"}. Pick a set operation to produce the combined staff list.

admins = {"alice", "bob"}
editors = {"bob", "charlie"}
all_staff = admins.____(editors)
print(all_staff)
Union is the most inclusive set operation: every element from every input set appears exactly once in the result. It is the right operation when you need a complete combined view without worrying about overlap.
The three set operations, union, intersection, and difference, each answer a different question about two collections. Union asks "what is in either?", intersection asks "what is in both?", and difference asks "what is in one but not the other?"
TIP
When both operands are sets, .union() and the | operator are interchangeable. The method form also accepts non-set iterables directly: a.union(my_list) works without converting the list first, which can make code cleaner when combining collections of mixed types.

Intersection: Finding Common Elements

Find shared elements with intersection

An intersection finds elements that exist in all specified sets. If an element is in set A AND in set B, it appears in the intersection. Elements that are in only one set are excluded. The intersection operation answers the question "what do these sets have in common?" This is fundamental for finding overlaps, shared characteristics, or common attributes.
The mathematical notation for intersection is A ∩ B, read as "A intersect B". The intersection of sets A and B contains only elements that are in both A and B simultaneously. This is analogous to the logical AND operation: an element is in the intersection only if it is in A AND in B. The intersection is always smaller than or equal to the smallest input set.
  • Both sets required: contains only elements present in A and B simultaneously
  • Notation: A ∩ B, read as "A intersect B" in mathematical set notation
  • Can be empty: returns the empty set when A and B share no common elements
  • Size limited: the result is at most as large as the smaller input set
frontend_skills = {"html", "css", "javascript", "react"}
backend_skills = {"python", "sql", "javascript", "node"}

# Intersection: skills in BOTH sets
common_skills = frontend_skills.intersection(backend_skills)
print("Common skills:", common_skills)

# What percentage overlap?
overlap_pct = len(common_skills) / len(frontend_skills | backend_skills) * 100
print(f"Overlap: {overlap_pct:.1f}%")
>>>Output
Common skills: {'javascript'}
Overlap: 14.3%
Only "javascript" appears in both the frontend and backend skill sets, so the intersection contains just that one element. The overlap percentage shows how much the two sets have in common relative to their combined unique elements.
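The ratio of intersection size to union size is commonly known as the Jaccard index. The helper below is a minimal sketch of it; the function name `jaccard` and the choice to return 0.0 for two empty sets are assumptions made for this example, not part of the lesson.

```python
def jaccard(a: set, b: set) -> float:
    """Return len(a ∩ b) / len(a ∪ b) as a value between 0.0 and 1.0."""
    union = a.union(b)
    if not union:
        # Assumption: treat two empty sets as having no overlap.
        return 0.0
    return len(a.intersection(b)) / len(union)


frontend_skills = {"html", "css", "javascript", "react"}
backend_skills = {"python", "sql", "javascript", "node"}

print(f"{jaccard(frontend_skills, backend_skills):.3f}")  # 0.143
```

Guarding the empty case avoids a ZeroDivisionError when neither source has any data.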

The Ampersand Operator &

Python provides the & operator as a shorthand for intersection. The ampersand symbol is borrowed from the logical AND operation, which is fitting because intersection returns elements that are in A AND in B. Just like the and keyword requires both conditions to be true, the intersection requires an element to be in both sets.

a = {1, 2, 3, 4, 5, 6}
b = {4, 5, 6, 7, 8, 9}

# These produce identical results:
method_result = a.intersection(b)
operator_result = a & b

print("Method:", method_result)
print("Operator:", operator_result)
print("Equal?", method_result == operator_result)
>>>Output
Method: {4, 5, 6}
Operator: {4, 5, 6}
Equal? True

As with union, the .intersection() method form accepts any iterable while the & operator requires both sides to be sets. Choose the method when working with lists or other iterables, and the operator for concise code when both operands are already sets.
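A short sketch of that difference, using hypothetical data (`allowed` and `resume_keywords` are names invented for this example):

```python
allowed = {"python", "sql", "go"}
resume_keywords = ["python", "excel", "sql", "python"]  # a list, with a duplicate

# The method accepts the list directly.
matched = allowed.intersection(resume_keywords)
print(sorted(matched))  # ['python', 'sql']

# The operator needs a set on both sides, so convert first.
matched_op = allowed & set(resume_keywords)
print(matched == matched_op)  # True
```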

Overlap Across Sets

When intersecting multiple sets, only elements present in ALL sets are included in the result. This becomes increasingly restrictive as you add more sets: an element must pass through every filter to appear in the final intersection.
# Skills across three different teams
team_a = {"python", "sql", "aws", "docker", "linux"}
team_b = {"python", "java", "aws", "kubernetes", "linux"}
team_c = {"python", "aws", "terraform", "docker", "linux"}

# Skills that ALL three teams have
shared_skills = team_a & team_b & team_c
print("All teams know:", shared_skills)

# Pairwise overlaps: skills that each pair of teams shares
print("A & B:", team_a & team_b)
print("A & C:", team_a & team_c)
print("B & C:", team_b & team_c)
>>>Output
All teams know: {'python', 'aws', 'linux'}
A & B: {'python', 'aws', 'linux'}
A & C: {'python', 'aws', 'docker', 'linux'}
B & C: {'python', 'aws', 'linux'}
Only "python", "aws", and "linux" appear in all three sets. "docker" appears in teams A and C but not B, so it is excluded from the three-way intersection. Notice how the intersection shrinks or stays the same as you add more sets to intersect.

Finding Common Customers

Intersection is invaluable for identifying overlap between customer segments. This helps with targeting campaigns to engaged users, finding cross-sell opportunities, or analyzing customer behavior across different touchpoints.
newsletter_subscribers = {"alice", "bob", "charlie", "diana", "eve"}
recent_purchasers = {"bob", "diana", "frank", "grace"}
mobile_app_users = {"bob", "diana", "eve", "henry"}

# Customers who are engaged across all three channels
highly_engaged = newsletter_subscribers & recent_purchasers & mobile_app_users
print("Highly engaged:", highly_engaged)

# Customers who subscribe AND purchased (two conditions)
converted_subscribers = newsletter_subscribers & recent_purchasers
print("Converted subscribers:", converted_subscribers)
>>>Output
Highly engaged: {'bob', 'diana'}
Converted subscribers: {'bob', 'diana'}
Bob and Diana are the most engaged customers, appearing in all three segments. These highly engaged customers might receive special offers or be candidates for a loyalty program. The intersection makes this analysis trivial.

Intersection: Empty Sets

The empty set is the annihilator for intersection: intersecting any set with an empty set always returns an empty set. This makes sense logically: if one set has no elements, there can be no elements that are in both sets.
users = {"alice", "bob", "charlie"}
empty = set()

print("Users & empty:", users & empty)
print("Empty & users:", empty & users)

# Empty set in chain = empty result
a = {1, 2, 3}
b = set()
c = {2, 3, 4}
print("a & b & c:", a & b & c)
>>>Output
Users & empty: set()
Empty & users: set()
a & b & c: set()
This property means that if any set in a multi-set intersection is empty, the entire result is empty. Be aware of this when debugging: if you expect results but get an empty set, check if any of your input sets might be empty.
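One way to apply that debugging advice is to report empty inputs before intersecting. The helper below is a sketch under stated assumptions: the name `common_to_all` is hypothetical, and returning an empty set when no groups are given is a choice made for this example.

```python
def common_to_all(groups):
    """Intersect every set in groups, reporting inputs that force an empty result."""
    if not groups:
        return set()  # assumption: no groups means nothing in common
    empty_positions = [i for i, group in enumerate(groups) if not group]
    if empty_positions:
        print("Empty input sets at positions:", empty_positions)
    # set.intersection called on the class takes the first set as its receiver.
    return set.intersection(*groups)


print(common_to_all([{1, 2, 3}, set(), {2, 3, 4}]))  # reports position 1, then set()
print(common_to_all([{1, 2, 3}, {2, 3, 4}]))         # {2, 3}
```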

Difference: Elements Unique to One Set

Isolate unique elements per set

The difference of two sets returns elements that are in the first set but not in the second. This operation answers the question "what is in A that is not in B?" Unlike union and intersection, difference is not symmetric: A - B gives different results than B - A. The order matters because you are asking a directional question.
Think of difference as starting with all elements of the first set, then removing any element that also appears in the second set. What remains are elements unique to the first set. This is extremely useful for finding what is new, what is missing, what was added, or what was removed.
  • A - B (subtract B): elements in A but not in B
  • A ⊆ B (empty result): the difference is empty if A is a subset of B
  • Disjoint (full result): returns all of A if there is no overlap
all_employees = {"alice", "bob", "charlie", "diana", "eve", "frank"}
on_vacation = {"bob", "diana"}
remote_today = {"charlie", "eve"}

# Who is in the office today?
in_office = all_employees - on_vacation - remote_today
print("In office:", in_office)

# Alternative: use .difference() method
in_office2 = all_employees.difference(on_vacation, remote_today)
print("Same result:", in_office2)
>>>Output
In office: {'alice', 'frank'}
Same result: {'alice', 'frank'}
Starting with all employees, we first remove those on vacation, then remove those working remotely. The result shows who is physically in the office. This kind of filtering is natural with set difference and would be more complex with lists.

Order: A - B vs B - A

Unlike union and intersection, which are commutative (A op B equals B op A), difference is NOT commutative. Swapping the operands generally changes the result. This asymmetry is intentional: you are asking "what is in the first set that is not in the second", which is inherently directional.
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}

# These give completely different results
a_minus_b = a - b
b_minus_a = b - a

print("a - b:", a_minus_b)
print("b - a:", b_minus_a)
print("Equal?", a_minus_b == b_minus_a)
>>>Output
a - b: {1, 2, 3}
b - a: {6, 7, 8}
Equal? False
a - b gives elements in a but not in b: 1, 2, and 3. These are what make set a unique. b - a gives elements in b but not in a: 6, 7, and 8. These are what make set b unique. The shared elements (4 and 5) appear in neither result.
A - B
  • Elements unique to A
  • What A has that B lacks
  • What to add to B to include A
  • Order: first minus second
B - A
  • Elements unique to B
  • What B has that A lacks
  • What to add to A to include B
  • Order: first minus second

Chaining Set Differences

You can chain multiple difference operations to remove elements from several sets. Each difference operation removes another layer of elements. This is useful when you have multiple exclusion criteria.
all_tasks = {"design", "code", "test", "deploy", "document", "review"}
completed = {"design", "code"}
blocked = {"deploy"}
assigned_to_others = {"review"}

# Tasks I can work on right now
my_actionable = all_tasks - completed - blocked - assigned_to_others
print("I can work on:", my_actionable)

# Same with method (accepts multiple arguments)
my_actionable2 = all_tasks.difference(completed, blocked, assigned_to_others)
print("Same result:", my_actionable2)
>>>Output
I can work on: {'test', 'document'}
Same result: {'test', 'document'}
Starting with all tasks, we progressively filter out completed tasks, blocked tasks, and tasks assigned to others. What remains are tasks that are neither done, blocked, nor owned by someone else: the tasks you can actually work on.

New vs Churned Users

Difference is perfect for comparing snapshots over time. By comparing user sets from different time periods, you can identify new acquisitions, retained users, and churned users. This is fundamental to cohort analysis and understanding user lifecycle.
users_last_month = {"alice", "bob", "charlie", "diana"}
users_this_month = {"bob", "charlie", "eve", "frank", "grace"}

# New users: this month but not last month
new_users = users_this_month - users_last_month
print("New users:", new_users)

# Churned users: last month but not this month
churned_users = users_last_month - users_this_month
print("Churned users:", churned_users)

# Retained users: both months (this is intersection)
retained = users_this_month & users_last_month
print("Retained users:", retained)

# Metrics
print(f"Retention rate: {len(retained)/len(users_last_month)*100:.0f}%")
print(f"Churn rate: {len(churned_users)/len(users_last_month)*100:.0f}%")
>>>Output
New users: {'eve', 'frank', 'grace'}
Churned users: {'alice', 'diana'}
Retained users: {'bob', 'charlie'}
Retention rate: 50%
Churn rate: 50%
This pattern reveals the complete user lifecycle: Eve, Frank, and Grace are new acquisitions. Alice and Diana churned. Bob and Charlie were retained. With just three set operations, you have comprehensive user lifecycle metrics.

Symmetric Difference

The symmetric difference contains elements that are in either set but NOT in both. Think of it as the opposite of intersection: instead of finding what sets share, you find what makes each set unique. If an element appears in both sets, it is excluded from the symmetric difference.
Mathematically, symmetric difference is equivalent to two other expressions: it equals (A - B) union (B - A), which is the elements unique to A combined with elements unique to B. It also equals (A union B) - (A intersection B), which is everything in either set minus what they share. All three formulations give the same result.
  • XOR logic: A ^ B contains elements in A or B, but not in both sets
  • Equivalent form 1: same as (A - B) union (B - A), the unique elements from each side
  • Equivalent form 2: same as (A | B) - (A & B), everything minus the overlap
  • Commutative: unlike difference, A ^ B always equals B ^ A
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}

sym_diff = a.symmetric_difference(b)
print("Symmetric difference:", sym_diff)

# Using the ^ operator
sym_diff2 = a ^ b
print("Using ^ operator:", sym_diff2)

alt1 = (a - b) | (b - a)
alt2 = (a | b) - (a & b)
print("(a-b)|(b-a):", alt1)
print("(a|b)-(a&b):", alt2)
>>>Output
Symmetric difference: {1, 2, 3, 6, 7, 8}
Using ^ operator: {1, 2, 3, 6, 7, 8}
(a-b)|(b-a): {1, 2, 3, 6, 7, 8}
(a|b)-(a&b): {1, 2, 3, 6, 7, 8}
4 and 5 are excluded because they appear in both sets (they are the intersection). 1, 2, and 3 are unique to set a. 6, 7, and 8 are unique to set b. The symmetric difference contains all six of these unique elements. All four formulations produce the same result.

The Caret Operator ^

Python uses ^ (caret) for symmetric difference. This operator is borrowed from the bitwise XOR (exclusive or) operation. In boolean logic, XOR returns true when exactly one of two inputs is true, but not when both are true. This perfectly matches symmetric difference: an element is included when it is in exactly one set, but not when it is in both.

Intersection (AND)
  • Elements in BOTH sets
  • Operator: &
  • Like logical AND
  • Finds shared elements
Symmetric Diff (XOR)
  • Elements in EITHER but not BOTH
  • Operator: ^
  • Like logical XOR
  • Finds unique elements

The relationship between intersection and symmetric difference is complementary. Together they partition the union: every element in (A | B) is in either (A & B) or (A ^ B), but never both. If you know intersection and union, you can compute symmetric difference, and vice versa.
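The partition claim can be verified on the same sets used earlier in this section. The names `shared` and `exclusive` are introduced here only for the check.

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}

shared = a & b      # {4, 5}
exclusive = a ^ b   # {1, 2, 3, 6, 7, 8}

# Together the two parts cover the whole union...
print((shared | exclusive) == (a | b))  # True
# ...and they never overlap.
print((shared & exclusive) == set())    # True
# So either part can be recovered from the union and the other part.
print(((a | b) - shared) == exclusive)  # True
```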

Detecting Changes Example

Symmetric difference excels at detecting what changed between two states. Since it excludes elements that stayed the same (elements in both sets), it highlights only the additions and removals. This is invaluable for configuration management, version comparison, and change detection.
# Configuration at two different times
config_v1 = {"debug_mode", "cache_enabled", "log_level_info", "feature_a"}
config_v2 = {"cache_enabled", "log_level_debug", "feature_a", "feature_b"}

# What settings changed?
changes = config_v1 ^ config_v2
print("Changed settings:", changes)

# Breaking down the changes
removed = config_v1 - config_v2
added = config_v2 - config_v1
print("Removed:", removed)
print("Added:", added)
>>>Output
Changed settings: {'debug_mode', 'log_level_info', 'log_level_debug', 'feature_b'}
Removed: {'debug_mode', 'log_level_info'}
Added: {'log_level_debug', 'feature_b'}
The symmetric difference immediately shows what changed. Settings that remained the same (cache_enabled, feature_a) are excluded. If you need to know specifically what was added versus removed, use regular difference in both directions.

Symmetric Diff: Commutative

Unlike regular difference, symmetric difference is commutative: A ^ B always equals B ^ A. This makes sense because we are finding elements unique to either side, which is the same regardless of which set we consider "first".

a = {1, 2, 3}
b = {3, 4, 5}

print("a ^ b:", a ^ b)
print("b ^ a:", b ^ a)
print("Equal?", (a ^ b) == (b ^ a))
>>>Output
a ^ b: {1, 2, 4, 5}
b ^ a: {1, 2, 4, 5}
Equal? True
Try each set operator below to see how the same two sets produce completely different results depending on the operation.
Fill in the Blank

> Two sets a = {1, 2, 3, 4} and b = {3, 4, 5, 6} overlap on some elements. Pick a set operator to see how union, intersection, difference, and symmetric difference each produce a different result.

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
print(a ____ b)

Operation Summary

Understanding when to use each operation is essential for effective data processing. Here is a comprehensive reference guide that summarizes all four operations with their methods, operators, and typical use cases.
  • Union: | or .union() -- combine everything from all sets into one collection
  • Intersection: & or .intersection() -- find elements that all sets share in common
  • Difference: - or .difference() -- find elements in the first set but not in the second
  • Symmetric difference: ^ or .symmetric_difference() -- find elements that are unique to each set, not shared
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

print("Union (a | b):", a | b)
print("Intersection (a & b):", a & b)
print("Difference (a - b):", a - b)
print("Difference (b - a):", b - a)
print("Symmetric Diff (a ^ b):", a ^ b)
>>>Output
Union (a | b): {1, 2, 3, 4, 5, 6}
Intersection (a & b): {3, 4}
Difference (a - b): {1, 2}
Difference (b - a): {5, 6}
Symmetric Diff (a ^ b): {1, 2, 5, 6}

Methods vs Operators

Each operation has both a method form and an operator form. The key difference is that methods accept any iterable (lists, tuples, generators), while operators require set operands on both sides. Choose based on your data types and readability preferences.
current_set = {1, 2, 3}
new_items = [3, 4, 5]

# Method works directly with lists
result = current_set.union(new_items)
print("Method with list:", result)

# Operator requires conversion to set
result2 = current_set | set(new_items)
print("Operator with set:", result2)

a = {1, 2}
result3 = a.union([3, 4], (5, 6), {7, 8})
print("Multiple iterables:", result3)
>>>Output
Method with list: {1, 2, 3, 4, 5}
Operator with set: {1, 2, 3, 4, 5}
Multiple iterables: {1, 2, 3, 4, 5, 6, 7, 8}
Use operators for concise, readable code when both operands are already sets. Use methods when working with lists, tuples, or other iterables, or when you need to combine more than two collections in a single call.
Choosing Method vs Operator
  • Use the operator (|, &, -, ^) when both sides are already sets
  • Use the method (.union(), .intersection()) when one side is a list or tuple
  • Methods accept multiple arguments: a.union(b, c, d) works in one call
  • Operators chain naturally: a | b | c reads like mathematical notation
  • Methods are more explicit; operators are more concise
The code below tries to find new users but has the operands reversed. Fix the direction of the difference operation.
Debug Challenge

> This code computes last_month - this_month, which finds churned users instead of new users. The set difference operands are reversed.

Logic error: shows churned users {'alice'} instead of new users {'charlie', 'diana'}

Set difference is directional: A - B and B - A produce different results. Always read it as "what is in the first set that is NOT in the second set." Getting the operand order right is the most common source of set difference bugs.
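A minimal sketch of the reversed-operand bug and its fix follows. The sets are hypothetical data chosen to be consistent with the results described in the challenge; the variable names are illustrative.

```python
# Hypothetical data, chosen to match the results described in the challenge.
last_month = {"alice", "bob"}
this_month = {"bob", "charlie", "diana"}

# Reversed form: users present last month but gone this month.
churned_users = last_month - this_month
# Intended form: users present this month who were not there last month.
new_users = this_month - last_month

print(sorted(churned_users))  # ['alice']
print(sorted(new_users))      # ['charlie', 'diana']
```

Naming each result after the question it answers makes a reversed operand easier to spot in review.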

The |, &, -, and ^ operators map directly to union, intersection, difference, and symmetric difference. Using these single-character operators makes set algebra in code read closely to the mathematical notation you would write on paper.

TIP
When you use the - operator to find items in one set but not another, label your variables clearly so the direction is obvious. Names like new_users = this_month - last_month make the intent readable without needing a comment.

Subset and Superset

Validate containment and modify in place

Beyond combining sets, you often need to check if one set is contained within another. These containment relationships are called subset and superset. A subset is a set whose every element also exists in another set. A superset is the opposite: it contains all elements of another set, plus possibly more. Two equal sets are both subsets and supersets of each other.
Subset and superset checks are fundamental for validation, permission checking, and hierarchical data. For example, checking if a user has required permissions (user permissions should be a superset of required permissions), or validating that input is within allowed values (input should be a subset of allowed values).
  • A ⊆ B (subset): every element of A is in B
  • B ⊇ A (superset): B contains every element of A, and possibly more
  • A ⊂ B (proper subset): A is contained in B but not equal to B
  • B ⊃ A (proper superset): B contains all of A plus at least one extra element
basic_permissions = {"read"}
editor_permissions = {"read", "write"}
admin_permissions = {"read", "write", "delete", "admin"}

# Check subset relationships
print("basic ⊆ editor?", basic_permissions.issubset(editor_permissions))
print("editor ⊆ admin?", editor_permissions.issubset(admin_permissions))
print("admin ⊆ basic?", admin_permissions.issubset(basic_permissions))

# Check superset relationships
print("admin ⊇ editor?", admin_permissions.issuperset(editor_permissions))
print("editor ⊇ basic?", editor_permissions.issuperset(basic_permissions))
>>>Output
basic ⊆ editor? True
editor ⊆ admin? True
admin ⊆ basic? False
admin ⊇ editor? True
editor ⊇ basic? True
The permission sets form a hierarchy: basic is a subset of editor, which is a subset of admin. This reflects the real permission structure where higher roles include all lower permissions plus additional ones.
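The permission check mentioned earlier (user permissions should be a superset of the required permissions) might look like the sketch below. The function name `can_perform` and the sample requirements are hypothetical.

```python
def can_perform(user_permissions, required_permissions):
    """Return True when the user holds every required permission."""
    return user_permissions.issuperset(required_permissions)


editor_permissions = {"read", "write"}

print(can_perform(editor_permissions, {"read", "write"}))   # True
print(can_perform(editor_permissions, ["read", "delete"]))  # False

# Difference shows which required permissions are missing.
missing = {"read", "delete"} - editor_permissions
print(missing)  # {'delete'}
```

Because .issuperset() is a method, the requirements can be passed as a list or tuple without converting them first.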

The Comparison Operators

Python provides comparison operators for subset and superset checks: <= for subset (less than or equal to) and >= for superset (greater than or equal to). The intuition is that a "smaller" set is one contained within a "larger" set.

a = {1, 2}
b = {1, 2, 3, 4}
c = {1, 2}

# Subset checks
print("a <= b (subset):", a <= b)
print("a < b (proper subset):", a < b)

# Note: a and c are equal, so...
print("a <= c:", a <= c)
print("a < c:", a < c)

# Superset checks
print("b >= a:", b >= a)
print("b > a:", b > a)
>>>Output
a <= b (subset): True
a < b (proper subset): True
a <= c: True
a < c: False
b >= a: True
b > a: True

A proper subset or superset means the sets are not equal. Set a is a subset of c (since they are equal), but not a proper subset. The strict operators (< and >) exclude the case where sets are equal, while the non-strict operators (<= and >=) include equality.

Validation with Subsets

Subset checking is perfect for validating that user input falls within allowed values, or that a required set of items is present in a larger collection.
allowed_columns = {"id", "name", "email", "age", "city", "country"}

# User requests certain columns
user_request = {"name", "email", "city"}

# Validate the request
if user_request <= allowed_columns:
    print("Valid request:", user_request)
else:
    invalid = user_request - allowed_columns
    print("Invalid columns:", invalid)

# Another request with invalid columns
bad_request = {"name", "password", "ssn"}
if bad_request <= allowed_columns:
    print("Valid request")
else:
    invalid = bad_request - allowed_columns
    print("Invalid columns:", invalid)
>>>Output
Valid request: {'name', 'email', 'city'}
Invalid columns: {'password', 'ssn'}
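Since the same check runs for every request, it can be wrapped in a function. The sketch below is one possible shape; the name `validate_columns` and the decision to raise ValueError are assumptions made for this example.

```python
allowed_columns = {"id", "name", "email", "age", "city", "country"}


def validate_columns(requested, allowed):
    """Return the requested columns as a set, or raise ValueError naming invalid ones."""
    requested = set(requested)      # accept lists and tuples as well as sets
    invalid = requested - allowed   # anything not in the allowed set
    if invalid:
        raise ValueError(f"Invalid columns: {sorted(invalid)}")
    return requested


print(sorted(validate_columns(["name", "email", "city"], allowed_columns)))
# ['city', 'email', 'name']

try:
    validate_columns({"name", "password", "ssn"}, allowed_columns)
except ValueError as exc:
    print(exc)  # Invalid columns: ['password', 'ssn']
```

Computing the difference first means a single operation both decides validity and produces the error detail.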

Checking for Disjoint Sets

Two sets are disjoint if they have no elements in common. The .isdisjoint() method returns True if the sets share no elements. This is equivalent to checking if the intersection is empty, but .isdisjoint() is more efficient because it can stop early as soon as it finds any common element.

odds = {1, 3, 5, 7, 9}
evens = {2, 4, 6, 8, 10}
primes = {2, 3, 5, 7}
composites = {4, 6, 8, 9, 10}

print("odds and evens disjoint?", odds.isdisjoint(evens))
print("odds and primes disjoint?", odds.isdisjoint(primes))
print("primes and composites disjoint?", primes.isdisjoint(composites))

# Equivalent to checking empty intersection
print("Same as intersection check:", len(odds & evens) == 0)
>>>Output
odds and evens disjoint? True
odds and primes disjoint? False
primes and composites disjoint? True
Same as intersection check: True

Odd and even numbers are disjoint by definition. Odd numbers and primes share 3, 5, and 7. Primes and composites are disjoint because no number can be both prime and composite. The isdisjoint() method efficiently tells you whether any overlap exists.
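
A common practical use of .isdisjoint() is a conflict check. The sketch below uses made-up time slots (all names are hypothetical) and relies on the fact that .isdisjoint(), like the other set methods, accepts any iterable, so the requests can stay as lists.

```python
booked_slots = {"09:00", "10:00", "14:00"}
requested_slots = ["11:00", "15:00"]       # a list works with isdisjoint()
conflicting_request = ["10:00", "16:00"]

# No shared element means the request can be accepted
no_conflict = booked_slots.isdisjoint(requested_slots)
has_conflict = not booked_slots.isdisjoint(conflicting_request)

print(no_conflict)     # True
print(has_conflict)    # True

# When you also need to know which slots clash, compute the intersection
clashes = booked_slots & set(conflicting_request)
print(sorted(clashes))    # ['10:00']
```

Use .isdisjoint() when a yes/no answer is enough, and fall back to the intersection only when the overlapping elements themselves matter.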

Here is a quick reference for the relationship-checking methods and their operator equivalents.
Set Relationship Methods
  • .issubset() or <= -- every element of A is also in B
  • .issuperset() or >= -- A contains every element of B
  • < proper subset -- A is inside B and they are not equal
  • > proper superset -- A contains B and has extra elements
  • .isdisjoint() -- A and B share zero common elements
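
As a quick check of the reference above, the following sketch (with small example sets chosen for illustration) confirms that each method agrees with its operator form, and that the strict operators exclude equality.

```python
a = {1, 2}
b = {1, 2, 3}

# Method and operator forms give the same answer
subset_results = (a.issubset(b), a <= b)
superset_results = (b.issuperset(a), b >= a)
print(subset_results, superset_results)    # (True, True) (True, True)

# Strict operators have no method form and are False for equal sets
proper_results = (a < b, a < a, b > a)
print(proper_results)                      # (True, False, True)

# The methods also accept non-set iterables such as a list
from_list = a.issubset([1, 2, 3])
print(from_list)                           # True
```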

In-Place Operations

All set operations covered so far create new sets, leaving the originals unchanged. This is often what you want, but sometimes you need to modify a set in place for efficiency or because you want to accumulate changes. Python provides in-place versions of all four fundamental operations using update methods or augmented assignment operators.
In-place operations are more memory efficient because they do not create a new set object. For very large sets, this can be significant. However, they modify the original data, which means you lose the original state. Choose in-place operations when you no longer need the original data and want to save memory.
  • .update() or |= -- add all elements from another set, performing union in place
  • .intersection_update() or &= -- keep only elements common to both sets, discard the rest
  • .difference_update() or -= -- remove from this set any elements found in the other set
  • .symmetric_difference_update() or ^= -- keep only elements found in exactly one of the two sets, drop shared ones
inventory = {"apple", "banana", "cherry"}
print("Original:", inventory)

# Add new items in place (union)
inventory |= {"date", "elderberry"}
print("After |=:", inventory)

# Keep items in stock (intersection)
in_stock = {"banana", "date", "fig"}
inventory &= in_stock
print("After &= in_stock:", inventory)

# Remove recalled items (difference)
inventory -= {"banana"}
print("After -= recalled:", inventory)
>>>Output
Original: {'apple', 'banana', 'cherry'}
After |=: {'apple', 'banana', 'cherry', 'date', 'elderberry'}
After &= in_stock: {'banana', 'date'}
After -= recalled: {'date'}
Each operation modifies the inventory set directly. After all operations, only "date" remains. The original set is progressively transformed rather than replaced. This is efficient but means the original data is lost.
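
One consequence of modifying a set in place is that every other name bound to the same object sees the change. The sketch below (variable names are illustrative) contrasts |= with the regular | operator using an identity check.

```python
inventory = {"apple", "banana"}
alias = inventory                    # a second name for the same set object

inventory |= {"cherry"}              # in place: the object itself changes
same_object = alias is inventory
alias_sees_cherry = "cherry" in alias
print(same_object, alias_sees_cherry)    # True True

inventory = inventory | {"date"}     # regular union: the name is rebound to a new set
still_same = alias is inventory
alias_sees_date = "date" in alias
print(still_same, alias_sees_date)       # False False
```

This matters when a set is passed into a function: an in-place operation inside the function changes the caller's set, while a regular operation does not.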

Accumulating Data: Update

The .update() method (or |= operator) is particularly useful for accumulating data from multiple sources into a single set. This is common when processing files, API responses, or database queries where data arrives in batches.

all_users = set()

source1_users = ["alice", "bob", "charlie"]
source2_users = ("bob", "diana", "eve")
source3_users = {"eve", "frank"}

# Accumulate all users into one set
all_users.update(source1_users)
all_users.update(source2_users)
all_users.update(source3_users)

print("All unique users:", all_users)
print("Total unique:", len(all_users))
>>>Output
All unique users: {'alice', 'bob', 'charlie', 'diana', 'eve', 'frank'}
Total unique: 6
Starting with an empty set, we add users from three different sources. The update method accepts any iterable (list, tuple, or set), and automatically deduplicates. Bob appears in two sources but only once in the result.
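
Two details of .update() are worth seeing in code: it accepts several iterables in one call, and a string counts as an iterable of characters. The sketch below uses made-up user names and batches to show both.

```python
all_users = set()

# Several iterables can be passed in a single call
all_users.update(["alice", "bob"], ("bob", "diana"), {"eve"})
print(sorted(all_users))    # ['alice', 'bob', 'diana', 'eve']

# Data arriving in batches can be folded in with a loop
batches = [["alice", "frank"], ["grace"], []]
for batch in batches:
    all_users.update(batch)
print(len(all_users))       # 6

# Pitfall: a bare string is iterated character by character
letters = set()
letters.update("bob")
print(sorted(letters))      # ['b', 'o']
```

To add a single string as one element, use .add("bob") or wrap it in a list before calling .update().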

Intersection Update Filter

Intersection update (&=) keeps only elements that are in both sets. This is useful for progressively narrowing down a set based on multiple criteria.

# Start with all products
products = {"laptop", "phone", "tablet", "watch", "headphones", "camera"}

# Filter to electronics under 500 dollars
under_500 = {"phone", "watch", "headphones"}
products &= under_500
print("Under 500:", products)

# Filter to items in stock
in_stock = {"phone", "headphones", "cable"}
products &= in_stock
print("Under 500 AND in stock:", products)
>>>Output
Under 500: {'phone', 'watch', 'headphones'}
Under 500 AND in stock: {'phone', 'headphones'}
TIP
In-place operations modify the original set. If you need to preserve the original, either make a copy first with .copy(), or use the regular (non-in-place) operations which return new sets.
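
Both options from the tip can be sketched as follows, using a shortened product set for illustration.

```python
products = {"laptop", "phone", "tablet", "watch"}
under_500 = {"phone", "watch"}

# Option 1: copy first, then filter the copy in place
working = products.copy()
working &= under_500
print(sorted(working))     # ['phone', 'watch']
print(len(products))       # 4, the original is untouched

# Option 2: use the regular operator, which returns a new set
filtered = products & under_500
print(filtered == working)    # True
```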

In-Place vs Regular Ops

Understanding the difference between in-place and regular operations is crucial. Regular operations leave the originals unchanged and return a new set. In-place methods such as .update() modify the original set and return None.

original = {1, 2, 3}
addition = {3, 4, 5}

# Returns new set, original unchanged
new_set = original.union(addition)
print("New set:", new_set)
print("Original after union():", original)

# In-place: modifies original
result = original.update(addition)
print("update() returns:", result)
print("Original after update():", original)
>>>Output
New set: {1, 2, 3, 4, 5}
Original after union(): {1, 2, 3}
update() returns: None
Original after update(): {1, 2, 3, 4, 5}

After the union() call, original is still {1, 2, 3}. After the update() call, original has been modified to {1, 2, 3, 4, 5}. Note that update() returns None, not the modified set, so an assignment like new = original.update(addition) stores None in new rather than the updated set.

The code below has a bug caused by using the wrong operator. Can you spot and fix the error?
Debug Challenge

> This code uses the ^= augmented assignment operator inside an expression, which is a syntax error. The regular ^ operator should be used instead.

SyntaxError: invalid syntax with ^= in expression

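In-place set operators (|=, &=, -=, ^=) modify the set on their left-hand side. They cannot be used in the middle of a larger expression or on the right-hand side of an assignment, because augmented assignment is a statement, not an expression, and produces no value. The equivalent in-place methods, such as .symmetric_difference_update(), can appear in an expression but return None rather than a set.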

Regular set operators (|, &, -, ^) always return a new set and leave both operands unchanged. Use them whenever you need the result as a value or want to preserve the originals for further comparisons.
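
The challenge code is not reproduced here, so the sketch below only illustrates the same class of mistake with made-up variable names. It uses the built-in compile() to show that ^= inside an expression is rejected before the code ever runs, while the regular ^ operator works as a value.

```python
morning = {"alice", "bob", "carol"}
evening = {"bob", "carol", "dave"}

# Regular operator: an expression, so it can be nested inside a call
changed_count = len(morning ^ evening)
print(changed_count)    # 2

# Augmented assignment inside an expression does not compile
try:
    compile("len(morning ^= evening)", "<example>", "exec")
    compiled = True
except SyntaxError:
    compiled = False
print(compiled)         # False

# As a standalone statement, ^= is valid and modifies morning in place
morning ^= evening
print(sorted(morning))  # ['alice', 'dave']
```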

Set operations offer elegant solutions to common data comparison and deduplication problems. Put these techniques to the test with hands-on challenges in the Python Builder.
PUTTING IT ALL TOGETHER

> You are a data engineer at Spotify comparing listener sets across three regional platforms to find shared audiences for cross-promotion, identify platform-exclusive subscribers, and efficiently update running audience sets in place as new subscriber data streams in.

union() combines all three platform listener sets into one deduplicated master audience for broad cross-promotion targeting.
intersection() finds the subset of listeners present on all three platforms simultaneously, the highest-value cross-promotion targets.
difference() isolates subscribers unique to one platform, revealing the exclusive audience that has never been reached on the others.
In-place update with |= adds new arriving subscriber IDs directly into the running platform set without creating a new object.
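
A minimal sketch of the scenario, assuming made-up listener IDs and region names (none of these values come from real data):

```python
# Hypothetical listener IDs for three regional platforms
north_america = {"u1", "u2", "u3", "u4"}
europe = {"u2", "u3", "u5"}
asia = {"u3", "u4", "u6"}

# The methods accept several sets at once
master_audience = north_america.union(europe, asia)
on_all_three = north_america.intersection(europe, asia)
europe_only = europe.difference(north_america, asia)

print(sorted(master_audience))   # ['u1', 'u2', 'u3', 'u4', 'u5', 'u6']
print(sorted(on_all_three))      # ['u3']
print(sorted(europe_only))       # ['u5']

# New subscriber IDs arrive as a list; |= needs a set on the right-hand side
new_europe_batch = ["u7", "u2"]
europe |= set(new_europe_batch)
print(sorted(europe))            # ['u2', 'u3', 'u5', 'u7']
```

Note that master_audience was built before the new batch arrived, so it does not include u7; a running master set would need its own |= update.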
KEY TAKEAWAYS
Union (| or .union()): Combines all elements from all sets
Intersection (& or .intersection()): Elements present in ALL sets
Difference (- or .difference()): Elements in first set but not in second
Symmetric Difference (^ or .symmetric_difference()): Elements in either set but not both
Operators require sets on both sides; methods accept any iterable
.issubset() / <=: Check if all elements are contained in another set
.issuperset() / >=: Check if set contains all elements of another
.isdisjoint(): Check if sets have no elements in common
Use |=, &=, -=, ^= for in-place modifications
Union and intersection are commutative; difference is not
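
Two of these takeaways are easy to verify directly. The sketch below, using small example sets, checks which operations are commutative and shows that the | operator rejects a list where .union() accepts one.

```python
a = {1, 2, 3}
b = {3, 4}

union_commutes = (a | b) == (b | a)
intersection_commutes = (a & b) == (b & a)
symmetric_commutes = (a ^ b) == (b ^ a)
difference_commutes = (a - b) == (b - a)    # {1, 2} versus {4}
print(union_commutes, intersection_commutes, symmetric_commutes, difference_commutes)
# True True True False

# Methods accept any iterable
from_method = a.union([3, 4])
print(sorted(from_method))    # [1, 2, 3, 4]

# Operators require a set on both sides
try:
    a | [3, 4]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
print(operator_accepts_list)  # False
```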

Combining and comparing collections

Category: Python
Difficulty: intermediate
Duration: 44 minutes
Challenges: 3 hands-on challenges

Topics covered: Union: Combining Sets, Intersection: Finding Common Elements, Difference: Elements Unique to One Set, Subset and Superset

Lesson Sections

  1. Union: Combining Sets
  2. Intersection: Finding Common Elements
  3. Difference: Elements Unique to One Set
  4. Subset and Superset