Sets: Beginner

Slack computes shared channel membership between any two users in milliseconds by taking the intersection of their respective channel sets. The intersection takes time proportional to the smaller of the two sets, because each of its elements is checked against the other set with a single membership test. With hundreds of millions of messages processed daily across millions of workspaces, doing this with lists would be far too slow: every one of those checks would require a linear scan. Sets keep the operation fast because membership testing is O(1) on average, not O(n). The add, remove, and membership operations you will learn in this lesson are the same building blocks Slack and every major collaboration platform rely on to serve membership data at scale.

What is a Set?


Distinguish sets from lists

A set is an unordered collection of unique elements. These two properties define what makes a set different from other collection types like lists and tuples. Understanding both properties is essential for using sets correctly.
The word "unordered" means that sets do not maintain any particular sequence for their elements. Unlike lists, where the first item you add stays first and the last item stays last, sets make no guarantees about element order. When you iterate over a set or print it, the elements might appear in any order. This order might even change between different runs of your program. You cannot rely on sets to preserve the order in which you added elements.
The word "unique" means that each element can appear at most once in a set. If you try to add a duplicate element, the set simply ignores it without raising an error. The set remains unchanged. This automatic duplicate handling is one of the most useful features of sets. You never need to check whether an element exists before adding it. You can add freely, and the set ensures uniqueness.
  • Unordered: elements have no defined position or index in a set
  • Unique: each element can appear at most once in the collection
  • Mutable: you can add and remove elements after creation
  • Fast lookup: checking membership is extremely efficient at O(1) time
  • Dynamic size: sets grow and shrink as you add and remove elements
Think of a set like a bag of marbles where each marble must be a different color. If you already have a red marble in the bag and try to add another red marble, the bag rejects it because red is already represented. You cannot ask for "the third marble" or "the marble at position five" because marbles in a bag have no order. But you can very quickly check "Is there a blue marble in the bag?" by reaching in and finding it almost instantly.
This analogy also illustrates why sets are useful. If you wanted to know how many different colors of marbles you have, a set gives you the answer directly: its length equals the number of unique colors. With a list, you would need to examine each marble and keep track of which colors you have already seen.

Sets vs Lists Fundamentals

Lists and sets are both collection types, but they serve fundamentally different purposes. Understanding when to use each is crucial for writing efficient, correct code. The wrong choice can lead to subtle bugs or severe performance problems.

Lists preserve order and allow duplicates. When you add items to a list, they stay in the order you added them. You can access items by their position using indexing: list[0] gives you the first item, list[1] gives you the second, and so on. Lists allow the same value to appear multiple times. A list like [1, 1, 2, 2, 3] is perfectly valid and maintains all five elements.

Sets ignore order and enforce uniqueness. When you add items to a set, they do not have positions. You cannot access set elements by index. Sets reject duplicate values automatically. A set created from {1, 1, 2, 2, 3} would contain only {1, 2, 3} because duplicates are eliminated.
List
  • Ordered: [1, 2, 3] stays in that order
  • Allows duplicates: [1, 1, 2, 2] is valid
  • Access by index: list[0] returns first item
  • Slower membership test: O(n) time complexity
  • Preserves insertion order
Set
  • Unordered: order is not guaranteed
  • No duplicates: {1, 2} only, no repeats
  • No indexing: set[0] raises an error
  • Fast membership test: O(1) time complexity
  • No concept of order
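
The differences in the comparison above can be checked directly. This is a minimal sketch; the variable names are illustrative and not part of the lesson's own examples.

```python
# Illustrative data: the same values stored as a list and as a set.
scores_list = [1, 1, 2, 2, 3]
scores_set = set(scores_list)

print(len(scores_list))   # 5: the list keeps every duplicate
print(len(scores_set))    # 3: the set keeps one copy of each value
print(scores_list[0])     # 1: lists support access by position

# Sets have no positions, so indexing fails with a TypeError.
try:
    scores_set[0]
    indexing_worked = True
except TypeError as error:
    indexing_worked = False
    print("TypeError:", error)
```

If you need a positional view of a set's contents, convert it with list() or sorted() first.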

The notation O(n) means that checking if an item is in a list takes time proportional to the list size. If the list has n items, you might need to check all n items in the worst case. A list with a million items requires up to a million comparisons. The notation O(1) means that checking if an item is in a set takes constant time regardless of set size. Whether the set has ten items or ten million items, the lookup takes approximately the same amount of time.

This performance difference matters enormously in practice. If you need to check whether items exist in a collection thousands or millions of times, using a set instead of a list can reduce your runtime from hours to seconds. Many experienced programmers have debugged slow code only to discover that converting a list to a set solved the performance problem.
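
One way to see the gap is to time the same membership test against a list and a set. The sketch below uses the standard timeit module; the collection size and repeat count are arbitrary choices, and the measured times will vary by machine and Python version.

```python
import timeit

# Illustrative size; larger collections widen the gap further.
size = 100_000
numbers_list = list(range(size))
numbers_set = set(numbers_list)
target = size - 1  # the last list item, so the list scan is as long as possible

# Time 200 membership tests against each collection.
list_seconds = timeit.timeit(lambda: target in numbers_list, number=200)
set_seconds = timeit.timeit(lambda: target in numbers_set, number=200)

print(f"list: {list_seconds:.6f} s for 200 lookups")
print(f"set:  {set_seconds:.6f} s for 200 lookups")
```

The list time grows as size grows, while the set time should stay roughly flat.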
Python Quiz

> A set automatically removes duplicates. Pick the built-in that counts unique elements, and the keyword that tests membership in constant time.

data = {3, 1, 4, 1, 5, 9, 2, 6, 5}
print(___(data))
print(5 ___ data)
Options: is, len, in, sum, max
Sets and lists are both collections, but they solve different problems. Use a list when you care about order or need to store duplicates. Use a set when you only care whether an item is present and want instant answers regardless of collection size.

The O(1) membership test is the defining advantage of sets. It comes from hashing: Python converts each element into a number that directly points to its storage location, so no scanning is needed. This makes sets the right tool for membership checks in any performance-sensitive code.
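
The built-in hash() function exposes the number Python derives from a value. The short sketch below shows that equal values hash equally and that a list, being mutable, cannot be hashed at all, which is why lists cannot be stored in a set.

```python
# hash() returns the integer Python derives from a value.
# Equal values always produce equal hashes within one run of a program.
same_hash = hash("blue") == hash("blue")
print(same_hash)   # True

# Mutable containers such as lists cannot be hashed.
try:
    hash([1, 2, 3])
    list_is_hashable = True
except TypeError as error:
    list_is_hashable = False
    print("TypeError:", error)

# A tuple with the same contents is immutable and hashes without error.
tuple_hash = hash((1, 2, 3))
print(isinstance(tuple_hash, int))   # True
```

The actual hash values for strings differ between runs of a program, so code should never depend on the specific numbers.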

TIP
When you find yourself writing if item in my_list inside a loop, consider converting my_list to a set first. The lookup cost drops from O(n) to O(1), turning potentially slow code into fast code with a one-word change.

Creating Sets


Build sets from any iterable

Python provides two main ways to create sets. You can use curly braces with elements inside, similar to how you write dictionary literals but without key-value pairs. Alternatively, you can use the set() constructor function, which can convert other iterables into sets. Each approach has specific use cases and limitations that you should understand.

Using Curly Braces

The most common and concise way to create a set with initial elements is using curly braces. Place your elements inside the braces, separated by commas. This syntax looks similar to dictionary syntax, but dictionaries have key-value pairs separated by colons, while sets contain only single values.
# Create a set of colors
colors = {"red", "green", "blue"}
print(colors)
print(type(colors))

# Create a set of numbers
numbers = {1, 2, 3, 4, 5}
print(numbers)

# Create a set with mixed types
mixed = {42, "hello", 3.14, True}
print(mixed)
>>>Output
{'red', 'green', 'blue'}
<class 'set'>
{1, 2, 3, 4, 5}
{True, 42, 3.14, 'hello'}
When you print a set, Python displays it with curly braces. Notice that the order of elements in the output may differ from the order you wrote them. In the mixed set example, the elements appear in a different order than we specified. This is completely normal behavior because sets are unordered. Do not write code that depends on any particular ordering of set elements.

The type() function confirms that these objects are sets. Python's set type is a built-in type, meaning it is always available without importing anything. Sets are as fundamental to Python as lists, dictionaries, and tuples.

The Empty Set Problem

There is one critical exception to the curly brace syntax that trips up many Python programmers. You cannot create an empty set with empty curly braces. When Python sees {}, it interprets this as an empty dictionary, not an empty set. This behavior exists for historical reasons: dictionaries were added to Python before sets, and {} was already established as the dictionary literal syntax.

# Creates a DICTIONARY, not a set!
not_a_set = {}
print("Type of {}:", type(not_a_set))

# This is how you create an empty SET
empty_set = set()
print("Type of set():", type(empty_set))

empty_set.add("first element")
print("After adding:", empty_set)
>>>Output
Type of {}: <class 'dict'>
Type of set(): <class 'set'>
After adding: {'first element'}

This quirk catches even experienced Python developers. If you write code that initializes a variable with {} and later tries to use set methods like add(), you will get an AttributeError because dictionaries do not have an add() method. The error message might be confusing because you thought you had a set.
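
A minimal sketch of that failure and its fix follows; the variable name tags is illustrative.

```python
tags = {}          # intended as an empty set, but this is a dict

try:
    tags.add("python")
    add_failed = False
except AttributeError as error:
    add_failed = True
    print("AttributeError:", error)

tags = set()       # the fix: an actual empty set
tags.add("python")
print(tags)        # {'python'}
```

Checking type(tags) early is a quick way to confirm which of the two you actually have.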

TIP
Always use set() to create an empty set. Using {} creates an empty dictionary. This is one of the most common Python gotchas and has caused countless debugging sessions.

Sets from Other Collections

The set() constructor is versatile. It can convert any iterable object into a set. An iterable is anything you can loop over: lists, tuples, strings, ranges, and even other sets. This conversion automatically removes any duplicate values, which is often exactly what you want.

# Convert a list with duplicates to a set
numbers_list = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
unique_numbers = set(numbers_list)
print("Original list:", numbers_list)
print("As a set:", unique_numbers)
print("List length:", len(numbers_list))
print("Set length:", len(unique_numbers))
>>>Output
Original list: [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
As a set: {1, 2, 3, 4}
List length: 10
Set length: 4
In this example, the list has ten elements but only four unique values. The set constructor processes each element from the list. When it encounters an element it has already seen, it simply skips it. The resulting set has length four because there are only four distinct values in the original list.
# Convert a string to a set of characters
word = "mississippi"
letters = set(word)
print("Word:", word)
print("Unique letters:", letters)
print("Total characters:", len(word))
print("Unique characters:", len(letters))
>>>Output
Word: mississippi
Unique letters: {'m', 'i', 's', 'p'}
Total characters: 11
Unique characters: 4

Strings are iterable in Python, meaning you can loop over their characters. When you pass a string to set(), Python treats it as a sequence of characters and creates a set containing each unique character. The string "mississippi" has eleven characters total, but only four unique letters: m, i, s, and p. The set contains exactly these four characters.

This pattern of converting a collection to a set for deduplication is extremely common in data processing:
# Real-world example: counting unique visitors
visitor_log = [101, 102, 101, 103, 102, 101, 104, 103, 101, 105]
unique_visitors = set(visitor_log)

print("Total page views:", len(visitor_log))
print("Unique visitors:", len(unique_visitors))
print("Visitor IDs:", unique_visitors)
>>>Output
Total page views: 10
Unique visitors: 5
Visitor IDs: {101, 102, 103, 104, 105}
# Converting tuples and ranges
tuple_data = (1, 2, 2, 3)
set_from_tuple = set(tuple_data)
print("From tuple:", set_from_tuple)

range_data = range(1, 10, 2)
set_from_range = set(range_data)
print("From range:", set_from_range)
>>>Output
From tuple: {1, 2, 3}
From range: {1, 3, 5, 7, 9}
Try choosing different constructors below to see how Python interprets each syntax for creating a set versus other collection types.
Fill in the Blank

> You have a list [3, 1, 2, 1] with a duplicate value. Pick a constructor to convert it and see how set, list, and tuple each handle duplicates and ordering differently.

result = ___([3, 1, 2, 1])
print(type(result))
print(result)

The constructor you choose determines everything about the resulting collection: whether it keeps duplicates, whether it maintains order, and whether it supports fast membership checks. set() is unique in that it both removes duplicates and provides O(1) lookups.

A common pattern is to convert a list to a set and back: list(set(my_list)). This deduplicates a list in one step, though the output order may differ from the input since sets do not guarantee ordering.
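
A short sketch of that round trip follows, with illustrative data. The sorted() call at the end is an addition for cases where sorted order is acceptable; it is not a way to recover the original input order.

```python
readings = [3, 1, 2, 1, 3, 3]   # illustrative data

# Deduplicate in one step; the order of this list is not guaranteed.
deduplicated = list(set(readings))
print(deduplicated)

# If a predictable order is needed and sorted order is acceptable, sort the set.
deduplicated_sorted = sorted(set(readings))
print(deduplicated_sorted)   # [1, 2, 3]
```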

TIP
Call set() with no arguments to create an empty set. Empty curly braces create an empty dictionary in Python, so always write my_set = set() when you need an empty set.

Automatic Duplicate Removal


Deduplicate data with add and update

The automatic duplicate removal behavior of sets is one of their most powerful and useful features. Sets eliminate duplicates both during creation and when adding new elements. This happens silently, without errors or warnings. Understanding this behavior allows you to write cleaner, more concise code.
# Duplicates are automatically removed during creation
votes = {"Alice", "Bob", "Alice", "Charlie", "Bob", "Alice"}
print("Votes cast:", votes)
print("Unique voters:", len(votes))

# Even though we wrote Alice 3 times and Bob 2 times
# The set contains each name exactly once
>>>Output
Votes cast: {'Alice', 'Bob', 'Charlie'}
Unique voters: 3
Even though we specified "Alice" three times and "Bob" twice in the set literal, the resulting set contains each name exactly once. Python processes the elements in order, adding each one to the set. When it encounters an element that already exists in the set, it simply skips it. This behavior is consistent and predictable.
This same deduplication happens when you add elements to an existing set. If you add an element that already exists, the set remains unchanged. No error is raised, and the size of the set does not increase. This allows you to add elements freely without first checking whether they exist.
# Duplicates are also ignored when using add()
colors = {"red", "blue"}
print("Initial set:", colors)
print("Initial size:", len(colors))

colors.add("green")
print("After adding green:", colors)
print("Size:", len(colors))

colors.add("red")
print("After adding red again:", colors)
print("Size:", len(colors))
>>>Output
Initial set: {'red', 'blue'}
Initial size: 2
After adding green: {'red', 'blue', 'green'}
Size: 3
After adding red again: {'red', 'blue', 'green'}
Size: 3

Counting Unique Values

One of the most common uses of sets is counting unique values in a dataset. Given any collection with potential duplicates, converting to a set and checking its length tells you how many distinct values exist. The conversion takes a single pass over the data, and the resulting set stores only one copy of each distinct value.
# Sample log of page views on a website
page_views = [
    "home", "about", "home", "contact", "home",
    "products", "about", "home", "pricing", "about"
]

# How many unique pages were viewed?
unique_pages = set(page_views)
print("All page views:", page_views)
print("Unique pages:", unique_pages)
print("Total views:", len(page_views))
print("Unique page count:", len(unique_pages))
>>>Output
All page views: ['home', 'about', 'home', 'contact', 'home', 'products', 'about', 'home', 'pricing', 'about']
Unique pages: {'home', 'about', 'contact', 'products', 'pricing'}
Total views: 10
Unique page count: 5
This pattern appears constantly in data analysis and processing. Given a column of data from a database or spreadsheet, you often need to know "How many distinct values are there?" Converting to a set and checking its length answers this question efficiently, regardless of how large the original dataset is.
# Another example: analyzing survey responses
responses = ["yes", "no", "yes", "maybe", "yes", "no", "yes", "maybe", "no"]

unique_responses = set(responses)
print("Possible responses:", unique_responses)
print("Number of unique responses:", len(unique_responses))

# You can also check the ratio of unique to total
uniqueness_ratio = len(unique_responses) / len(responses)
print(f"Uniqueness ratio: {uniqueness_ratio:.1%}")
>>>Output
Possible responses: {'yes', 'no', 'maybe'}
Number of unique responses: 3
Uniqueness ratio: 33.3%

Ordered Duplicate Removal

Because sets are unordered, converting a list to a set and back to a list loses the original order. Sometimes you need to remove duplicates while keeping the first occurrence of each item in its original position. This requires a different approach that uses a set for tracking but preserves order in a separate list.
# Remove duplicates while preserving order
items = ["apple", "banana", "apple", "cherry", "banana", "date", "apple"]

# Method 1: Using a set to track seen items
seen = set()
unique_ordered = []
for item in items:
    if item not in seen:
        seen.add(item)
        unique_ordered.append(item)

print("Original:", items)
print("Unique (ordered):", unique_ordered)
>>>Output
Original: ['apple', 'banana', 'apple', 'cherry', 'banana', 'date', 'apple']
Unique (ordered): ['apple', 'banana', 'cherry', 'date']

This approach iterates through the original list once. For each item, it checks whether the item has been seen before by checking the set. If not seen, it adds the item to both the seen set (for fast future lookups) and the unique_ordered list (to preserve order). If already seen, it skips the item. The first occurrence of each item is preserved in its original position.

# Python 3.7+ alternative using dict.fromkeys()
items = ["apple", "banana", "apple", "cherry", "banana", "date"]

# Dictionary keys are unique and preserve insertion order
unique_ordered = list(dict.fromkeys(items))
print("Unique (ordered):", unique_ordered)
>>>Output
Unique (ordered): ['apple', 'banana', 'cherry', 'date']

In Python 3.7 and later, dictionaries preserve insertion order. The dict.fromkeys() method creates a dictionary where each item becomes a key (with None as the value). Since dictionary keys must be unique, duplicates are automatically eliminated while order is preserved. Converting back to a list gives you the deduplicated, ordered result.

When you only need unique values and order does not matter, convert directly to a set. When order must be preserved, use the loop pattern with a set for tracking and a list for collecting results, or use dict.fromkeys() in Python 3.7+.

Adding Elements to Sets

Sets are mutable, meaning you can add elements after creation. Python provides two methods for adding elements: add() for single elements and update() for adding multiple elements at once. Understanding when to use each method helps you write cleaner, more efficient code.

The add() Method

The .add() method inserts exactly one element into the set. If the element already exists, the set remains unchanged and no error is raised. The .add() method modifies the set in place and returns None (not the modified set).

colors = {"red", "green"}
print("Initial:", colors)

# Adding a new element
colors.add("blue")
print("After adding blue:", colors)

# Adding a duplicate has no effect
colors.add("red")
print("After adding red again:", colors)

# Note: add() returns None, not the set
result = colors.add("yellow")
print("Return value:", result)
print("Final set:", colors)
>>>Output
Initial: {'red', 'green'}
After adding blue: {'red', 'green', 'blue'}
After adding red again: {'red', 'green', 'blue'}
Return value: None
Final set: {'red', 'green', 'blue', 'yellow'}
The silent handling of duplicates is a feature, not a bug. It means you can safely add elements without first checking whether they already exist. This simplifies your code and avoids unnecessary conditional statements. The set handles uniqueness for you.

Building Sets with Loops

A common pattern is to start with an empty set and add elements one by one as you process data. This is particularly useful when you need to collect unique values from a stream of inputs or when filtering data based on some condition.
# Collect unique words from a sentence
sentence = "the quick brown fox jumps over the lazy dog"
words = sentence.split()

unique_words = set()
for word in words:
    unique_words.add(word)

print("Original sentence:", sentence)
print("Total words:", len(words))
print("Unique words:", len(unique_words))
print("Unique word set:", unique_words)
>>>Output
Original sentence: the quick brown fox jumps over the lazy dog
Total words: 9
Unique words: 8
Unique word set: {'the', 'quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog'}
The sentence contains nine words, but "the" appears twice. The set contains only eight unique words because the second occurrence of "the" was ignored when added. This pattern works regardless of how many times duplicates appear.
# Collect unique even numbers from a list
numbers = [4, 7, 2, 9, 4, 6, 2, 8, 7, 6, 4, 3, 8]

unique_evens = set()
for num in numbers:
    if num % 2 == 0:
        unique_evens.add(num)

print("Original numbers:", numbers)
print("Unique even numbers:", unique_evens)
>>>Output
Original numbers: [4, 7, 2, 9, 4, 6, 2, 8, 7, 6, 4, 3, 8]
Unique even numbers: {2, 4, 6, 8}

The update() Method

The .update() method adds multiple elements from any iterable (list, tuple, string, set, or other iterable). This is more concise and often more efficient than calling add() repeatedly in a loop.

primary = {"red", "green", "blue"}
print("Initial:", primary)

# Add multiple elements from a list
secondary = ["orange", "purple", "green"]
primary.update(secondary)
print("After update with list:", primary)

# Add from a tuple
primary.update(("cyan", "magenta"))
print("After update with tuple:", primary)

# Add characters from a string
primary.update("xyz")
print("After update with string:", primary)
>>>Output
Initial: {'red', 'green', 'blue'}
After update with list: {'red', 'green', 'blue', 'orange', 'purple'}
After update with tuple: {'red', 'green', 'blue', 'orange', 'purple', 'cyan', 'magenta'}
After update with string: {'red', 'green', 'blue', 'orange', 'purple', 'cyan', 'magenta', 'x', 'y', 'z'}
Notice that when updating with the list ["orange", "purple", "green"], only "orange" and "purple" were actually added. "green" was already in the set and was ignored. Also notice that updating with a string adds each character individually, not the entire string as one element.
add()
  • Adds exactly one element
  • set.add("item")
  • Use for single additions
  • Argument must be hashable
update()
  • Adds multiple elements
  • set.update([a, b, c])
  • Use for bulk additions
  • Argument must be iterable
TIP
Use add() for single elements and update() for bulk additions from an iterable. Never pass a list to add() (it raises TypeError) and remember that update() with a string adds each character individually. To store a collection as one element, convert it to a tuple first.
Python Quiz

> Build a set one element at a time. Duplicates are silently ignored. Pick the method that inserts a single element, and the built-in that counts how many unique items remain.

colors = set()
colors.___("red")
colors.add("blue")
colors.add("red")
print(___(colors))
Options: update, type, len, add, append

Sets silently ignore duplicate insertions. Calling add() with a value that already exists is a no-op: the set remains unchanged and no error is raised. This makes sets ideal for collecting unique items in a loop without explicit duplicate checking.

For bulk additions, update() accepts any iterable: a list, tuple, another set, or even a string, which adds each character individually. If you need to add a list as a single element, convert it to a tuple first since lists are unhashable and cannot be stored in a set.
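
The difference between storing a collection as one element and unpacking it can be sketched as follows; the variable name points is illustrative.

```python
points = set()

# A list is unhashable, so add() rejects it.
try:
    points.add([1, 2])
    list_added = True
except TypeError as error:
    list_added = False
    print("TypeError:", error)

# A tuple with the same contents is hashable and is stored as ONE element.
points.add((1, 2))
print(points)        # {(1, 2)}
print(len(points))   # 1

# update() unpacks its argument instead, adding each item separately.
points.update([3, 4])
print(len(points))   # 3
```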

TIP
Do not confuse add() (sets) with append() (lists). Sets have no append() method. If you get AttributeError: 'set' object has no attribute 'append', you have a set where you expected a list. The reverse mistake, calling add() on a list, produces AttributeError: 'list' object has no attribute 'add'.

Removing Elements from Sets


Remove elements and test membership

Python provides several methods for removing elements from sets: remove(), discard(), pop(), and clear(). Each behaves differently and is suited for different situations. Understanding these differences helps you choose the right method and avoid unexpected errors.

remove() vs discard()

Both remove() and discard() delete a specific element from the set. The critical difference is what happens when the element does not exist. The remove() method raises a KeyError exception if the element is not found, while discard() silently does nothing.

fruits = {"apple", "banana", "cherry", "date"}
print("Initial:", fruits)

# remove() deletes a specific element
fruits.remove("banana")
print("After removing banana:", fruits)

# discard() also deletes a specific element
fruits.discard("cherry")
print("After discarding cherry:", fruits)

# discard() on missing element: no error, no change
fruits.discard("mango")
print("After discarding mango (not present):", fruits)
>>>Output
Initial: {'apple', 'banana', 'cherry', 'date'}
After removing banana: {'apple', 'cherry', 'date'}
After discarding cherry: {'apple', 'date'}
After discarding mango (not present): {'apple', 'date'}

In this example, discarding "mango" had no effect because mango was not in the set. No error was raised, and the set remained unchanged. If we had used remove("mango") instead, Python would have raised a KeyError exception, potentially crashing our program if we did not handle it.
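
For comparison, here is a minimal sketch of what remove() does with the same missing element, with the exception caught so the program can continue.

```python
fruits = {"apple", "date"}

try:
    fruits.remove("mango")
    removed = True
except KeyError as error:
    removed = False
    print("KeyError:", error)

# The failed remove() leaves the set unchanged.
print(len(fruits))   # 2
```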

fruits = {"apple", "banana"}

# Safe approach: check before removing
if "mango" in fruits:
    fruits.remove("mango")
else:
    print("mango not found, skipping remove")

# Even simpler: use discard()
fruits.discard("mango")
print("After discard:", fruits)
>>>Output
mango not found, skipping remove
After discard: {'apple', 'banana'}
Both approaches handle missing elements gracefully. The if-check approach is explicit, while discard() handles it silently. Choose based on whether you want your code to acknowledge the absence or ignore it entirely.
remove()
  • Raises KeyError if element missing
  • Use when element MUST exist
  • Fails fast on programming bugs
  • Good for required elements
discard()
  • Silent if element is missing
  • Use when element MIGHT exist
  • Safe for uncertain removal
  • Good for optional cleanup
TIP
Use remove() when the element should definitely exist and its absence indicates a bug in your program. Use discard() when it is acceptable for the element to be absent, such as when cleaning up potentially incomplete data.
Try choosing different removal methods below to see how each one behaves when the element is missing from the set.
Fill in the Blank

> A set {"apple", "banana"} does not contain "grape", but you try to remove it anyway. Pick a removal method to see which one handles the missing element gracefully.

fruits = {"apple", "banana"}
fruits.___("grape")
print(fruits)

The pop() Method

The .pop() method removes and returns an arbitrary element from the set. Because sets are unordered, you cannot predict which element will be removed. This method is useful when you need to process elements one by one and do not care about the order, or when you need to empty a set while examining each element.

tasks = {"write report", "send email", "attend meeting", "review code"}
print("Tasks to complete:", tasks)

# Pop removes and returns one element
while tasks:
    completed = tasks.pop()
    print(f"Completed: {completed}")
    print(f"Remaining: {len(tasks)} tasks")
>>>Output
Tasks to complete: {'write report', 'send email', 'attend meeting', 'review code'}
Completed: write report
Remaining: 3 tasks
Completed: send email
Remaining: 2 tasks
Completed: attend meeting
Remaining: 1 tasks
Completed: review code
Remaining: 0 tasks
The exact order in which elements are popped depends on Python's internal implementation and can vary between different runs or Python versions. Do not assume any particular element will be popped first. If you need a specific order, sort the elements first or use a different data structure.

Calling pop() on an empty set raises a KeyError. Always ensure the set is not empty before popping, either by checking its length or using a while loop as shown above.
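
The sketch below shows the exception and one way to guard against it. An empty set is falsy, so a plain if check is enough; the names pending and item are illustrative.

```python
pending = set()

try:
    pending.pop()
    popped = True
except KeyError:
    popped = False
    print("Cannot pop from an empty set")

# Guarding with the set's truthiness avoids the exception.
if pending:
    item = pending.pop()
else:
    item = None
print(item)   # None
```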

Clearing a Set

The .clear() method removes all elements from a set, leaving it empty. This is useful when you want to reset a set for reuse without creating a new set object.

data = {1, 2, 3, 4, 5}
print("Before clear:")
print(" Set:", data)
print(" Length:", len(data))

data.clear()
print("After clear:")
print(" Set:", data)
print(" Length:", len(data))
print(" Is empty:", len(data) == 0)
>>>Output
Before clear:
 Set: {1, 2, 3, 4, 5}
 Length: 5
After clear:
 Set: set()
 Length: 0
 Is empty: True
After clearing, the set still exists as an object but contains no elements, and you can continue to add elements to it. Because clear() empties the set in place, every variable that references the same set object sees the empty set. Assigning a new set() to a variable instead leaves any other references pointing at the old contents.
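
The effect of clearing in place versus assigning a new set is easiest to see when two names refer to one object. The names below are illustrative.

```python
# Case 1: clear() empties the shared object, so the alias sees the change.
original = {1, 2, 3}
alias = original            # second name for the SAME set object
original.clear()
print(alias)                # set()

# Case 2: assignment rebinds one name; the old object is untouched.
second = {4, 5, 6}
second_alias = second
second = set()              # 'second' now names a NEW empty set
print(second_alias)         # {4, 5, 6}
print(second)               # set()
```

Use clear() when other parts of the program hold a reference to the set and should see it emptied.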

Membership Testing

The in operator checks whether an element exists in a set. This operation is one of the primary reasons to use sets: membership testing in sets is extremely fast, with O(1) time complexity. This makes sets ideal for situations where you need to check existence frequently.

allowed_users = {"alice", "bob", "charlie", "diana"}

# Check if users are in the set
print("Is alice allowed?", "alice" in allowed_users)
print("Is eve allowed?", "eve" in allowed_users)

# The 'not in' operator checks for absence
print("Is eve NOT allowed?", "eve" not in allowed_users)
>>>Output
Is alice allowed? True
Is eve allowed? False
Is eve NOT allowed? True

The expression "alice" in allowed_users returns True because "alice" is a member of the set. The expression "eve" in allowed_users returns False because "eve" is not in the set. The not in operator returns the logical opposite: True if the element is absent, False if present.

Why Sets Are Fast

Understanding why sets are fast helps you make better decisions about when to use them. When you check if an item is in a list, Python must scan through each element one by one, comparing your search value to each element until it finds a match or reaches the end of the list. This is called linear search. For a list with n items, this requires up to n comparisons in the worst case.
Sets use a fundamentally different approach called hashing. When an element is added to a set, Python calculates a hash value for it, which is a number derived from the element's value. This hash value determines where the element is stored internally. When you check if an element is in the set, Python calculates its hash value and looks directly at that location. This typically requires just one or two comparisons regardless of how many elements are in the set.
# Practical membership testing example
inventory = {"laptop", "mouse", "keyboard", "monitor", "webcam", "headset"}

# Check if items are in stock
items_to_check = ["keyboard", "printer", "mouse", "speaker"]

for item in items_to_check:
    if item in inventory:
        status = "IN STOCK"
    else:
        status = "OUT OF STOCK"
    print(f"{item}: {status}")
>>>Output
keyboard: IN STOCK
printer: OUT OF STOCK
mouse: IN STOCK
speaker: OUT OF STOCK
This pattern is fundamental in data validation and access control. Before processing an input, check if it belongs to a set of valid options. The lookup is fast regardless of how many valid options exist. Whether your set of valid options contains ten items or ten million items, each membership test takes approximately the same amount of time.

List-to-Set for Fast Lookup

A common optimization pattern is to convert a list to a set when you need to perform many membership tests against it. The conversion has a one-time cost proportional to the list size, but each subsequent lookup is O(1). If you perform enough lookups, the time saved far exceeds the conversion cost.

# Suppose we have a list of valid product codes
valid_codes_list = ["A100", "B200", "C300", "D400", "E500", "F600", "G700"]

# Convert to set for fast lookups
valid_codes_set = set(valid_codes_list)

# Now validate multiple user inputs efficiently
user_inputs = ["C300", "X999", "A100", "Z000", "G700"]

print("Validating codes:")
for code in user_inputs:
    # This lookup is O(1) because valid_codes_set is a set
    if code in valid_codes_set:
        print(f" {code}: VALID")
    else:
        print(f" {code}: INVALID")
>>>Output
Validating codes:
C300: VALID
X999: INVALID
A100: VALID
Z000: INVALID
G700: VALID

In real applications, valid_codes_list might contain thousands or millions of entries loaded from a database or configuration file. If you needed to validate millions of user inputs against this list, using a set instead of a list could reduce validation time from hours to seconds.
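To get a feel for the difference on your own machine, a rough measurement can be sketched with the standard timeit module. The product codes and the missing_code value below are made up for illustration, and the timings will vary from run to run and machine to machine.

```python
import timeit

# Hypothetical data: 10,000 made-up product codes.
valid_codes_list = [f"CODE{i}" for i in range(10_000)]
valid_codes_set = set(valid_codes_list)

# A code that is not present forces the list to scan every element.
missing_code = "NOT-A-CODE"

list_seconds = timeit.timeit(lambda: missing_code in valid_codes_list, number=200)
set_seconds = timeit.timeit(lambda: missing_code in valid_codes_set, number=200)

print(f"list: {list_seconds:.6f} s for 200 lookups")
print(f"set:  {set_seconds:.6f} s for 200 lookups")
```

Increasing the number of codes should make the list timing grow roughly in proportion, while the set timing should stay about the same.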

TIP
If you need to check membership multiple times against the same collection, convert it to a set first. The upfront cost of conversion is quickly recovered through faster lookups. Even just a few dozen lookups can justify the conversion.

The code below has a bug related to membership testing. The developer tried to use the in operator with a list literal instead of a set, losing the O(1) performance advantage. Fix it to use a set.

Debug Challenge

> This code checks membership using a list, which requires scanning every element. Switching to a set gives O(1) lookups instead of O(n).

Functional but slow: list uses O(n) lookup instead of O(1)

Converting a list to a set is one of the most common and impactful performance optimizations in Python. The change is a single word in the source code, but it can reduce the time complexity of membership checks from O(n) to O(1), making code that scanned thousands of items per check effectively instant.

Sets work for membership testing because hashing gives each element a predictable storage address. When you check item in my_set, Python computes the hash of the item and checks one location directly, without scanning any other elements.

TIP
If you need to validate user input against a known collection of allowed values, define it as a set literal from the start: ALLOWED = {"admin", "editor", "viewer"}. This is cleaner than creating a list and converting it to a set later, and it skips the conversion step entirely.

What Can Be in a Set?

Daily Life
Interviews

Identify which types sets accept

Not everything can be an element of a set. Set elements must be hashable, which generally means they must be immutable (unchangeable after creation). This requirement exists because sets use hashing to organize elements internally. If an element could change after being added, the set would not be able to find it anymore because its hash value would be different.
  • int / float (Numbers): 42, 3.14, -17 are valid
  • str (Strings): "hello" and "" both work
  • tuple (Tuples): (1, 2) works if its contents are hashable
  • bool / None (Singletons): True, False, None work
  • frozenset (Frozen Sets): Immutable set variant

Mutable types like lists, dictionaries, and regular sets cannot be set elements. A hash computed from their contents would change whenever they were modified, so Python does not make them hashable at all and raises a TypeError if you try to add one to a set.

# Valid set elements
valid_set = {42, "hello", 3.14, True, None, (1, 2, 3)}
print("Valid set:", valid_set)

# Tuples make excellent set elements for storing pairs
coordinates = {(0, 0), (1, 0), (0, 1), (1, 1), (0, 0)}
print("Coordinate set:", coordinates)
print("Number of unique points:", len(coordinates))
>>>Output
Valid set: {True, 42, 3.14, (1, 2, 3), 'hello', None}
Coordinate set: {(0, 0), (1, 0), (0, 1), (1, 1)}
Number of unique points: 4
Notice that the coordinate set shows only four points even though we specified five. The point (0, 0) was specified twice but only appears once because sets eliminate duplicates. Tuples are hashable (as long as their contents are hashable), making them ideal for storing coordinate pairs, database keys, or any immutable combination of values.
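Two details from the list of accepted types are worth a quick sketch: a frozenset can be stored inside a set, and a tuple stops being hashable as soon as it contains something unhashable. The names below are made up for illustration.

```python
# frozenset is an immutable set, so it can be stored inside another set.
teams = {frozenset({"ann", "bob"}), frozenset({"bob", "ann"}), frozenset({"cy"})}
print(len(teams))  # 2, because the first two frozensets are equal

# A tuple is only hashable if everything inside it is hashable.
try:
    bad = {(1, [2, 3])}
except TypeError as error:
    tuple_error = str(error)
    print(f"Cannot add tuple containing a list: {tuple_error}")
```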
my_set = set()

# Test which types can be added to a set
for item in [42, "hello", (1, 2), True]:
    my_set.add(item)
    print(f"Added {item!r} - set is now: {my_set}")

# These would raise TypeError:
try:
    my_set.add([1, 2, 3])
except TypeError as e:
    print(f"Cannot add list: {e}")
>>>Output
Added 42 - set is now: {42}
Added 'hello' - set is now: {42, 'hello'}
Added (1, 2) - set is now: {42, (1, 2), 'hello'}
Added True - set is now: {True, 42, (1, 2), 'hello'}
Cannot add list: unhashable type: 'list'

Notice that True was added as a new, fourth element, because no equal value was already in the set. Python does consider True and 1 to be equal (and they have the same hash), so if the set had already contained 1, adding True would have left it unchanged. The exact display order may vary. Mutable types like lists trigger a TypeError immediately.
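A short sketch of how the equality between booleans and integers plays out in a set. Because True == 1 and False == 0, a boolean counts as a duplicate only when the matching number is already present.

```python
numbers = {1, 2}

numbers.add(True)   # True == 1, so this is treated as a duplicate
numbers.add(False)  # False == 0, and 0 is not in the set, so it is added

print(len(numbers))                     # 3
print(True in numbers, 1.0 in numbers)  # True True
```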

  1. Lists are mutable: You can change a list after creation, so its content is not fixed
  2. Hash would change: A modified list would produce a different hash value than the original
  3. Lookup breaks: The set could not find the element at its old hash-based location
  4. Python forbids it: Mutable objects are rejected from sets to maintain data integrity
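The "lookup breaks" step can be reproduced with a deliberately badly designed class. MutablePoint below is hypothetical and exists only to show what Python protects you from; it hashes a value that is allowed to change, which real code should avoid.

```python
class MutablePoint:
    """Hypothetical class that (unwisely) hashes a value that can change."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, MutablePoint) and self.x == other.x

    def __hash__(self):
        return hash(self.x)


point = MutablePoint(1)
points = {point}
print(point in points)  # True

point.x = 2             # the hash changes after the object was stored
print(point in points)  # False: the set looks in the wrong place
print(len(points))      # 1: the object is still stored, just unreachable
```

Lists avoid this trap by not being hashable in the first place.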
# If you need to store a list-like collection in a set,
# convert it to a tuple first
data_points = [[1, 2], [3, 4], [1, 2], [5, 6]]

# Convert each list to a tuple
unique_points = {tuple(point) for point in data_points}
print("Unique points as tuples:", unique_points)
>>>Output
Unique points as tuples: {(1, 2), (3, 4), (5, 6)}

Common Mistakes to Avoid

Even experienced Python programmers sometimes make mistakes when working with sets. Learning about these common pitfalls helps you avoid them and write more robust code.

Mistake 1: {} vs set()

The most common set mistake is trying to create an empty set with empty curly braces {}. Python interprets this as an empty dictionary, not an empty set. This mistake often leads to AttributeError exceptions later when you try to use set methods.

Wrong
  • empty = {}
  • Creates a dictionary!
  • type(empty) returns dict
  • empty.add("x") raises AttributeError
Correct
  • empty = set()
  • Creates a set!
  • type(empty) returns set
  • empty.add("x") works correctly
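The comparison above can be checked directly. This is a minimal sketch; the variable names are chosen only for illustration.

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

empty_set.add("x")  # works: sets have an .add() method

try:
    empty_braces.add("x")  # dictionaries have no .add() method
except AttributeError as error:
    braces_error = str(error)
    print(f"AttributeError: {braces_error}")
```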

Mistake 2: Expecting Order

Sets are unordered. Do not write code that assumes elements will appear in any particular order when you iterate over a set or print it. Even if elements seem to appear in a consistent order during testing, this order can change between Python versions, between different runs of your program, or when the set grows or shrinks.
# Order is NOT guaranteed
letters = set()
letters.add("c")
letters.add("a")
letters.add("b")

print("Set contents:", letters)
print("Elements in iteration order:")
for letter in letters:
    print(f"  {letter}")

>>>Output
Set contents: {'a', 'b', 'c'}
Elements in iteration order:
  a
  b
  c
In this example, we added elements in the order c, a, b, but they might appear differently when printed. If you need elements in a specific order, sort them explicitly or use a list instead.
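Sorting explicitly is a one-line fix. A minimal sketch using the built-in sorted(), which always returns a new list:

```python
letters = {"c", "a", "b"}

alphabetical = sorted(letters)
reverse_alphabetical = sorted(letters, reverse=True)

print(alphabetical)          # ['a', 'b', 'c']
print(reverse_alphabetical)  # ['c', 'b', 'a']
```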

Mistake 3: Indexing Sets

You cannot access set elements by index. Sets have no concept of "first element" or "element at position 2" because they have no order. Trying to index a set with square brackets raises a TypeError.

colors = {"red", "green", "blue"}

# Trying to index a set raises TypeError
try:
    print(colors[0])
except TypeError as e:
    print(f"Error: {e}")

# Sort to list for indexing
colors_list = sorted(colors)
print("Sorted list:", colors_list)
print("First alphabetically:", colors_list[0])
>>>Output
Error: 'set' object is not subscriptable
Sorted list: ['blue', 'green', 'red']
First alphabetically: blue

Using sorted() gives you a predictable ordering every time, unlike converting to an unsorted list where the order could vary. If you need indexed access often, store your data in a list instead of a set.

Mistake 4: In-Loop Mutation

Adding or removing elements from a set while iterating over it can cause unexpected behavior or RuntimeError exceptions. If you need to modify a set based on its contents, iterate over a copy instead.
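For reference, this is roughly what the mistake looks like. The sketch wraps the faulty loop in try/except only so that the error message can be displayed.

```python
numbers = {1, 2, 3, 4, 5, 6}

try:
    # WRONG: removing from the same set that is being iterated over
    for n in numbers:
        if n % 2 == 0:
            numbers.remove(n)
except RuntimeError as error:
    loop_error = str(error)
    print(f"RuntimeError: {loop_error}")
```

Python notices that the set changed size partway through the loop and stops with a RuntimeError instead of silently skipping elements.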

numbers = {1, 2, 3, 4, 5, 6}
print("Before:", numbers)

# CORRECT: Iterate over a copy
for n in numbers.copy():
    if n % 2 == 0:
        numbers.remove(n)

print("Odd numbers only:", numbers)

# ALTERNATIVE: Set comprehension
numbers2 = {1, 2, 3, 4, 5, 6}
odds = {n for n in numbers2 if n % 2 != 0}
print("Using comprehension:", odds)
>>>Output
Before: {1, 2, 3, 4, 5, 6}
Odd numbers only: {1, 3, 5}
Using comprehension: {1, 3, 5}
If you need to access elements by position, use a list. If you need uniqueness and fast membership testing, use a set. Sometimes you need both: maintain a list for ordered access and a set for fast lookups.
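One way to combine the two is sketched below: a list keeps the order of first appearance while a set answers "have I seen this already?" quickly. The function name unique_in_order and the sample data are made up for illustration.

```python
def unique_in_order(items):
    """Return the items without duplicates, keeping first-seen order."""
    seen = set()   # fast membership checks
    ordered = []   # preserves order and supports indexing
    for item in items:
        if item not in seen:
            seen.add(item)
            ordered.append(item)
    return ordered


visits = ["home", "pricing", "home", "docs", "pricing"]
pages = unique_in_order(visits)
print(pages)     # ['home', 'pricing', 'docs']
print(pages[0])  # home
```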
Try fixing the buggy code below. The programmer accidentally used curly braces to create what they thought was an empty set.
Debug Challenge

> This code uses {} to create what it thinks is an empty set, but Python interprets {} as an empty dictionary. The .add() call then fails.

AttributeError: 'dict' object has no attribute 'add'

Practical Examples

Let us look at some practical examples that demonstrate how sets solve real problems more elegantly than other approaches.
When to Reach for a Set
  • You need to count unique items from a collection with duplicates
  • You are validating input against a list of allowed values
  • You want to find duplicates by comparing list and set lengths
  • You are tracking which items you have already processed
  • You need fast membership testing across many repeated lookups

Example 1: Duplicate Values

Detecting whether a list contains duplicates is straightforward with sets: if the set length is less than the list length, duplicates exist.
def has_duplicates(items):
    """Return True if items contains any duplicates."""
    return len(items) != len(set(items))

# Test with various lists
list1 = [1, 2, 3, 4, 5]
list2 = [1, 2, 3, 2, 5]
list3 = ["apple", "banana", "apple"]

print(f"List {list1} has duplicates: {has_duplicates(list1)}")
print(f"List {list2} has duplicates: {has_duplicates(list2)}")
print(f"List {list3} has duplicates: {has_duplicates(list3)}")
>>>Output
List [1, 2, 3, 4, 5] has duplicates: False
List [1, 2, 3, 2, 5] has duplicates: True
List ['apple', 'banana', 'apple'] has duplicates: True
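The length comparison tells you that duplicates exist but not which values are repeated. A possible extension, sketched here with a hypothetical find_duplicates helper, tracks seen items in one set and repeated items in another.

```python
def find_duplicates(items):
    """Return the set of values that appear more than once in items."""
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)  # second or later appearance
        else:
            seen.add(item)        # first appearance
    return duplicates


print(find_duplicates([1, 2, 3, 2, 5, 2]))  # {2}
print(sorted(find_duplicates(["apple", "banana", "apple", "banana", "kiwi"])))  # ['apple', 'banana']
```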

Example 2: Validating Input

Sets are perfect for validating that user input belongs to a set of allowed values.
# Define valid options as a set
VALID_SIZES = {"small", "medium", "large", "xl"}
VALID_COLORS = {"red", "blue", "green", "black", "white"}

def validate_order(size, color):
    """Validate that size and color are valid options."""
    errors = []

    if size.lower() not in VALID_SIZES:
        errors.append(f"Invalid size: {size}")

    if color.lower() not in VALID_COLORS:
        errors.append(f"Invalid color: {color}")

    return errors

# Test validation
print(validate_order("Medium", "Blue"))
print(validate_order("XL", "Purple"))
print(validate_order("Huge", "Orange"))
>>>Output
[]
['Invalid color: Purple']
['Invalid size: Huge', 'Invalid color: Orange']

Example 3: Tracking Items

When processing a stream of data, use a set to efficiently track which items you have already seen.
# Process log entries and track unique errors
log_entries = [
    "INFO: User logged in",
    "ERROR: Database connection failed",
    "INFO: Data loaded",
    "ERROR: File not found",
    "ERROR: Database connection failed",
    "WARNING: Memory usage high",
    "ERROR: Database connection failed",
]

# Track unique errors
unique_errors = set()
for entry in log_entries:
    if entry.startswith("ERROR:"):
        unique_errors.add(entry)

print(f"Total log entries: {len(log_entries)}")
print(f"Unique error types: {len(unique_errors)}")
print("Unique errors:")
for error in unique_errors:
    print(f"  {error}")
>>>Output
Total log entries: 7
Unique error types: 2
Unique errors:
  ERROR: Database connection failed
  ERROR: File not found
Sets provide a powerful way to work with unique collections and perform membership tests efficiently. Put these fundamentals to the test with hands-on challenges in the Python Builder.
PUTTING IT ALL TOGETHER

> You are a data analyst at Mailchimp deduplicating email addresses collected from three separate campaign upload files before running a bulk re-engagement send, ensuring no subscriber receives the same message twice and that every address meets basic hashability requirements.

set() created from the first campaign list automatically removes any duplicate addresses already present in that single source file.
.add() merges each address from the second and third campaign files into the existing set without any risk of introducing duplicates.
The in operator checks whether a specific email was already captured before deciding whether to include it from a new source list.
The set's uniqueness guarantee means the final address list passed to Mailchimp's send API contains no repeated recipient emails.
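The four steps above can be sketched as follows. The three campaign lists and their addresses are made-up sample data, and no real Mailchimp API is called; the sketch only covers the deduplication itself.

```python
# Hypothetical sample data standing in for the three upload files.
campaign_a = ["ana@example.com", "ben@example.com", "ana@example.com"]
campaign_b = ["ben@example.com", "cara@example.com"]
campaign_c = ["dev@example.com", "ana@example.com"]

# set() removes duplicates already present in the first file.
recipients = set(campaign_a)

new_from_later_files = []
for address in campaign_b + campaign_c:
    if address not in recipients:  # membership check before including
        new_from_later_files.append(address)
    recipients.add(address)        # adding an existing address changes nothing

print(len(recipients))       # 4
print(new_from_later_files)  # ['cara@example.com', 'dev@example.com']
```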
KEY TAKEAWAYS
Sets are unordered collections that automatically eliminate duplicates
Create sets with curly braces {1, 2, 3} or use set() for empty sets
Empty curly braces {} create a dictionary, not a set
Convert lists to sets to remove duplicates: set(my_list)
.add() adds one element; .update() adds multiple from an iterable
.remove() raises KeyError if missing; .discard() is silent
Membership testing with in is O(1) - extremely fast regardless of set size
Set elements must be hashable (immutable): strings, numbers, tuples
Lists, dictionaries, and sets cannot be elements of sets
Do not modify a set while iterating over it; iterate over a copy instead
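
As a compact recap of the method-related takeaways, here is a minimal sketch; the tag names are made up for illustration.

```python
tags = {"python"}

tags.add("sets")                         # adds one element
tags.update(["lists", "sets", "tuples"]) # adds several; "sets" is ignored as a duplicate
print(len(tags))  # 4

tags.discard("missing")  # silent when the element is absent

try:
    tags.remove("missing")  # raises KeyError when the element is absent
except KeyError as error:
    removed_error = error
    print(f"KeyError: {removed_error}")
```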

Collections that guarantee uniqueness

Category: Python
Difficulty: beginner
Duration: 55 minutes
Challenges: 3 hands-on challenges

Topics covered: What is a Set?, Creating Sets, Automatic Duplicate Removal, Removing Elements from Sets, What Can Be in a Set?

Lesson Sections

  1. What is a Set?
  2. Creating Sets
  3. Automatic Duplicate Removal
  4. Removing Elements from Sets
  5. What Can Be in a Set?