Sets: Beginner

Slack computes shared channel membership between any two users in milliseconds by taking the intersection of their respective channel sets. The cost of that intersection grows with the size of the smaller set, because each of its elements needs only one membership check against the other set, and each check takes constant time on average. With hundreds of millions of messages processed daily across millions of workspaces, doing this with list lookups would be far too slow, requiring a linear scan for every check. Sets keep the operation fast because membership testing is O(1) on average, not O(n). The add, remove, and membership operations you will learn in this lesson are the same building blocks Slack and every major collaboration platform rely on to serve membership data at scale.
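As a rough illustration of the shared-channel idea, the sketch below intersects two made-up channel sets. The names `alice_channels` and `bob_channels` and their contents are hypothetical and are not taken from Slack. The `&` operator returns the elements present in both sets.

```python
# Hypothetical channel memberships for two users (illustrative data only).
alice_channels = {"general", "random", "eng-backend", "lunch"}
bob_channels = {"general", "eng-backend", "design"}

# Set intersection: channels that both users belong to.
shared = alice_channels & bob_channels

# sorted() gives a stable order for display, since sets are unordered.
print(sorted(shared))  # ['eng-backend', 'general']
```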

What is a Set?

Distinguish sets from lists

A set is an unordered collection of unique elements. These two properties define what makes a set different from other collection types like lists and tuples. Understanding both properties is essential for using sets correctly.

The word "unordered" means that sets do not maintain any particular sequence for their elements. Unlike lists, where the first item you add stays first and the last item stays last, sets make no guarantees about element order. When you iterate over a set or print it, the elements might appear in any order. This order might even change between different runs of your program. You cannot rely on sets to preserve the order in which you added elements.

The word "unique" means that each element can appear at most once in a set. If you try to add a duplicate element, the set simply ignores it without raising an error. The set remains unchanged. This automatic duplicate handling is one of the most useful features of sets. You never need to check whether an element exists before adding it. You can add freely, and the set ensures uniqueness.

Unordered: elements have no defined position or index in a set
Unique: each element can appear at most once in the collection
Mutable: you can add and remove elements after creation
Fast lookup: checking membership takes O(1) time on average
Dynamic size: sets grow and shrink as you add and remove elements

Think of a set like a bag of marbles where each marble must be a different color. If you already have a red marble in the bag and try to add another red marble, the bag rejects it because red is already represented. You cannot ask for "the third marble" or "the marble at position five" because marbles in a bag have no order. But you can very quickly check "Is there a blue marble in the bag?" by reaching in and finding it almost instantly.

This analogy also illustrates why sets are useful. If you wanted to know how many different colors of marbles you have, a set gives you the answer directly: its length equals the number of unique colors. With a list, you would need to examine each marble and keep track of which colors you have already seen.
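The marble comparison can be sketched in a few lines. The `marbles` list is made-up data. The first approach lets a set collapse the duplicates, and the second tracks seen colors by hand, as you would have to with only a list.

```python
# Marble colors in the bag; some colors repeat (illustrative data).
marbles = ["red", "blue", "red", "green", "blue", "red"]

# With a set: duplicates collapse, so the length is the number of colors.
color_count = len(set(marbles))

# With only a list: track the colors seen so far by hand.
seen_colors = []
for color in marbles:
    if color not in seen_colors:
        seen_colors.append(color)

print(color_count)       # 3
print(len(seen_colors))  # 3
```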

Sets vs Lists Fundamentals

Lists and sets are both collection types, but they serve fundamentally different purposes. Understanding when to use each is crucial for writing efficient, correct code. The wrong choice can lead to subtle bugs or severe performance problems.

Lists preserve order and allow duplicates. When you add items to a list, they stay in the order you added them. You can access items by their position using indexing: list[0] gives you the first item, list[1] gives you the second, and so on. Lists allow the same value to appear multiple times. A list like [1, 1, 2, 2, 3] is perfectly valid and maintains all five elements.

Sets ignore order and enforce uniqueness. When you add items to a set, they do not have positions. You cannot access set elements by index. Sets reject duplicate values automatically. A set created from {1, 1, 2, 2, 3} would contain only {1, 2, 3} because duplicates are eliminated.
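A short sketch of these differences, using the same values as the paragraphs above. The variable names are only for illustration.

```python
as_list = [1, 1, 2, 2, 3]
as_set = {1, 1, 2, 2, 3}

print(len(as_list))  # 5
print(len(as_set))   # 3

# Order matters when comparing lists, but not when comparing sets.
print([1, 2, 3] == [3, 2, 1])  # False
print({1, 2, 3} == {3, 2, 1})  # True

# Sets have no positions, so indexing raises TypeError.
indexing_failed = False
try:
    as_set[0]
except TypeError as error:
    indexing_failed = True
    print("TypeError:", error)
```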

•List

Ordered: [1, 2, 3] stays in that order
Allows duplicates: [1, 1, 2, 2] is valid
Access by index: list[0] returns first item
Slower membership test: O(n) time complexity
Preserves insertion order

•Set

Unordered: order is not guaranteed
No duplicates: {1, 2} only, no repeats
No indexing: my_set[0] raises TypeError
Fast membership test: O(1) average time complexity
No concept of order

The notation O(n) means that checking if an item is in a list takes time proportional to the list size. If the list has n items, you might need to check all n items in the worst case. A list with a million items requires up to a million comparisons. The notation O(1) means that checking if an item is in a set takes constant time on average, regardless of set size. Whether the set has ten items or ten million items, the lookup takes approximately the same amount of time.

This performance difference matters enormously in practice. If you need to check whether items exist in a collection thousands or millions of times, using a set instead of a list can reduce your runtime from hours to seconds. Many experienced programmers have debugged slow code only to discover that converting a list to a set solved the performance problem.
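One way to see the gap on your own machine is to time both lookups with the standard-library `timeit` module. This is a minimal sketch: the size `n` and the repeat count are arbitrary choices, and the absolute timings will vary from machine to machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # last element: the worst case for a list scan

# Time 100 membership tests against each collection.
list_seconds = timeit.timeit(lambda: target in as_list, number=100)
set_seconds = timeit.timeit(lambda: target in as_set, number=100)

print(f"list: {list_seconds:.6f} s")
print(f"set:  {set_seconds:.6f} s")
```

On typical hardware the list figure should be far larger than the set figure, and the gap widens as `n` grows.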

Python Quiz

> A set automatically removes duplicates. Pick the built-in that counts unique elements, and the keyword that tests membership in constant time.

data = {3, 1, 4, 1, 5, 9, 2, 6, 5}
print(___(data))
print(5 ___ data)

Options: len, sum, max

Sets and lists are both collections, but they solve different problems. Use a list when you care about order or need to store duplicates. Use a set when you only care whether an item is present and want instant answers regardless of collection size.

The O(1) membership test is the defining advantage of sets. It comes from hashing: Python converts each element into a number that directly points to its storage location, so no scanning is needed. This makes sets the right tool for membership checks in any performance-sensitive code.

TIP

When you find yourself writing if item in my_list inside a loop, consider converting my_list to a set first. The lookup cost drops from O(n) to O(1), turning potentially slow code into fast code with a one-word change.

Creating Sets

Build sets from any iterable

Python provides two main ways to create sets. You can use curly braces with elements inside, similar to how you write dictionary literals but without key-value pairs. Alternatively, you can use the set() constructor function, which can convert other iterables into sets. Each approach has specific use cases and limitations that you should understand.

Using Curly Braces

The most common and concise way to create a set with initial elements is using curly braces. Place your elements inside the braces, separated by commas. This syntax looks similar to dictionary syntax, but dictionaries have key-value pairs separated by colons, while sets contain only single values.

# Create a set of colors
colors = {"red", "green", "blue"}
print(colors)
print(type(colors))

# Create a set of numbers
numbers = {1, 2, 3, 4, 5}
print(numbers)

# Create a set with mixed types
mixed = {42, "hello", 3.14, True}
print(mixed)

>>>Output

{'red', 'green', 'blue'}

<class 'set'>

{1, 2, 3, 4, 5}

{True, 42, 3.14, 'hello'}

When you print a set, Python displays it with curly braces. Notice that the order of elements in the output may differ from the order you wrote them. In the mixed set example, the elements appear in a different order than we specified. This is completely normal behavior because sets are unordered. Do not write code that depends on any particular ordering of set elements.

The type() function confirms that these objects are sets. Python's set type is a built-in type, meaning it is always available without importing anything. Sets are as fundamental to Python as lists, dictionaries, and tuples.

The Empty Set Problem

There is one critical exception to the curly brace syntax that trips up many Python programmers. You cannot create an empty set with empty curly braces. When Python sees {}, it interprets this as an empty dictionary, not an empty set. This behavior exists for historical reasons: dictionaries were added to Python before sets, and {} was already established as the dictionary literal syntax.

# Creates a DICTIONARY, not a set!
not_a_set = {}
print("Type of {}:", type(not_a_set))

# This is how you create an empty SET
empty_set = set()
print("Type of set():", type(empty_set))

empty_set.add("first element")
print("After adding:", empty_set)

>>>Output

Type of {}: <class 'dict'>

Type of set(): <class 'set'>

After adding: {'first element'}

This quirk catches even experienced Python developers. If you write code that initializes a variable with {} and later tries to use set methods like add(), you will get an AttributeError because dictionaries do not have an add() method. The error message might be confusing because you thought you had a set.
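The failure described above can be reproduced in a few lines. The variable name `tags` is only for illustration.

```python
tags = {}  # Looks like an empty set, but this is a dict.

try:
    tags.add("python")
    failed = False
except AttributeError as error:
    failed = True
    print("AttributeError:", error)

tags = set()  # The correct way to start with an empty set.
tags.add("python")
print(tags)  # {'python'}
```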

TIP

Always use set() to create an empty set. Using {} creates an empty dictionary. This is one of the most common Python gotchas and has caused countless debugging sessions.

Sets from Other Collections

The set() constructor is versatile. It can convert any iterable object into a set. An iterable is anything you can loop over: lists, tuples, strings, ranges, and even other sets. This conversion automatically removes any duplicate values, which is often exactly what you want.

# Convert a list with duplicates to a set
numbers_list = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
unique_numbers = set(numbers_list)
print("Original list:", numbers_list)
print("As a set:", unique_numbers)
print("List length:", len(numbers_list))
print("Set length:", len(unique_numbers))

>>>Output

Original list: [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

As a set: {1, 2, 3, 4}

List length: 10

Set length: 4

In this example, the list has ten elements but only four unique values. The set constructor processes each element from the list. When it encounters an element it has already seen, it simply skips it. The resulting set has length four because there are only four distinct values in the original list.

# Convert a string to a set of characters
word = "mississippi"
letters = set(word)
print("Word:", word)
print("Unique letters:", letters)
print("Total characters:", len(word))
print("Unique characters:", len(letters))

>>>Output

Word: mississippi

Unique letters: {'m', 'i', 's', 'p'}

Total characters: 11

Unique characters: 4

Strings are iterable in Python, meaning you can loop over their characters. When you pass a string to set(), Python treats it as a sequence of characters and creates a set containing each unique character. The string "mississippi" has eleven characters total, but only four unique letters: m, i, s, and p. The set contains exactly these four characters.

This pattern of converting a collection to a set for deduplication is extremely common in data processing:

# Real-world example: counting unique visitors
visitor_log = [101, 102, 101, 103, 102, 101, 104, 103, 101, 105]
unique_visitors = set(visitor_log)

print("Total page views:", len(visitor_log))
print("Unique visitors:", len(unique_visitors))
print("Visitor IDs:", unique_visitors)

>>>Output

Total page views: 10

Unique visitors: 5

Visitor IDs: {101, 102, 103, 104, 105}

# Converting tuples and ranges
tuple_data = (1, 2, 2, 3)
set_from_tuple = set(tuple_data)
print("From tuple:", set_from_tuple)

range_data = range(1, 10, 2)
set_from_range = set(range_data)
print("From range:", set_from_range)

>>>Output

From tuple: {1, 2, 3}

From range: {1, 3, 5, 7, 9}

Try choosing different constructors below to see how Python interprets each syntax for creating a set versus other collection types.

Fill in the Blank

> You have a list [3, 1, 2, 1] with a duplicate value. Pick a constructor to convert it and see how set, list, and tuple each handle duplicates and ordering differently.

result = ___([3, 1, 2, 1])
print(type(result))
print(result)

The constructor you choose determines everything about the resulting collection: whether it keeps duplicates, whether it maintains order, and whether it supports fast membership checks. Of these three constructors, only set() both removes duplicates and provides O(1) average lookups.

A common pattern is to convert a list to a set and back: list(set(my_list)). This deduplicates a list in one step, though the output order may differ from the input since sets do not guarantee ordering.
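A minimal sketch of this round trip. The second variant uses `sorted()`, which is one way to get a predictable result when the values can be compared. It yields ascending order, not the original input order.

```python
my_list = [3, 1, 2, 1, 3]

# Deduplicate; the order of the resulting list is not guaranteed.
deduplicated = list(set(my_list))

# sorted() accepts any iterable and returns a new list in ascending order.
deduplicated_sorted = sorted(set(my_list))

print(len(deduplicated))    # 3
print(deduplicated_sorted)  # [1, 2, 3]
```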

TIP

Call set() with no arguments to create an empty set, not {}. Curly braces alone create an empty dictionary in Python. Always write my_set = set() for an empty set.

Automatic Duplicate Removal

Deduplicate data with add and update

The automatic duplicate removal behavior of sets is one of their most powerful and useful features. Sets eliminate duplicates both during creation and when adding new elements. This happens silently, without errors or warnings. Understanding this behavior allows you to write cleaner, more concise code.

# Duplicates are automatically removed during creation
votes = {"Alice", "Bob", "Alice", "Charlie", "Bob", "Alice"}
print("Votes cast:", votes)
print("Unique voters:", len(votes))

# Even though we wrote Alice 3 times and Bob 2 times,
# the set contains each name exactly once

>>>Output

Votes cast: {'Alice', 'Bob', 'Charlie'}

Unique voters: 3

Even though we specified "Alice" three times and "Bob" twice in the set literal, the resulting set contains each name exactly once. Python processes the elements in order, adding each one to the set. When it encounters an element that already exists in the set, it simply skips it. This behavior is consistent and predictable.

This same deduplication happens when you add elements to an existing set. If you add an element that already exists, the set remains unchanged. No error is raised, and the size of the set does not increase. This allows you to add elements freely without first checking whether they exist.

# Duplicates are also ignored when using add()
colors = {"red", "blue"}
print("Initial set:", colors)
print("Initial size:", len(colors))

colors.add("green")
print("After adding green:", colors)
print("Size:", len(colors))

colors.add("red")
print("After adding red again:", colors)
print("Size:", len(colors))

>>>Output

Initial set: {'red', 'blue'}

Initial size: 2

After adding green: {'red', 'blue', 'green'}

Size: 3

After adding red again: {'red', 'blue', 'green'}

Size: 3

Counting Unique Values

One of the most common uses of sets is counting unique values in a dataset. Given any collection with potential duplicates, converting to a set and checking its length tells you how many distinct values exist. The conversion takes a single pass over the data.

# Sample log of page views on a website
page_views = [
    "home", "about", "home", "contact", "home",
    "products", "about", "home", "pricing", "about"
]

# How many unique pages were viewed?
unique_pages = set(page_views)
print("All page views:", page_views)
print("Unique pages:", unique_pages)
print("Total views:", len(page_views))
print("Unique page count:", len(unique_pages))

>>>Output

All page views: ['home', 'about', 'home', 'contact', 'home', 'products', 'about', 'home', 'pricing', 'about']

Unique pages: {'home', 'about', 'contact', 'products', 'pricing'}

Total views: 10

Unique page count: 5

This pattern appears constantly in data analysis and processing. Given a column of data from a database or spreadsheet, you often need to know "How many distinct values are there?" Converting to a set and checking its length answers this question in a single pass over the data, which scales well even for large datasets.

# Another example: analyzing survey responses
responses = ["yes", "no", "yes", "maybe", "yes", "no", "yes", "maybe", "no"]

unique_responses = set(responses)
print("Possible responses:", unique_responses)
print("Number of unique responses:", len(unique_responses))

# You can also check the ratio of unique to total
uniqueness_ratio = len(unique_responses) / len(responses)
print(f"Uniqueness ratio: {uniqueness_ratio:.1%}")

>>>Output

Possible responses: {'yes', 'no', 'maybe'}

Number of unique responses: 3

Uniqueness ratio: 33.3%

Ordered Duplicate Removal

Because sets are unordered, converting a list to a set and back to a list loses the original order. Sometimes you need to remove duplicates while keeping the first occurrence of each item in its original position. This requires a different approach that uses a set for tracking but preserves order in a separate list.

# Remove duplicates while preserving order
items = ["apple", "banana", "apple", "cherry", "banana", "date", "apple"]

# Method 1: Using a set to track seen items
seen = set()
unique_ordered = []
for item in items:
    if item not in seen:
        seen.add(item)
        unique_ordered.append(item)

print("Original:", items)
print("Unique (ordered):", unique_ordered)

>>>Output

Original: ['apple', 'banana', 'apple', 'cherry', 'banana', 'date', 'apple']

Unique (ordered): ['apple', 'banana', 'cherry', 'date']

This approach iterates through the original list once. For each item, it checks whether the item has been seen before by checking the set. If not seen, it adds the item to both the seen set (for fast future lookups) and the unique_ordered list (to preserve order). If already seen, it skips the item. The first occurrence of each item is preserved in its original position.

# Python 3.7+ alternative using dict.fromkeys()
items = ["apple", "banana", "apple", "cherry", "banana", "date"]

# Dictionary keys are unique and preserve insertion order
unique_ordered = list(dict.fromkeys(items))
print("Unique (ordered):", unique_ordered)

>>>Output

Unique (ordered): ['apple', 'banana', 'cherry', 'date']

In Python 3.7 and later, dictionaries preserve insertion order. The dict.fromkeys() method creates a dictionary where each item becomes a key (with None as the value). Since dictionary keys must be unique, duplicates are automatically eliminated while order is preserved. Converting back to a list gives you the deduplicated, ordered result.

When you only need unique values and order does not matter, convert directly to a set. When order must be preserved, use the loop pattern with a set for tracking and a list for collecting results, or use dict.fromkeys() in Python 3.7+.

Adding Elements to Sets

Sets are mutable, meaning you can add elements after creation. Python provides two methods for adding elements: add() for single elements and update() for adding multiple elements at once. Understanding when to use each method helps you write cleaner, more efficient code.

The add() Method

The .add() method inserts exactly one element into the set. If the element already exists, the set remains unchanged and no error is raised. The .add() method modifies the set in place and returns None (not the modified set).

colors = {"red", "green"}
print("Initial:", colors)

# Adding a new element
colors.add("blue")
print("After adding blue:", colors)

# Adding a duplicate has no effect
colors.add("red")
print("After adding red again:", colors)

# Note: add() returns None, not the set
result = colors.add("yellow")
print("Return value:", result)
print("Final set:", colors)

>>>Output

Initial: {'red', 'green'}

After adding blue: {'red', 'green', 'blue'}

After adding red again: {'red', 'green', 'blue'}

Return value: None

Final set: {'red', 'green', 'blue', 'yellow'}

The silent handling of duplicates is a feature, not a bug. It means you can safely add elements without first checking whether they already exist. This simplifies your code and avoids unnecessary conditional statements. The set handles uniqueness for you.

Building Sets with Loops

A common pattern is to start with an empty set and add elements one by one as you process data. This is particularly useful when you need to collect unique values from a stream of inputs or when filtering data based on some condition.

# Collect unique words from a sentence
sentence = "the quick brown fox jumps over the lazy dog"
words = sentence.split()

unique_words = set()
for word in words:
    unique_words.add(word)

print("Original sentence:", sentence)
print("Total words:", len(words))
print("Unique words:", len(unique_words))
print("Unique word set:", unique_words)

>>>Output

Original sentence: the quick brown fox jumps over the lazy dog

Total words: 9

Unique words: 8

Unique word set: {'the', 'quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog'}

The sentence contains nine words, but "the" appears twice. The set contains only eight unique words because the second occurrence of "the" was ignored when added. This pattern works regardless of how many times duplicates appear.

# Collect unique even numbers from a list
numbers = [4, 7, 2, 9, 4, 6, 2, 8, 7, 6, 4, 3, 8]

unique_evens = set()
for num in numbers:
    if num % 2 == 0:
        unique_evens.add(num)

print("Original numbers:", numbers)
print("Unique even numbers:", unique_evens)

>>>Output

Original numbers: [4, 7, 2, 9, 4, 6, 2, 8, 7, 6, 4, 3, 8]

Unique even numbers: {2, 4, 6, 8}

The update() Method

The .update() method adds multiple elements from any iterable (list, tuple, string, set, or other iterable). This is more concise and often more efficient than calling add() repeatedly in a loop.

primary = {"red", "green", "blue"}
print("Initial:", primary)

# Add multiple elements from a list
secondary = ["orange", "purple", "green"]
primary.update(secondary)
print("After update with list:", primary)

# Add from a tuple
primary.update(("cyan", "magenta"))
print("After update with tuple:", primary)

# Add characters from a string
primary.update("xyz")
print("After update with string:", primary)

>>>Output

Initial: {'red', 'green', 'blue'}

After update with list: {'red', 'green', 'blue', 'orange', 'purple'}

After update with tuple: {'red', 'green', 'blue', 'orange', 'purple', 'cyan', 'magenta'}

After update with string: {'red', 'green', 'blue', 'orange', 'purple', 'cyan', 'magenta', 'x', 'y', 'z'}

Notice that when updating with the list ["orange", "purple", "green"], only "orange" and "purple" were actually added. "green" was already in the set and was ignored. Also notice that updating with a string adds each character individually, not the entire string as one element.

•add()

Adds exactly one element
set.add("item")
Use for single additions
Argument must be hashable

•update()

Adds multiple elements
set.update([a, b, c])
Use for bulk additions
Argument must be iterable

TIP

Use add() for single elements and update() for bulk additions from an iterable. Never pass a list to add() (it raises TypeError) and remember that update() with a string adds each character individually. To store a collection as one element, convert it to a tuple first.
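The pitfalls named in this tip can be checked directly. This is a small sketch with made-up variable names (`points`, `words`, `letters`).

```python
points = set()

# A list is unhashable, so add() rejects it with TypeError.
try:
    points.add([1, 2])
    list_rejected = False
except TypeError:
    list_rejected = True

# A tuple is hashable and is stored as one element.
points.add((1, 2))
print(points)  # {(1, 2)}

words = set()
words.add("xyz")       # one element: the whole string

letters = set()
letters.update("xyz")  # three elements: 'x', 'y', 'z'

print(len(words), len(letters))  # 1 3
```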

Python Quiz

> Build a set one element at a time. Duplicates are silently ignored. Pick the method that inserts a single element, and the built-in that counts how many unique items remain.

colors = set()
colors.___("red")
colors.add("blue")
colors.add("red")
print(___(colors))

Options: update, type, len, add, append

Sets silently ignore duplicate insertions. Calling add() with a value that already exists is a no-op: the set remains unchanged and no error is raised. This makes sets ideal for collecting unique items in a loop without explicit duplicate checking.

For bulk additions, update() accepts any iterable: a list, tuple, another set, or even a string, which adds each character individually. If you need to add a list as a single element, convert it to a tuple first since lists are unhashable and cannot be stored in a set.

TIP

Do not confuse add() (sets) with append() (lists). Sets have no append() method. If you get AttributeError: 'set' object has no attribute 'append', you likely have a set where you expected a list, or vice versa.

Removing Elements from Sets

Remove elements and test membership

Python provides several methods for removing elements from sets: remove(), discard(), pop(), and clear(). Each behaves differently and is suited for different situations. Understanding these differences helps you choose the right method and avoid unexpected errors.

remove() vs discard()

Both remove() and discard() delete a specific element from the set. The critical difference is what happens when the element does not exist. The remove() method raises a KeyError exception if the element is not found, while discard() silently does nothing.

fruits = {"apple", "banana", "cherry", "date"}
print("Initial:", fruits)

# remove() deletes a specific element
fruits.remove("banana")
print("After removing banana:", fruits)

# discard() also deletes a specific element
fruits.discard("cherry")
print("After discarding cherry:", fruits)

# discard() on missing element: no error, no change
fruits.discard("mango")
print("After discarding mango (not present):", fruits)

>>>Output

Initial: {'apple', 'banana', 'cherry', 'date'}

After removing banana: {'apple', 'cherry', 'date'}

After discarding cherry: {'apple', 'date'}

After discarding mango (not present): {'apple', 'date'}

In this example, discarding "mango" had no effect because mango was not in the set. No error was raised, and the set remained unchanged. If we had used remove("mango") instead, Python would have raised a KeyError exception, potentially crashing our program if we did not handle it.
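For comparison, here is a minimal sketch of what the remove() call looks like when the element is missing, with the KeyError caught so the program keeps running. The variable `missing` is introduced only for illustration.

```python
fruits = {"apple", "date"}

try:
    fruits.remove("mango")
except KeyError as error:
    # The missing element is available as the exception's first argument.
    missing = error.args[0]
    print("Could not remove:", missing)

print(fruits == {"apple", "date"})  # True
```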

fruits = {"apple", "banana"}

# Safe approach: check before removing
if "mango" in fruits:
    fruits.remove("mango")
else:
    print("mango not found, skipping remove")

# Even simpler: use discard()
fruits.discard("mango")
print("After discard:", fruits)

>>>Output

mango not found, skipping remove

After discard: {'apple', 'banana'}

Both approaches handle missing elements gracefully. The if-check approach is explicit, while discard() handles it silently. Choose based on whether you want your code to acknowledge the absence or ignore it entirely.

•remove()

Raises KeyError if element missing
Use when element MUST exist
Fails fast on programming bugs
Good for required elements

•discard()

Silent if element is missing
Use when element MIGHT exist
Safe for uncertain removal
Good for optional cleanup

TIP

Use remove() when the element should definitely exist and its absence indicates a bug in your program. Use discard() when it is acceptable for the element to be absent, such as when cleaning up potentially incomplete data.

Try choosing different removal methods below to see how each one behaves when the element is missing from the set.

Fill in the Blank

> A set {"apple", "banana"} does not contain "grape", but you try to remove it anyway. Pick a removal method to see which one handles the missing element gracefully.

fruits = {"apple", "banana"}
fruits.___("grape")
print(fruits)

The pop() Method

The .pop() method removes and returns an arbitrary element from the set. Because sets are unordered, you cannot predict which element will be removed. This method is useful when you need to process elements one by one and do not care about the order, or when you need to empty a set while examining each element.

tasks = {"write report", "send email", "attend meeting", "review code"}
print("Tasks to complete:", tasks)

# Pop removes and returns one element
while tasks:
    completed = tasks.pop()
    print(f"Completed: {completed}")
    print(f"Remaining: {len(tasks)} tasks")

>>>Output

Tasks to complete: {'write report', 'send email', 'attend meeting', 'review code'}

Completed: write report

Remaining: 3 tasks

Completed: send email

Remaining: 2 tasks

Completed: attend meeting

Remaining: 1 tasks

Completed: review code

Remaining: 0 tasks

The exact order in which elements are popped depends on Python's internal implementation and can vary between different runs or Python versions. Do not assume any particular element will be popped first. If you need a specific order, sort the elements first or use a different data structure.

Calling pop() on an empty set raises a KeyError. Always ensure the set is not empty before popping, either by checking its length or using a while loop as shown above.
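A small sketch of both the failure and a guarded alternative. The name `queue` is hypothetical, and returning None for the empty case is just one possible convention.

```python
queue = set()

# Popping from an empty set raises KeyError.
try:
    queue.pop()
    pop_failed = False
except KeyError:
    pop_failed = True

# Guarded version: an empty set is falsy, so pop only when it has elements.
item = queue.pop() if queue else None
print(pop_failed, item)  # True None
```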

Clearing a Set

The .clear() method removes all elements from a set, leaving it empty. This is useful when you want to reset a set for reuse without creating a new set object.

data = {1, 2, 3, 4, 5}
print("Before clear:")
print("  Set:", data)
print("  Length:", len(data))

data.clear()
print("After clear:")
print("  Set:", data)
print("  Length:", len(data))
print("  Is empty:", len(data) == 0)

>>>Output

Before clear:

  Set: {1, 2, 3, 4, 5}

  Length: 5

After clear:

  Set: set()

  Length: 0

  Is empty: True

After clearing, the set still exists as an object but contains no elements, and you can continue to add elements to it. Because clear() empties the set in place, every variable that references the same set object sees the change. Assigning a new empty set to one variable would instead leave the other references pointing at the old contents.
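The difference between clearing in place and assigning a new set shows up when two names refer to the same object. The names `active` and `alias` in this sketch are only for illustration.

```python
active = {1, 2, 3}
alias = active   # Both names refer to the same set object.

active.clear()   # Empties the shared object in place.
alias_is_empty = len(alias) == 0
print(alias)     # set()

active = {1, 2, 3}
alias = active
active = set()   # Rebinds 'active' to a new object; 'alias' keeps the old one.
print(alias)     # {1, 2, 3}
```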

Membership Testing

The in operator checks whether an element exists in a set. This operation is one of the primary reasons to use sets: membership testing in sets is extremely fast, with O(1) average time complexity. This makes sets ideal for situations where you need to check existence frequently.

allowed_users = {"alice", "bob", "charlie", "diana"}

# Check if users are in the set
print("Is alice allowed?", "alice" in allowed_users)
print("Is eve allowed?", "eve" in allowed_users)

# The 'not in' operator checks for absence
print("Is eve NOT allowed?", "eve" not in allowed_users)

>>>Output

Is alice allowed? True

Is eve allowed? False

Is eve NOT allowed? True

The expression "alice" in allowed_users returns True because "alice" is a member of the set. The expression "eve" in allowed_users returns False because "eve" is not in the set. The not in operator returns the logical opposite: True if the element is absent, False if present.

Why Sets Are Fast

Understanding why sets are fast helps you make better decisions about when to use them. When you check if an item is in a list, Python must scan through each element one by one, comparing your search value to each element until it finds a match or reaches the end of the list. This is called linear search. For a list with n items, this requires up to n comparisons in the worst case.

Sets use a fundamentally different approach called hashing. When an element is added to a set, Python calculates a hash value for it, which is a number derived from the element's value. This hash value determines where the element is stored internally. When you check if an element is in the set, Python calculates its hash value and looks directly at that location. This typically requires just one or two comparisons regardless of how many elements are in the set.
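The idea can be sketched with a toy structure built from a list of buckets. This is a deliberately simplified illustration and not how Python's built-in set is laid out internally. The names `NUM_BUCKETS`, `bucket_for`, `toy_add`, and `toy_contains` are made up for this example. Only the built-in `hash()` function is real Python.

```python
NUM_BUCKETS = 8  # arbitrary size chosen for illustration
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_for(value):
    """Use the value's hash to pick which bucket it belongs in."""
    return buckets[hash(value) % NUM_BUCKETS]

def toy_add(value):
    bucket = bucket_for(value)
    if value not in bucket:  # keep elements unique
        bucket.append(value)

def toy_contains(value):
    # Only one small bucket is searched, not the whole collection.
    return value in bucket_for(value)

for name in ["alice", "bob", "charlie"]:
    toy_add(name)
toy_add("alice")  # duplicate, ignored

print(toy_contains("bob"))  # True
print(toy_contains("eve"))  # False
```

A real hash table also grows as elements are added so that each lookup keeps examining only a few entries, which is what keeps the average cost constant.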

# Practical membership testing example
inventory = {"laptop", "mouse", "keyboard", "monitor", "webcam", "headset"}

# Check if items are in stock
items_to_check = ["keyboard", "printer", "mouse", "speaker"]

for item in items_to_check:
    if item in inventory:
        status = "IN STOCK"
    else:
        status = "OUT OF STOCK"
    print(f"{item}: {status}")

>>>Output

keyboard: IN STOCK

printer: OUT OF STOCK

mouse: IN STOCK

speaker: OUT OF STOCK

This pattern is fundamental in data validation and access control. Before processing an input, check if it belongs to a set of valid options. The lookup is fast regardless of how many valid options exist. Whether your set of valid options contains ten items or ten million items, each membership test takes approximately the same amount of time.

List-to-Set for Fast Lookup

A common optimization pattern is to convert a list to a set when you need to perform many membership tests against it. The conversion has a one-time cost proportional to the list size, but each subsequent lookup is O(1). If you perform enough lookups, the time saved far exceeds the conversion cost.

# Suppose we have a list of valid product codes
valid_codes_list = ["A100", "B200", "C300", "D400", "E500", "F600", "G700"]

# Convert to set for fast lookups
valid_codes_set = set(valid_codes_list)

# Now validate multiple user inputs efficiently
user_inputs = ["C300", "X999", "A100", "Z000", "G700"]

print("Validating codes:")
for code in user_inputs:
    # This lookup is O(1) because valid_codes_set is a set
    if code in valid_codes_set:
        print(f" {code}: VALID")
    else:
        print(f" {code}: INVALID")

>>>Output

Validating codes:

  C300: VALID

  X999: INVALID

  A100: VALID

  Z000: INVALID

  G700: VALID

In real applications, valid_codes_list might contain thousands or millions of entries loaded from a database or configuration file. If you needed to validate millions of user inputs against this list, using a set instead of a list could reduce validation time from hours to seconds.
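
To get a feel for the difference on your own machine, a rough timing sketch like the one below can help. The code list is made-up data generated for the test, and the timings will vary by computer, so treat the printed numbers as an indication and not as a benchmark.

```python
import time

# Hypothetical data: 20,000 codes such as "C0", "C1", ...
valid_codes_list = [f"C{n}" for n in range(20_000)]
valid_codes_set = set(valid_codes_list)

# 1,000 inputs: the first half exist, the second half do not
user_inputs = [f"C{n}" for n in range(19_500, 20_500)]

# Time the lookups against the list
start = time.perf_counter()
hits_list = sum(1 for code in user_inputs if code in valid_codes_list)
list_seconds = time.perf_counter() - start

# Time the same lookups against the set
start = time.perf_counter()
hits_set = sum(1 for code in user_inputs if code in valid_codes_set)
set_seconds = time.perf_counter() - start

print(f"List: {hits_list} valid codes in {list_seconds:.4f} s")
print(f"Set:  {hits_set} valid codes in {set_seconds:.4f} s")
```

Both approaches find the same 500 valid codes. Only the time taken should differ, and the gap grows as the collection gets larger.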

TIP

If you need to check membership multiple times against the same collection, convert it to a set first. The upfront cost of conversion is quickly recovered through faster lookups. Even just a few dozen lookups can justify the conversion.

The code below has a bug related to membership testing. The developer tried to use the in operator with a list literal instead of a set, losing the O(1) performance advantage. Fix it to use a set.

Debug Challenge

> This code checks membership using a list, which requires scanning every element. Switching to a set gives O(1) lookups instead of O(n).

Functional but slow: list uses O(n) lookup instead of O(1)

valid = ["admin", "editor", "viewer"]
role = "editor"
print(role in valid)

Converting a list to a set is one of the most common and impactful performance optimizations in Python. The change can be as small as swapping square brackets for curly braces, yet it reduces the average cost of each membership check from O(n) to O(1), so a check that once scanned thousands of items becomes effectively instant.

Sets work well for membership testing because hashing gives each element a predictable storage location. When you check item in my_set, Python computes the hash of the item and goes straight to the matching location, typically without comparing against any other elements.

TIP

If you need to validate user input against a known list of allowed values, define those values as a set literal from the start: ALLOWED = {"admin", "editor", "viewer"}. This is cleaner than defining a list and converting it with set() later, and it skips the conversion step.

What Can Be in a Set?

Daily Life

Interviews

Identify which types sets accept

Not everything can be an element of a set. Set elements must be hashable, which generally means they must be immutable (unchangeable after creation). This requirement exists because sets use hashing to organize elements internally. If an element could change after being added, the set would not be able to find it anymore because its hash value would be different.

int / float

Numbers

42, 3.14, -17 are valid

str

Strings

"hello" and "" both work

tuple

Tuples

(1, 2) if contents hash

bool/None

Singletons

True, False, None work

frozenset

Frozen Sets

Immutable set variant

Mutable types like lists, dictionaries, and regular sets cannot be set elements. Python does not define a hash for them, because a hash based on their contents would change whenever they were modified. Python raises a TypeError if you try to add an unhashable type to a set.

# Valid set elements
valid_set = {42, "hello", 3.14, True, None, (1, 2, 3)}
print("Valid set:", valid_set)

# Tuples make excellent set elements for storing pairs
coordinates = {(0, 0), (1, 0), (0, 1), (1, 1), (0, 0)}
print("Coordinate set:", coordinates)
print("Number of unique points:", len(coordinates))

>>>Output

Valid set: {True, 42, 3.14, (1, 2, 3), 'hello', None}

Coordinate set: {(0, 0), (1, 0), (0, 1), (1, 1)}

Number of unique points: 4

Notice that the coordinate set shows only four points even though we specified five. The point (0, 0) was specified twice but only appears once because sets eliminate duplicates. Tuples are hashable (as long as their contents are hashable), making them ideal for storing coordinate pairs, database keys, or any immutable combination of values.
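
The condition "as long as their contents are hashable" is easy to overlook. As a small sketch with made-up values, a tuple that holds a list is rejected even though the tuple itself is immutable:

```python
# A tuple of hashable values works as a set element
points = {(1, 2), (3, 4)}

# A tuple that contains a list is not hashable
try:
    points.add((5, [6, 7]))
    nested_error = None
except TypeError as e:
    nested_error = str(e)

print("Points:", points)
print("Error:", nested_error)
```

The set is left unchanged, and the error message names the list inside the tuple as the unhashable part.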

my_set = set()

# Test which types can be added to a set
for item in [42, "hello", (1, 2), True]:
    my_set.add(item)
    print(f"Added {item!r} - set is now: {my_set}")

# These would raise TypeError:
try:
    my_set.add([1, 2, 3])
except TypeError as e:
    print(f"Cannot add list: {e}")

>>>Output

Added 42 - set is now: {42}

Added 'hello' - set is now: {42, 'hello'}

Added (1, 2) - set is now: {42, (1, 2), 'hello'}

Added True - set is now: {True, 42, (1, 2), 'hello'}

Cannot add list: unhashable type: 'list'

Notice that True was added as a new element, because nothing equal to it was already in the set. Python considers True and 1 to be equal (and they have the same hash), so if the set had already contained 1, adding True would have left it unchanged. The same applies to False and 0. Mutable types like lists trigger a TypeError immediately.
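
A quick way to check how True interacts with numbers in a set is to try it on small sets. The values below are arbitrary and chosen only for illustration:

```python
# True == 1, and the two values share a hash
print(True == 1, hash(True) == hash(1))

ones = {1, True, 1.0}
print(len(ones))        # 1: all three values are equal

flags = {1}
flags.add(True)         # no change, because an equal value is present
print(flags)            # {1}

mixed = {42}
mixed.add(True)         # added, because nothing equal to 1 is present
print(len(mixed))       # 2
```

Whether True counts as a duplicate therefore depends on whether 1 (or 1.0) is already in the set, not on how many other elements the set holds.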

Lists are mutable

You can change a list after creation, so its content is not fixed

Hash would change

A modified list would produce a different hash value than the original

Lookup breaks

The set could not find the element at its old hash-based location

Python forbids it

Mutable objects are rejected from sets to maintain data integrity
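
The chain of reasoning above can be reproduced with a custom class. MutablePoint below is a hypothetical class written only for this demonstration. It deliberately bases its hash on attributes that can change, which is exactly what you should avoid in real code.

```python
class MutablePoint:
    """Hypothetical class whose hash depends on values that can change."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return (isinstance(other, MutablePoint)
                and (self.x, self.y) == (other.x, other.y))

    def __hash__(self):
        return hash((self.x, self.y))


p = MutablePoint(1, 2)
points = {p}
found_before = p in points      # looked up with the original hash

p.x = 99                        # the hash of p changes here
found_after = p in points       # looked up with the new hash

# The object is still stored, but the set can no longer locate it
still_stored = any(q is p for q in points)

print(found_before, found_after, still_stored)
```

On CPython this prints True False True: the object is still inside the set, but a lookup with its new hash looks in the wrong place. Built-in mutable types such as lists avoid the problem by not being hashable at all.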

# If you need to store a list-like collection in a set,
# convert it to a tuple first
data_points = [[1, 2], [3, 4], [1, 2], [5, 6]]

# Convert each list to a tuple
unique_points = {tuple(point) for point in data_points}
print("Unique points as tuples:", unique_points)

>>>Output

Unique points as tuples: {(1, 2), (3, 4), (5, 6)}
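
The same idea applies when the inner collections are groups where order does not matter: convert each one to a frozenset instead of a tuple. The tag data here is invented for the example.

```python
# Hypothetical data: each inner list is a group of tags in no particular order
tag_groups = [["python", "sets"], ["sets", "python"], ["lists"]]

# frozenset is hashable, so it can be stored inside a set
unique_groups = {frozenset(group) for group in tag_groups}
print(len(unique_groups))  # 2

# A regular set cannot be an element of another set
try:
    {set(["python"])}
except TypeError as e:
    print(f"Cannot nest a regular set: {e}")
```

The first two groups contain the same tags in a different order, so they collapse into a single frozenset.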

Common Mistakes to Avoid

Even experienced Python programmers sometimes make mistakes when working with sets. Learning about these common pitfalls helps you avoid them and write more robust code.

Mistake 1: {} vs set()

The most common set mistake is trying to create an empty set with empty curly braces {}. Python interprets this as an empty dictionary, not an empty set. This mistake often leads to AttributeError exceptions later when you try to use set methods.

•Wrong

empty = {}
Creates a dictionary!
type(empty) returns dict
empty.add("x") raises AttributeError

•Correct

empty = set()
Creates a set!
type(empty) returns set
empty.add("x") works correctly
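
The comparison above can be run as a short script. The variable names are chosen only for this sketch:

```python
empty_dict = {}       # empty curly braces create a dictionary
empty_set = set()     # set() creates an empty set

print(type(empty_dict).__name__)  # dict
print(type(empty_set).__name__)   # set

# Dictionaries have no add() method
try:
    empty_dict.add("x")
except AttributeError as e:
    print(f"AttributeError: {e}")

empty_set.add("x")
print(empty_set)  # {'x'}
```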

Mistake 2: Expecting Order

Sets are unordered. Do not write code that assumes elements will appear in any particular order when you iterate over a set or print it. Even if elements seem to appear in a consistent order during testing, this order can change between Python versions, between different runs of your program, or when the set grows or shrinks.

# Order is NOT guaranteed
letters = set()
letters.add("c")
letters.add("a")
letters.add("b")

print("Set contents:", letters)
print("Elements in iteration order:")
for letter in letters:
    print(f" {letter}")

>>>Output

Set contents: {'a', 'b', 'c'}

Elements in iteration order:

a

b

c

In this example, we added elements in the order c, a, b, but they might appear differently when printed. If you need elements in a specific order, sort them explicitly or use a list instead.
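
As a minimal sketch of sorting explicitly, sorted() accepts a set and returns a new list whose order is the same on every run:

```python
letters = {"c", "a", "b"}

# sorted() returns a new list in ascending order
ascending = sorted(letters)
print(ascending)     # ['a', 'b', 'c']

# reverse=True gives descending order
descending = sorted(letters, reverse=True)
print(descending)    # ['c', 'b', 'a']
```

The original set is not modified. Only the returned lists have a defined order.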

Mistake 3: Indexing Sets

You cannot access set elements by index. Sets have no concept of "first element" or "element at position 2" because they have no order. Trying to index a set with square brackets raises a TypeError.

colors = {"red", "green", "blue"}

# Trying to index a set raises TypeError
try:
    print(colors[0])
except TypeError as e:
    print(f"Error: {e}")

# Sort to list for indexing
colors_list = sorted(colors)
print("Sorted list:", colors_list)
print("First alphabetically:", colors_list[0])

>>>Output

Error: 'set' object is not subscriptable

Sorted list: ['blue', 'green', 'red']

First alphabetically: blue

Using sorted() gives you a predictable ordering every time, unlike converting to an unsorted list where the order could vary. If you need indexed access often, store your data in a list instead of a set.

Mistake 4: In-Loop Mutation

Adding or removing elements from a set while iterating over it can cause unexpected behavior or RuntimeError exceptions. If you need to modify a set based on its contents, iterate over a copy instead.

numbers = {1, 2, 3, 4, 5, 6}
print("Before:", numbers)

# CORRECT: Iterate over a copy
for n in numbers.copy():
    if n % 2 == 0:
        numbers.remove(n)

print("Odd numbers only:", numbers)

# ALTERNATIVE: Set comprehension
numbers2 = {1, 2, 3, 4, 5, 6}
odds = {n for n in numbers2 if n % 2 != 0}
print("Using comprehension:", odds)

>>>Output

Before: {1, 2, 3, 4, 5, 6}

Odd numbers only: {1, 3, 5}

Using comprehension: {1, 3, 5}
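
For contrast, here is a sketch of the unsafe version, which removes elements from the same set that the loop is iterating over. The try/except is there only so the script can report the error instead of stopping.

```python
numbers = {1, 2, 3, 4, 5, 6}

# WRONG: removing from the set that is being iterated
try:
    for n in numbers:
        if n % 2 == 0:
            numbers.remove(n)
    message = None
except RuntimeError as e:
    message = str(e)

print("Error:", message)
```

Python detects that the set changed size during the loop and raises RuntimeError, leaving the set only partly filtered.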

If you need to access elements by position, use a list. If you need uniqueness and fast membership testing, use a set. Sometimes you need both: maintain a list for ordered access and a set for fast lookups.
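
One way to combine the two is sketched below. The function name unique_in_order and the sample data are hypothetical. The set answers "have I seen this already?" quickly, while the list remembers the order in which items first appeared.

```python
def unique_in_order(items):
    """Return items without duplicates, keeping first-seen order."""
    seen = set()      # fast membership checks
    ordered = []      # remembers positions
    for item in items:
        if item not in seen:
            seen.add(item)
            ordered.append(item)
    return ordered


visits = ["home", "about", "home", "pricing", "about"]
pages = unique_in_order(visits)
print(pages)      # ['home', 'about', 'pricing']
print(pages[0])   # home
```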

Try fixing the buggy code below. The programmer accidentally used curly braces to create what they thought was an empty set.

Debug Challenge

> This code uses {} to create what it thinks is an empty set, but Python interprets {} as an empty dictionary. The .add() call then fails.

AttributeError: 'dict' object has no attribute 'add'

tags = {}
tags.add("python")
print(tags)

Practical Examples

Let us look at some practical examples that demonstrate how sets solve real problems more elegantly than other approaches.

When to Reach for a Set

You need to count unique items from a collection with duplicates
You are validating input against a list of allowed values
You want to find duplicates by comparing list and set lengths
You are tracking which items you have already processed
You need fast membership testing across many repeated lookups

Example 1: Duplicate Values

Detecting whether a list contains duplicates is straightforward with sets: if the set length is less than the list length, duplicates exist.

def has_duplicates(items):
    """Return True if items contains any duplicates."""
    return len(items) != len(set(items))

# Test with various lists
list1 = [1, 2, 3, 4, 5]
list2 = [1, 2, 3, 2, 5]
list3 = ["apple", "banana", "apple"]

print(f"List {list1} has duplicates: {has_duplicates(list1)}")
print(f"List {list2} has duplicates: {has_duplicates(list2)}")
print(f"List {list3} has duplicates: {has_duplicates(list3)}")

>>>Output

List [1, 2, 3, 4, 5] has duplicates: False

List [1, 2, 3, 2, 5] has duplicates: True

List ['apple', 'banana', 'apple'] has duplicates: True
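
Comparing lengths tells you that duplicates exist, but not which values are repeated. A possible extension, using a hypothetical helper named find_duplicates, tracks seen values in one set and repeated values in another:

```python
def find_duplicates(items):
    """Return the set of values that appear more than once."""
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return duplicates


print(find_duplicates([1, 2, 3, 2, 5]))               # {2}
print(find_duplicates(["apple", "banana", "apple"]))  # {'apple'}
print(find_duplicates([1, 2, 3]))                     # set()
```

Note that an empty set prints as set(), not {}, because {} is reserved for empty dictionaries.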

Example 2: Validating Input

Sets are perfect for validating that user input belongs to a set of allowed values.

# Define valid options as a set
VALID_SIZES = {"small", "medium", "large", "xl"}
VALID_COLORS = {"red", "blue", "green", "black", "white"}

def validate_order(size, color):
    """Validate that size and color are valid options."""
    errors = []

    if size.lower() not in VALID_SIZES:
        errors.append(f"Invalid size: {size}")

    if color.lower() not in VALID_COLORS:
        errors.append(f"Invalid color: {color}")

    return errors

# Test validation
print(validate_order("Medium", "Blue"))
print(validate_order("XL", "Purple"))
print(validate_order("Huge", "Orange"))

>>>Output

[]

['Invalid color: Purple']

['Invalid size: Huge', 'Invalid color: Orange']

Example 3: Tracking Items

When processing a stream of data, use a set to efficiently track which items you have already seen.

# Process log entries and track unique errors
log_entries = [
    "INFO: User logged in",
    "ERROR: Database connection failed",
    "INFO: Data loaded",
    "ERROR: File not found",
    "ERROR: Database connection failed",
    "WARNING: Memory usage high",
    "ERROR: Database connection failed",
]

# Track unique errors
unique_errors = set()
for entry in log_entries:
    if entry.startswith("ERROR:"):
        unique_errors.add(entry)

print(f"Total log entries: {len(log_entries)}")
print(f"Unique error types: {len(unique_errors)}")
print("Unique errors:")
for error in unique_errors:
    print(f" {error}")

>>>Output

Total log entries: 7

Unique error types: 2

Unique errors:

  ERROR: Database connection failed

  ERROR: File not found

Sets provide a powerful way to work with unique collections and perform membership tests efficiently. Put these fundamentals to the test with hands-on challenges in the Python Builder.

❯❯❯PUTTING IT ALL TOGETHER

> You are a data analyst at Mailchimp deduplicating email addresses collected from three separate campaign upload files before running a bulk re-engagement send, ensuring no subscriber receives the same message twice and that every address meets basic hashability requirements.

set() created from the first campaign list automatically removes any duplicate addresses already present in that single source file.

.add() merges each address from the second and third campaign files into the existing set without any risk of introducing duplicates.

The in operator checks whether a specific email was already captured before deciding whether to include it from a new source list.

The set's uniqueness guarantee means the final address list passed to Mailchimp's send API contains no repeated recipient emails.
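
The four steps above might look like this in code. This is only a sketch: the three lists stand in for the uploaded campaign files, the addresses are invented, and no real send API is called.

```python
# Hypothetical upload files, already read into lists
campaign_a = ["ana@example.com", "ben@example.com", "ana@example.com"]
campaign_b = ["ben@example.com", "cara@example.com"]
campaign_c = ["dev@example.com", "cara@example.com"]

# set() removes duplicates within the first file
recipients = set(campaign_a)

# The in operator reports repeats; .add() merges without duplicating
for address in campaign_b + campaign_c:
    if address in recipients:
        print(f"Already captured: {address}")
    recipients.add(address)

print(len(recipients))      # 4
print(sorted(recipients))   # sorted only to make the output predictable
```

The in check here is purely informational. Calling .add() on an address that is already present leaves the set unchanged, so the final collection holds each of the four addresses exactly once.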

KEY TAKEAWAYS

Sets are unordered collections that automatically eliminate duplicates

Create sets with curly braces {1, 2, 3} or use set() for empty sets

Empty curly braces {} creates a dictionary, not a set

Convert lists to sets to remove duplicates: set(my_list)

.add() adds one element; .update() adds multiple from an iterable

.remove() raises KeyError if missing; .discard() is silent

Membership testing with in is O(1) - extremely fast regardless of set size

Set elements must be hashable (immutable): strings, numbers, tuples

Lists, dictionaries, and sets cannot be elements of sets

Do not modify a set while iterating over it; iterate over a copy instead

Collections that guarantee uniqueness

Category: Python
Difficulty: beginner
Duration: 55 minutes
Challenges: 3 hands-on challenges

Topics covered: What is a Set?, Creating Sets, Automatic Duplicate Removal, Removing Elements from Sets, What Can Be in a Set?
