Friendster collapsed under its own success because it stored friend relationships in arrays and performed full scans to check connections, making the site grind to a halt as its user base grew. Redis chose hash maps as its core data structure, achieving constant-time key lookups that scale to millions of queries per second without a performance cliff. The data structure you choose in the first week of a project determines whether your application thrives or crashes under growth. This lesson teaches you how to match the right data structure to the right problem before it becomes a production crisis.
Lists: Ordered Collections
Store and retrieve ordered data
Lists are Python's workhorse data structure. They hold items in a specific order, allow duplicates, and can grow or shrink as needed. When you receive a batch of records from an API, process rows from a CSV file, or collect results from a database query, you typically work with lists. Lists are among the most commonly used data structures in Python.
What makes lists so versatile is their flexibility. They can hold any type of data - numbers, strings, other lists, dictionaries, or custom objects. You can mix types within the same list, though in practice keeping types consistent makes code easier to understand and maintain.
Create a list using square brackets []. Items are separated by commas. The order you specify is the order they are stored, and that order is preserved throughout the life of the list.
# A list of transaction amounts
transactions = [150.00, 89.99, 42.50, 200.00, 89.99]
print("Transactions:", transactions)
print("Count:", len(transactions))

print("First transaction:", transactions[0])
print("Last transaction:", transactions[-1])
>>>Output
Transactions: [150.0, 89.99, 42.5, 200.0, 89.99]
Count: 5
First transaction: 150.0
Last transaction: 89.99
Indexing is how you access individual elements. The first element is at index 0, the second at index 1, and so on. Python also supports negative indexing: -1 refers to the last element, -2 to the second-to-last, and so on. This is incredibly useful when you need to access elements from the end of a list without knowing its length.
Lists are mutable, meaning you can modify them after creation. You can append new items, insert at specific positions, remove items, and change existing values. This flexibility makes lists ideal for building up results incrementally, which is a common pattern in data processing.
# Building a list of processed records
processed_ids = []

# Simulate processing incoming data
for record_id in [101, 102, 103]:
    processed_ids.append(record_id)
    print(f"Processed record {record_id}")

print("All processed:", processed_ids)
>>>Output
Processed record 101
Processed record 102
Processed record 103
All processed: [101, 102, 103]
The append method adds an item to the end of the list. This is one of the most common operations you will perform. Lists grow dynamically - you do not need to specify a size upfront, and Python handles the memory management for you. This makes lists perfect for situations where you do not know in advance how many items you will have.
# Slicing: extract portions of a list
metrics = [10, 20, 30, 40, 50, 60, 70]

# Get first three elements
print("First three:", metrics[0:3])

# Get elements from index 2 up to (but not including) index 5
print("Middle:", metrics[2:5])

# Get last three elements
print("Last three:", metrics[-3:])

# Skip every other element
print("Every other:", metrics[::2])
>>>Output
First three: [10, 20, 30]
Middle: [30, 40, 50]
Last three: [50, 60, 70]
Every other: [10, 30, 50, 70]
Slicing is a powerful feature that lets you extract portions of a list using the syntax list[start:stop:step]. The start index is included, but the stop index is excluded. If you omit start, it defaults to the beginning. If you omit stop, it goes to the end. The optional step parameter lets you skip elements.
Fill in the Blank
> You have a list of five metrics [10, 20, 30, 40, 50] and need to extract the middle three values. Pick the slice that captures exactly those elements.
metrics = [10, 20, 30, 40, 50]
print(metrics[___])
Lists are the go-to choice whenever your data has a natural ordering. Here are the scenarios where a list is usually the best fit.
Order matters: First in, first out processing where sequence is meaningful
Duplicates allowed: Track every occurrence, like repeated transactions or events
Dynamic size: Items will be added or removed as your program runs
Position-based access: Retrieve the 1st, 5th, or last item by numeric index instantly
Incremental building: Append results one at a time inside a processing loop
Essential List Methods
Lists come with a rich set of built-in methods for manipulation. Understanding these methods lets you write cleaner, more efficient code. Here are the most commonly used methods that you will encounter in almost every Python project.
scores = [85, 92, 78, 95, 88]

# Add elements
scores.append(90)
scores.insert(0, 100)
print("After adding:", scores)

# Remove elements
scores.remove(78)
last = scores.pop()
print("After removing:", scores)
print("Popped value:", last)

# Find and count
print("Index of 95:", scores.index(95))
print("Count of 92:", scores.count(92))
>>>Output
After adding: [100, 85, 92, 78, 95, 88, 90]
After removing: [100, 85, 92, 95, 88]
Popped value: 90
Index of 95: 3
Count of 92: 1
Python Quiz
> Add an element to the end of a list, then remove and capture the last element. Pick the method that grows the list and the one that shrinks it while returning the removed value.
Understanding performance helps you write efficient code. Lists excel at accessing items by index and appending to the end: indexing is O(1), and appending is amortized O(1) because the list only occasionally resizes its underlying storage, so both stay fast regardless of list size. However, searching for a specific value requires checking each item one by one, which is O(n) and becomes slow for large lists.
Inserting or removing from the beginning or middle of a list is slow because all subsequent elements must be shifted. If you frequently need to add or remove from both ends, consider using a deque from the collections module instead.
•Fast Operations
Access by index: list[0]
Append to end: list.append(x)
Pop from end: list.pop()
Get length: len(list)
•Slow Operations
Search: x in list
Insert at start: list.insert(0, x)
Remove by value: list.remove(x)
Insert in middle
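To make the deque suggestion above concrete, here is a minimal sketch using collections.deque from the standard library. The task names are hypothetical and only serve as sample data.

```python
from collections import deque

# Hypothetical task names, used only for illustration
pending = deque(["validate", "transform", "load"])

pending.appendleft("fetch")   # O(1) insert at the front
pending.append("notify")      # O(1) insert at the back

first = pending.popleft()     # O(1) removal from the front
last = pending.pop()          # O(1) removal from the back

print("First:", first)
print("Last:", last)
print("Remaining:", list(pending))
```

The equivalent list calls, insert(0, x) and pop(0), would shift every remaining element on each call, which is why a deque tends to be the better fit for queue-like workloads.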
TIP
If you frequently need to check whether an item exists in a collection, consider using a set instead of a list. Sets are optimized for membership testing and can check membership in constant time.
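The implementation of Python lists explains why some operations are fast and others are not. CPython stores a list as a contiguous, resizable array of references, so indexing is a direct offset calculation, while inserting or removing near the front must shift every later element.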
Tuples: Immutable Sequences
Lock down data that must not change
Tuples look similar to lists but have one critical difference: they cannot be changed after creation. Once you create a tuple, you cannot add, remove, or modify its elements. This immutability is not a limitation - it is a feature that makes your code safer and more predictable.
Think about data that should never change: database connection parameters, geographic coordinates, RGB color values, or API response codes. If you accidentally modify such data, bugs can be extremely difficult to track down. Tuples prevent this entire category of bugs by making modification impossible.
Create a tuple using parentheses () or just commas. Access elements the same way as lists, using square bracket indexing. You can iterate over tuples, slice them, and use all the read-only operations that work on lists.
# Database connection settings (should never change)
Notice that we still use square brackets to access tuple elements, just like lists. The difference is only in what operations are allowed. Reading is fine; writing is forbidden. Attempting to modify a tuple raises a TypeError, and Python enforces this at runtime.
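A minimal sketch of tuple creation and read-only access might look like the following. The db_settings name and its values are illustrative assumptions, not real configuration.

```python
# Hypothetical connection settings, for illustration only
db_settings = ("localhost", 5432, "analytics")

# Parentheses are optional: the commas make the tuple
point = 10, 20

# A one-element tuple needs a trailing comma
single = ("only",)

print("Host:", db_settings[0])
print("Port:", db_settings[1])
print("First two:", db_settings[:2])
print("Types:", type(point).__name__, type(single).__name__)
```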
coordinates = (10, 20)

# Uncommenting the next line raises:
# TypeError: 'tuple' object does not support item assignment
# coordinates[0] = 15

# If you need to "change" a tuple, create a new one
new_coords = (15, coordinates[1])
If you find yourself needing to "modify" a tuple, the solution is to create a new tuple with the desired values. This pattern is common in functional programming and ensures that any code holding a reference to the original tuple is not affected by your changes.
Why Tuples Instead of Lists?
Immutability provides several important benefits that make tuples worth using even though lists are more flexible. Understanding these benefits helps you make informed decisions about which structure to use.
First, immutability prevents bugs. If data should not change, making it a tuple ensures it cannot change accidentally - not by your code, not by library code, not by anyone. This is a form of defensive programming that catches errors at runtime rather than letting them silently corrupt your data.
Second, tuples are hashable as long as every element they contain is hashable, which means they can be used as dictionary keys. Lists cannot be dictionary keys because they are mutable - if you could change a list after using it as a key, the dictionary would become corrupted. This makes tuples essential for certain data structures.
Third, tuples are slightly more memory-efficient and faster to create than lists. For small sequences that you create many times, this can add up. Python can also optimize certain operations on tuples because it knows they will not change.
# Tuples as dictionary keys (lists cannot do this)
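As a sketch of why hashability matters, the example below keys a dictionary by (row, column) tuples and then shows a list being rejected as a key. The readings data is hypothetical.

```python
# Hypothetical grid of sensor readings keyed by (row, column)
readings = {
    (0, 0): 21.5,
    (0, 1): 22.1,
    (1, 0): 19.8,
}

print("Reading at (0, 1):", readings[(0, 1)])

# A list cannot be used as a key because it is not hashable
try:
    readings[[1, 1]] = 20.0
except TypeError as error:
    list_key_error = str(error)
    print("TypeError:", list_key_error)
```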
One of the most elegant features of tuples is unpacking: assigning tuple elements to multiple variables in a single statement. This makes code cleaner and more readable, especially when functions return multiple values. Unpacking is so useful that Python developers use it constantly.
The number of variables on the left must match the number of elements in the tuple. Python also supports extended unpacking with the star operator, letting you capture multiple elements into a list while unpacking the rest into individual variables.
# Unpacking a tuple into separate variables
user_record = ("alice_42", "alice@example.com", 28)
username, email, age = user_record

print(f"User: {username}")
print(f"Email: {email}")
print(f"Age: {age}")

def get_min_max(numbers):
    return (min(numbers), max(numbers))

data = [45, 23, 67, 12, 89, 34]
minimum, maximum = get_min_max(data)
print(f"Range: {minimum} to {maximum}")
>>>Output
User: alice_42
Email: alice@example.com
Age: 28
Range: 12 to 89
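Extended unpacking with the star operator, mentioned earlier in this section, can be sketched as follows. The record contents are made up for illustration.

```python
# Hypothetical record: an ID followed by any number of scores
record = ("u_001", 88, 92, 79)

user_id, *scores = record
print(user_id)   # u_001
print(scores)    # [88, 92, 79] (the starred name always receives a list)

first, *middle, last = [10, 20, 30, 40, 50]
print(first, middle, last)  # 10 [20, 30, 40] 50
```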
Tuple unpacking is especially common when iterating over dictionary items or when working with functions that return multiple values. The enumerate function, for example, returns tuples of (index, value) that you typically unpack in a for loop.
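Both loop patterns described above can be sketched in a few lines. The stage names and limits are hypothetical sample data.

```python
stages = ["validate", "transform", "load"]

# enumerate yields (index, value) tuples, unpacked in the loop header
for position, stage in enumerate(stages, start=1):
    print(f"Step {position}: {stage}")

limits = {"cpu": 80, "memory": 90}

# dict.items() yields (key, value) tuples
pairs = [(name, limit) for name, limit in limits.items()]
print(pairs)
```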
Immutability also makes tuples safe to share across threads without locking, provided their elements are themselves immutable. Because no code can add, remove, or replace a tuple's elements after creation, multiple threads can read the same tuple simultaneously without risk of data corruption.
When designing functions that return multiple values, tuples are the idiomatic choice. Built-ins such as divmod() and many standard library functions, for example os.path.split() and str.partition(), return tuples that you unpack at the call site.
Dicts: Key-Value Storage
Look up any value by key instantly
Dictionaries are one of Python's most powerful and frequently used data structures. They store key-value pairs, allowing you to look up values by their keys instantly. Think of a dictionary like a real dictionary: you look up a word (key) to find its definition (value). The difference is that Python dictionaries can use almost any immutable type as a key, not just strings.
In data engineering, dictionaries are absolutely everywhere. JSON responses from APIs are dictionaries. Configuration files parse into dictionaries. Database rows are often represented as dictionaries. Caches use dictionaries. Environment variables are accessed through dictionaries. Mastering dictionaries is essential for any Python developer.
Create a dictionary using curly braces {} with key-value pairs separated by colons. Keys must be hashable, which in practice means immutable types such as strings, numbers, or tuples of hashable values, while values can be anything - including other dictionaries, lists, or custom objects.
# User profile from an API response
user = {
    "user_id": "u_12345",
    "name": "Sarah Chen",
    "email": "sarah@example.com",
    "is_premium": True,
    "login_count": 47
}

print("Name:", user["name"])
print("Premium status:", user["is_premium"])
print("Total keys:", len(user))
>>>Output
Name: Sarah Chen
Premium status: True
Total keys: 5
Notice how the dictionary uses descriptive string keys like "user_id" and "name" instead of numeric indices. This makes your code self-documenting - you can tell exactly what each value represents just by looking at its key. Compare user["email"] to user[2] - the dictionary version is much clearer.
The magic of dictionaries is constant-time lookup on average. Whether your dictionary has 10 items or 10 million, finding a value by its key takes roughly the same amount of time. This is fundamentally different from lists, where searching requires checking each item one by one. This performance characteristic makes dictionaries ideal for building lookup tables and caches.
# Building a lookup table for fast access
# Map product IDs to prices
price_lookup = {
    "SKU001": 29.99,
    "SKU002": 49.99,
    "SKU003": 19.99,
    "SKU004": 99.99,
}

# Instant lookup - no searching required
product_id = "SKU003"
if product_id in price_lookup:
    print(f"Price for {product_id}: {price_lookup[product_id]}")
>>>Output
Price for SKU003: 19.99
This lookup table pattern is extremely common. Instead of searching through a list of products to find the price for SKU003, we go directly to it using the key. In a production system processing millions of lookups, this difference between constant time and linear search time is the difference between a responsive application and a slow one.
Iterating Over Dictionaries
You often need to loop through dictionary contents. Python provides several ways to iterate: over just keys, just values, or both key-value pairs together. The items() method returns both, which is usually what you want.
metrics = {"cpu": 45.2, "memory": 72.8, "disk": 58.1}

# Iterate over keys (default behavior)
print("Metrics tracked:")
for metric_name in metrics:
    print(f" - {metric_name}")

# Iterate over both keys and values
print("\nCurrent values:")
for name, value in metrics.items():
    print(f" {name}: {value}%")
>>>Output
Metrics tracked:
 - cpu
 - memory
 - disk

Current values:
 cpu: 45.2%
 memory: 72.8%
 disk: 58.1%
Modifying Dictionaries
Dictionaries are mutable. You can add new key-value pairs, update existing values, and remove entries. Adding a new key or updating an existing one uses the same syntax: assignment with square brackets. If the key exists, the value is updated; if not, a new entry is created.
metrics = {"requests": 1000, "errors": 5}
print("Initial:", metrics)

# Add new key
metrics["latency_ms"] = 45.2
print("After adding:", metrics)

# Update existing key
metrics["requests"] = 1050
print("After update:", metrics)

# Remove a key
del metrics["errors"]
print("After delete:", metrics)
>>>Output
Initial: {'requests': 1000, 'errors': 5}
After adding: {'requests': 1000, 'errors': 5, 'latency_ms': 45.2}
After update: {'requests': 1050, 'errors': 5, 'latency_ms': 45.2}
After delete: {'requests': 1050, 'latency_ms': 45.2}
Safe Key Access
Accessing a key that does not exist raises a KeyError. To handle missing keys gracefully, use the get() method, which returns None (or a default value you specify) instead of raising an error.
config = {"host": "localhost", "port": 8080}

# Risky: raises KeyError if key missing
# timeout = config["timeout"]  # KeyError!

# Safe: returns None if key missing
timeout = config.get("timeout")
print("Timeout:", timeout)

# Safe with default value
timeout = config.get("timeout", 30)
print("Timeout with default:", timeout)
>>>Output
Timeout: None
Timeout with default: 30
TIP
Always use .get() when a key might not exist. It prevents crashes and makes your code more robust against unexpected data.
Nested Dictionaries
Dictionary values can be any type, including other dictionaries. This creates nested structures, which are extremely common when working with JSON data from APIs. Accessing nested values requires chaining bracket notation or using multiple get() calls for safety.
The chained get() pattern is verbose but safe. Each intermediate get() is given an empty dictionary as its default, so a missing key yields {} and the chain continues without raising an error. The final get() returns its own default value if the path does not exist.
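The two access styles can be compared side by side in a short sketch. The response structure below is a hypothetical example, not a real API payload.

```python
# Hypothetical API response, for illustration only
response = {
    "user": {
        "profile": {"name": "Alice"},
    }
}

# Direct chaining: concise, but raises KeyError if any level is missing
name = response["user"]["profile"]["name"]

# Chained get(): each intermediate call falls back to {} so the chain continues
city = response.get("user", {}).get("address", {}).get("city", "unknown")

print("Name:", name)
print("City:", city)
```

Note that the {} fallback only covers missing keys. If a key is present but holds None, the next get() call would still fail, so data with explicit nulls needs an extra check.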
Fill in the Blank
> You have a user dictionary with a nested "profile" containing "name". Pick the access method and fallback that safely navigates the nested structure.
user = {"profile": {"name": "Alice"}}
result = user.___("profile", ___).___("name", "unknown")
print(result)
Dictionaries are the right tool whenever you need to associate keys with values. These are the most common patterns.
LOOKUP (ID-based access): Find users by ID instantly
STRUCT (Named fields): Organized data with keys
COUNT (Tally items): Count word occurrences
CACHE (Store results): Reuse computed answers
JSON (Parse configs): Read structured API data
Dictionaries shine when your data needs meaningful labels. They make code self-documenting: user["email"] communicates intent far more clearly than user[1], especially when reviewing code written months ago.
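Two of the patterns above, counting and caching, are short enough to sketch directly. The word list and the slow_square helper are hypothetical stand-ins for real data and a genuinely expensive computation.

```python
words = ["error", "ok", "ok", "error", "timeout", "ok"]

# COUNT: tally occurrences, with get() supplying 0 for unseen keys
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
print(counts)

# CACHE: remember results so repeated inputs are computed once
cache = {}

def slow_square(n):
    """Stand-in for an expensive computation (hypothetical)."""
    if n not in cache:
        cache[n] = n * n
    return cache[n]

print(slow_square(12), slow_square(12))
print("Cached inputs:", list(cache))
```

For counting specifically, collections.Counter in the standard library offers the same behavior with less code; the plain-dictionary version is shown here because it uses only what this lesson has covered.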
Sets: Unique Collections
Eliminate duplicates and test membership
Sets are unordered collections of unique elements. When you add a duplicate to a set, it simply ignores it - no error, no warning, just silent deduplication. This makes sets perfect for eliminating duplicates, tracking unique visitors, and performing mathematical set operations like unions and intersections.
Unlike lists and tuples, sets do not maintain any particular order. The elements are stored based on their hash values, which optimizes for fast operations rather than sequence. This trade-off is worth it because sets excel at two critical operations: adding elements and checking if an element exists.
Create a set using curly braces {} with elements (not key-value pairs) or the set() function. Important gotcha: {} alone creates an empty dictionary, not an empty set. Use set() for an empty set.
Notice that the order of elements in a set is not guaranteed. Sets prioritize fast operations over maintaining insertion order. When we created the set with duplicate user IDs, the duplicates were automatically removed, leaving only three unique values. This automatic deduplication is incredibly useful when processing data.
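A minimal sketch of set creation and automatic deduplication, using hypothetical user IDs, might look like this:

```python
# Hypothetical user IDs with repeats
user_ids = {"u1", "u2", "u1", "u3", "u2"}
print("Unique count:", len(user_ids))

# Adding an existing element is silently ignored
user_ids.add("u1")
print("After re-adding u1:", len(user_ids))

# Sort when you need a stable order for display
print("Sorted:", sorted(user_ids))

# {} is an empty dict; set() is an empty set
empty = set()
print(type(empty).__name__)
```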
If you need both uniqueness and order, use the dict.fromkeys() pattern: it builds a dictionary whose keys are your items and whose values are all None. Because dictionaries preserve insertion order (guaranteed since Python 3.7), the keys come back in first-occurrence order.
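The order-preserving deduplication idea can be sketched in one line. The events list is hypothetical sample data.

```python
events = ["login", "view", "login", "purchase", "view"]

# Keys are unique and keep insertion order, so first occurrences are preserved
unique_in_order = list(dict.fromkeys(events))
print(unique_in_order)  # ['login', 'view', 'purchase']
```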
Fast Membership Testing
The killer feature of sets is O(1) average-case membership testing. Checking if an element exists in a set takes roughly the same time whether the set has 100 elements or 100 million. This makes sets absolutely essential for any operation involving "is this item in my collection?" - a question that comes up constantly in data processing.
Consider a scenario where you need to filter a million records, keeping only those that match a list of 10,000 valid IDs. With a list, each of the million records requires checking up to 10,000 IDs - that is potentially 10 billion comparisons. With a set, each record requires a single hash lookup on average, giving you about one million operations total. The difference can be hours versus seconds.
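A scaled-down sketch of that filtering scenario is shown below. The record layout and ID ranges are assumptions chosen to keep the example quick to run; the point is only that the list of valid IDs is converted to a set once, before the loop.

```python
# Hypothetical data, scaled down from the scenario above
valid_id_list = list(range(0, 10_000, 2))       # 5,000 even IDs
records = [{"id": i} for i in range(20_000)]

# Build the set once; each later membership test is a hash lookup
valid_ids = set(valid_id_list)

kept = [record for record in records if record["id"] in valid_ids]
print("Kept:", len(kept))
```

Replacing valid_ids with valid_id_list in the comprehension gives the same result, but every test then scans the list from the start.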
Sets support mathematical operations that are incredibly useful for data analysis. These operations answer common questions: Which items are in either collection? Which are in both? Which are in one but not the other? Python makes these operations fast and intuitive.
Union combines two sets, giving you all unique elements from both. Intersection finds elements that appear in both sets. Difference finds elements that are in the first set but not the second. These operations run in time roughly proportional to the sizes of the sets involved, making them efficient for large datasets.
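The three operations can be sketched with two small sets. The members are chosen to be consistent with the viewer names that appear in this section's output; the clothing_viewers name for the second category is an assumption.

```python
# Sets chosen to be consistent with the output in this section;
# the second category name is an assumption
electronics_viewers = {"alice", "bob", "charlie", "diana"}
clothing_viewers = {"bob", "diana", "eve", "frank"}

all_viewers = electronics_viewers | clothing_viewers        # union
viewed_both = electronics_viewers & clothing_viewers        # intersection
electronics_only = electronics_viewers - clothing_viewers   # difference

print("All viewers:", all_viewers)
print("Viewed both:", viewed_both)
print("Electronics only:", electronics_only)
```

Because sets are unordered, the elements may print in a different order from one run to the next.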
>>>Output
All viewers: {'alice', 'frank', 'charlie', 'eve', 'diana', 'bob'}
Viewed both: {'diana', 'bob'}
Electronics only: {'alice', 'charlie'}
These operations are perfect for analyzing user behavior, comparing data sets, or finding overlaps between categories. In data engineering, you might use intersection to find customers who appear in both yesterday's and today's data, or difference to find new customers who did not exist yesterday.
Python Quiz
> Find how many viewers visited both product categories and how many visited at least one. Pick the set operator for each question.
Set operations are often the most readable way to express data comparisons. Instead of nested loops that check membership, a single operator like & or - communicates the intent immediately and executes much faster.
The difference operator (-) is not symmetric. A - B and B - A give different results. Always think about which set should be the "reference" set and which one you are subtracting from it.
TIP
When creating an empty set, always use set() not {}. Python interprets {} as an empty dictionary, which will cause an AttributeError when you try to call set methods like .add().
Debug Challenge
> This code uses {} to create an empty set, but Python interprets {} as an empty dictionary. Calling .add() on a dict raises an AttributeError.
AttributeError: 'dict' object has no attribute 'add'
The ^ operator computes the symmetric difference: elements that are in either set but not both.
Sets are the right choice whenever your data has no meaningful order and you care about uniqueness. Converting a list to a set is a single call, set(items), that deduplicates every element in one pass.
frozensets are the immutable equivalent of sets. They support the same read-only operations as sets (membership tests, union, intersection, difference) but cannot be modified after creation, which makes them hashable and usable as dictionary keys or as elements of other sets.
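As a sketch of where a frozenset helps, the example below uses one as a dictionary key so that the order in which tags were written does not matter. The tag names and counts are hypothetical.

```python
# Hypothetical: store a count per combination of tags, regardless of order
tags_a = frozenset({"python", "etl"})
tags_b = frozenset({"etl", "python"})

job_counts = {tags_a: 12}
print(job_counts[tags_b])      # 12: equal frozensets hash the same

# Read-only operations work; mutating methods do not exist
print(sorted(tags_a | {"sql"}))
print(hasattr(tags_a, "add"))  # False
```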
Choosing the Right Type
Match any problem to its best structure
Selecting the right data structure is one of the most important skills in programming. The choice affects code clarity, performance, and correctness. A well-chosen data structure makes code simpler and faster. A poorly chosen one leads to complex workarounds and performance problems.
The good news is that choosing becomes intuitive with practice. After working with these four structures for a while, you will instinctively know which one fits each situation. Until then, use a systematic decision framework to guide your choices.
The Decision Framework
Start by asking these questions about your data: Does order matter? Are duplicates allowed? Do you need to look up items by a key? Should the data be modifiable after creation? Your answers point directly to the right structure.
Think about what operations you will perform most frequently. If you primarily access by index, use a list. If you primarily access by key, use a dictionary. If you primarily check membership, use a set. If you need the data to be immutable, use a tuple.
01 list: Ordered and changeable - use when sequence matters and items may be added or removed
02 tuple: Ordered and frozen - use when data must never change after creation
03 dict: Key-value pairs - use when you need to look up values by a unique identifier
04 set: Unique elements only - use when duplicates are unwanted and you need fast membership checks
Real-World Scenarios
Let us apply this framework to common data engineering scenarios. In each case, the data requirements point clearly to one structure. With practice, you will recognize these patterns instantly and choose the right structure without conscious thought.
# Scenario 1: Processing a queue of tasks
# Need: ordered, will remove items as processed
task_queue = ["validate", "transform", "load"]

# Scenario 2: Database connection parameters
# Need: unchangeable configuration
db_config = ("prod-server", 5432, "main_db")

# Scenario 3: User session data
# Need: lookup by session ID
sessions = {
    "sess_abc": {"user": "alice", "expires": 3600},
    "sess_xyz": {"user": "bob", "expires": 1800}
}

# Scenario 4: Track processed record IDs
# Need: uniqueness and fast membership checks
processed_ids = {1001, 1002, 1003}

print("Task queue:", task_queue)
print("DB config:", db_config)
print("Sessions:", len(sessions), "active")
print("Processed:", len(processed_ids), "records")
>>>Output
Task queue: ['validate', 'transform', 'load']
DB config: ('prod-server', 5432, 'main_db')
Sessions: 2 active
Processed: 3 records
Data Pipeline Patterns
In data engineering, you often combine multiple structures. Here is a pattern you will see frequently: using a set for deduplication while building a list of results.
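A minimal sketch of that pattern, assuming each event is a dictionary with an "id" field (the event data is hypothetical):

```python
# Hypothetical incoming events, some repeated
incoming = [
    {"id": 101, "type": "click"},
    {"id": 102, "type": "view"},
    {"id": 101, "type": "click"},
    {"id": 103, "type": "click"},
    {"id": 102, "type": "view"},
]

seen_ids = set()      # fast "have I seen this?" checks
unique_events = []    # results in first-seen order

for event in incoming:
    if event["id"] in seen_ids:
        continue
    seen_ids.add(event["id"])
    unique_events.append(event)

print([event["id"] for event in unique_events])  # [101, 102, 103]
```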
This set-plus-list pattern appears everywhere in data processing code. You will use it to deduplicate user events, filter records to unique entries, or ensure that each item is processed exactly once. The key insight is that sets are perfect for tracking what you have seen, while lists are perfect for storing the results in order.
TIP
This pattern combines the fast membership testing of sets with the ordered collection of lists. It is O(n) overall instead of O(n²) that you would get from checking uniqueness with only a list.
Common Mistakes
Even experienced developers make these mistakes. Learning to recognize and avoid them will save you hours of debugging. Some cause immediate errors, while others silently produce incorrect results or slow performance.
✓Do
Use set() for membership tests
Use .get() for safe dict access
Build new lists with comprehensions
Use tuples for fixed data
✗Don't
Search large lists with "in"
Access dict keys with [] blindly
Modify a list while looping it
Use {} for an empty set
Lists for Membership Tests
One of the most common performance mistakes is using a list when you need frequent membership tests. For small lists with a few dozen items, this is fine - the performance difference is negligible. But for thousands or millions of items, the difference becomes dramatic and can make your code unusably slow.
The symptom is code that works correctly but takes far longer than it should. If you find yourself waiting minutes for something that should take seconds, check whether you are doing membership tests against a list. Converting to a set is often all it takes to fix the problem.
•Wrong
blocked = ["ip1", "ip2", ...]
if ip in blocked: # Slow!
Checks every item
•Correct
blocked = {"ip1", "ip2", ...}
if ip in blocked: # Fast!
Instant hash lookup
Empty Dict vs Empty Set
A common gotcha: {} creates an empty dictionary, not an empty set. To create an empty set, use set().
# WRONG - this creates a dict, not a set!
wrong = {}
print("Type of {}:", type(wrong))

# CORRECT - use set() for empty set
correct = set()
print("Type of set():", type(correct))

# Non-empty sets use curly braces fine
also_correct = {1, 2, 3}
print("Type of {1,2,3}:", type(also_correct))
>>>Output
Type of {}: <class 'dict'>
Type of set(): <class 'set'>
Type of {1,2,3}: <class 'set'>
Modifying Tuples
Forgetting that tuples are immutable leads to TypeError. If you need to "change" a tuple, you must create a new one with the desired values.
# WRONG - tuples cannot be modified
coords = (10, 20)
# coords[0] = 15  # TypeError!

# CORRECT - create a new tuple
coords = (15, coords[1])
KeyError on Missing Keys
Accessing a dictionary key that does not exist crashes your program with a KeyError. This is one of the most common runtime errors in Python. When processing external data like JSON from APIs or user input, you can never be certain all expected keys are present.
The solution is defensive coding. Always use the .get() method when a key might be missing, or check for key existence with the in operator before accessing. The .get() method is usually cleaner because it returns a default value in a single expression.
data = {"name": "Alice", "age": 30}

# WRONG - crashes if key missing
# data["city"]  # KeyError

# CORRECT - use .get() with a default
city = data.get("city", "Unknown")
print("City:", city)

# ALSO CORRECT - check first
if "city" in data:
    print(data["city"])
else:
    print("City not provided")
>>>Output
City: Unknown
City not provided
Mutating During Iteration
Modifying a list while looping over it causes surprising behavior. Elements can be skipped (when you remove items) or processed more than once (when you insert items) because the loop's internal index no longer lines up with the changing list. This is a notorious source of subtle bugs.
# WRONG - modifying while iterating (elements can be skipped)
numbers = [1, 2, 3, 4, 5]
for n in numbers:
    if n % 2 == 0:
        numbers.remove(n)

# CORRECT - create a new list
numbers = [1, 2, 3, 4, 5]
numbers = [n for n in numbers if n % 2 != 0]

# ALSO CORRECT - iterate over a copy
numbers = [1, 2, 3, 4, 5]
# [:] creates a shallow copy
for n in numbers[:]:
    if n % 2 == 0:
        numbers.remove(n)
The safest approach is to build a new list using a list comprehension, which is also more readable and often faster. If you must modify in place, iterate over a copy of the list using the slice notation [:] to create a shallow copy.
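The skipping problem is easiest to see when matching elements sit next to each other. The sketch below uses two hypothetical helper functions to contrast in-place removal with a comprehension on the same input.

```python
def remove_evens_in_place(values):
    """Buggy on purpose: removes from the list it is iterating over."""
    for n in values:
        if n % 2 == 0:
            values.remove(n)
    return values

def remove_evens_safely(values):
    """Builds a new list, leaving the input untouched."""
    return [n for n in values if n % 2 != 0]

print(remove_evens_in_place([2, 4, 6, 7]))  # [4, 7]: the 4 was skipped
print(remove_evens_safely([2, 4, 6, 7]))    # [7]
```

After 2 is removed, 4 slides into the position the loop has already visited, so the loop moves straight on to 6 and never examines 4.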
Fill in the Blank
> You have a list [1, 2, 3, 4, 5] and need to remove the even numbers. Pick the approach that filters safely without mutating the list during iteration.
numbers = [1, 2, 3, 4, 5]
result = ___
print(result)
Shallow vs Deep Copy
When you assign a list to a new variable, both variables point to the same list in memory. Modifying one affects the other. This is called aliasing, and it surprises many developers. If you want a truly independent copy, you need to explicitly create one.
# GOTCHA - assignment creates an alias, not a copy
original = [1, 2, 3]
alias = original
alias.append(4)
print("Original after alias modification:", original)

# CORRECT - create an actual copy
original = [1, 2, 3]
copied = original[:]  # named "copied" to avoid shadowing the copy module
copied.append(4)
print("Original after copy modification:", original)
print("Copy:", copied)
>>>Output
Original after alias modification: [1, 2, 3, 4]
Original after copy modification: [1, 2, 3]
Copy: [1, 2, 3, 4]
TIP
For nested structures like lists of lists, you need a deep copy using copy.deepcopy() from the copy module. Shallow copies only copy the outer container, not the inner objects.
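The difference between the two kinds of copy shows up as soon as an inner list is modified. A minimal sketch with a hypothetical nested list:

```python
import copy

original = [[1, 2], [3, 4]]

shallow = original[:]            # new outer list, same inner lists
deep = copy.deepcopy(original)   # new outer list and new inner lists

shallow[0].append(99)   # also visible through original
deep[1].append(100)     # affects only the deep copy

print("Original:", original)  # [[1, 2, 99], [3, 4]]
print("Shallow:", shallow)    # [[1, 2, 99], [3, 4]]
print("Deep:", deep)          # [[1, 2], [3, 4, 100]]
```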
Understanding data structures is essential for organizing information effectively in your programs. Put these fundamentals to the test with hands-on challenges in the Python Builder.
❯❯❯PUTTING IT ALL TOGETHER
> You are a junior data engineer at Airbnb writing a pipeline that ingests a nightly batch of booking events, deduplicates listing IDs, maps each host to their revenue, and stores immutable rate-tier boundaries that must never change mid-run.
list holds the ordered sequence of incoming booking events because processing order matters and duplicate bookings for the same listing are valid.
dict maps each host ID to their accumulated revenue so per-host lookups remain O(1) regardless of how many hosts are in the batch.
set deduplicates listing IDs seen in the run so already-processed listings are skipped without an O(n) membership scan of the events list.
tuple stores the immutable rate-tier boundaries so no downstream transformation step can accidentally mutate the pricing thresholds mid-run.
KEY TAKEAWAYS
Lists are ordered, mutable, and allow duplicates - use for sequences
Tuples are ordered, immutable - use for fixed data and dict keys
Dicts map keys to values - use for lookups and structured data
Sets store unique elements - use for deduplication and fast membership
Use set instead of list for "in" checks on large collections
Use .get() for safe dictionary access when keys might be missing
Empty set is set(), not {} (which creates a dict)
Combine structures strategically: sets for deduplication, lists for order
Choose based on: order needed? duplicates allowed? key lookups? mutability?
The right container for your data
Category: Python
Difficulty: beginner
Duration: 42 minutes
Challenges: 0 hands-on challenges
Topics covered: Lists: Ordered Collections, Tuples: Immutable Sequences, Dicts: Key-Value Storage, Sets: Unique Collections, Choosing the Right Type