Functions: Intermediate

Django, the Python web framework that powers Spotify, Pinterest, and Disqus, exposes every view function with flexible signatures that accept *args and **kwargs, letting thousands of third-party developers add middleware, authentication, and caching to any endpoint without modifying core framework code. This flexibility is what allows a framework to serve both a 10-page startup site and Spotify's 600 million users with the same codebase. The function signature patterns you are about to learn are the foundation of that extensibility.

Default Parameters

Daily Life
Interviews

Design functions with smart fallbacks

Default parameters let you specify fallback values for function arguments. When a caller omits an argument, Python uses the default. This makes functions more flexible by allowing optional parameters without requiring the caller to always provide them.

Basic Default Values

Define defaults using parameter=value in the function signature:

def greet(name, greeting="Hello"):
    return greeting + ", " + name + "!"

# Call with both arguments
print(greet("Alice", "Hi"))

# Call with only required argument
print(greet("Bob"))

# Call with only required argument
print(greet("Charlie"))
>>>Output
Hi, Alice!
Hello, Bob!
Hello, Charlie!

The greeting parameter has a default value of "Hello". When you call greet("Bob"), Python automatically uses "Hello" because you did not provide a second argument. This simple mechanism unlocks tremendous flexibility in function design.

Default parameters are evaluated left to right at function definition time, not at call time. This distinction becomes important when we discuss the mutable default pitfall later in this section. For now, understand that each call either uses your provided value or falls back to the pre-defined default.
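To make the definition-time rule concrete, here is a minimal sketch. The names DEFAULT_LIMIT and fetch_page are hypothetical and exist only for this illustration. Rebinding the module-level constant after the function is defined should not change the default, because the default value was captured when the def statement ran.

```python
DEFAULT_LIMIT = 100  # hypothetical module-level setting

def fetch_page(limit=DEFAULT_LIMIT):
    # The default was evaluated once, when this def statement executed.
    return limit

# Rebinding the name later does not affect the stored default.
DEFAULT_LIMIT = 500

print(fetch_page())             # uses the captured default
print(fetch_page(limit=25))     # an explicit argument always wins
print(fetch_page.__defaults__)  # the stored default values
```

If you need a value that is looked up at call time, read it inside the function body instead of placing it in the signature.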

Multiple Default Parameters

Functions can have multiple default parameters. This is common in data processing functions where you want sensible defaults:
def fetch_records(table, limit=100, offset=0, sort_by="id"):
    query = "SELECT * FROM " + table
    query += " ORDER BY " + sort_by
    query += " LIMIT " + str(limit)
    query += " OFFSET " + str(offset)
    return query

# Use all defaults
print(fetch_records("users"))
print()

# Override just limit
print(fetch_records("users", 50))
print()

# Override limit and offset
print(fetch_records("orders", 25, 100))
>>>Output
SELECT * FROM users ORDER BY id LIMIT 100 OFFSET 0
 
SELECT * FROM users ORDER BY id LIMIT 50 OFFSET 0
 
SELECT * FROM orders ORDER BY id LIMIT 25 OFFSET 100
Notice how the function remains useful with minimal arguments. The caller only needs to specify the table name, and sensible defaults handle pagination and sorting. This pattern appears constantly in database utilities, API clients, and data processing libraries.
TIP
In production ETL code, default parameters are everywhere. Think of batch sizes, timeout values, retry counts, and date ranges. Good defaults make your functions immediately usable without configuration.
Here are examples of the most common default parameters you will see in data engineering code.
  • limit=100 (pagination): control batch size easily
  • timeout=30 (timeouts): sensible wait by default
  • retries=3 (retry logic): auto-retry on failure
  • verbose=False (debug flags): quiet unless turned on
  • encoding="utf-8" (file encoding): standard text encoding

Keyword Args with Defaults

When you have multiple defaults, use keyword arguments to skip middle parameters:
def connect_db(host="localhost", port=5432, timeout=30, ssl=True):
    config = "host=" + host + ", port=" + str(port)
    config += ", timeout=" + str(timeout) + ", ssl=" + str(ssl)
    return config

# Use all defaults
print(connect_db())

# Override only port (skip host)
print(connect_db(port=3306))

# Override only ssl (skip all others)
print(connect_db(ssl=False))

# Override host and timeout (skip port)
print(connect_db(host="prod-db.example.com", timeout=60))
>>>Output
host=localhost, port=5432, timeout=30, ssl=True
host=localhost, port=3306, timeout=30, ssl=True
host=localhost, port=5432, timeout=30, ssl=False
host=prod-db.example.com, port=5432, timeout=60, ssl=True
Keyword arguments let you specify exactly which parameters to override. This is much cleaner than passing values for every parameter just to change one. Without keyword arguments, you would need to remember the position of every parameter and provide values for all parameters before the one you want to change.

This flexibility is why Python APIs are often more readable than those in languages without keyword arguments. When you see connect_db(timeout=60), the intent is immediately clear. Compare this to connect_db("localhost", 5432, 60, True) where you need to consult documentation to understand what 60 and True mean.

Required vs Optional Params

Parameters without defaults are required. Required parameters must come before parameters with defaults:

def process_file(filename, encoding="utf-8", verbose=False):
    result = "Processing: " + filename
    result += " (encoding=" + encoding + ")"
    if verbose:
        result += " [VERBOSE MODE]"
    return result

# filename is required, others are optional
print(process_file("data.csv"))
print(process_file("report.json", "ascii"))
print(process_file("log.txt", verbose=True))
>>>Output
Processing: data.csv (encoding=utf-8)
Processing: report.json (encoding=ascii)
Processing: log.txt (encoding=utf-8) [VERBOSE MODE]
Correct Order
  • def f(required, optional=val)
  • Required args first
  • Defaults at the end
  • Clear API contract
Invalid Order
  • def f(optional=val, required)
  • Syntax error!
  • Python rejects this
  • No way to skip defaults
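The invalid order can be checked without crashing a script by compiling the source text and catching the error. This is only a sketch: bad_source is a hypothetical snippet, and the exact wording of the error message differs between Python versions, so the example checks only the exception type.

```python
# Source text for a function that puts a default before a required parameter.
bad_source = "def f(optional=1, required):\n    pass\n"

try:
    compile(bad_source, "<example>", "exec")
    rejected = False
except SyntaxError:
    # Python rejects the definition before any code runs.
    rejected = True

print("Rejected at compile time:", rejected)
```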

Mutable Default Pitfall

Never use mutable objects (lists, dicts) as default values. They are created once when the function is defined, not on each call:

# WRONG: Mutable default (list)
def add_item_bad(item, items=[]):
    items.append(item)
    return items

# Each call shares the SAME list!
print(add_item_bad("a"))
print(add_item_bad("b"))
print(add_item_bad("c"))
>>>Output
['a']
['a', 'b']
['a', 'b', 'c']
The list accumulates across calls because all calls share the same default list object. This is one of Python's most common gotchas.
# CORRECT: Use None as default
def add_item_good(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

# Each call gets a fresh list
print(add_item_good("a"))
print(add_item_good("b"))
print(add_item_good("c"))

# Can still pass existing list
existing = ["x", "y"]
print(add_item_good("z", existing))
>>>Output
['a']
['b']
['c']
['x', 'y', 'z']
Do
  • Use None as default, create mutable object inside
  • Use immutable defaults: strings, numbers, tuples
  • Test functions with repeated calls to catch shared state
Don't
  • Use [] or {} as default parameter values
  • Assume each call gets a fresh default list or dict
  • Ignore this in interviews - it is a classic gotcha

This pattern extends to any mutable type: lists, dictionaries, sets, and even custom objects. Safe defaults are immutable values like None, integers, floats, strings, tuples, and booleans. Dangerous defaults are mutable types like lists, dicts, sets, and custom objects. The None sentinel pattern is so common that experienced Python developers recognize it immediately.

Compare what happens with a mutable default versus the safe None pattern by trying each value in the exercise below.

Fill in the Blank

> A function adds an item to a list using a default parameter. Pick None or [] as the default and see whether items accumulate across calls or stay independent.

def add_item(item, items=___):
    if items is None:
        items = []
    items.append(item)
    return items

print(add_item("a"))
print(add_item("b"))

The None sentinel pattern is the standard Python idiom for any mutable default. Using None signals clearly to readers that the function creates a fresh container each time it is called.

This bug is especially dangerous in data pipelines because functions that process batches may accumulate state between calls. A function that was correct for the first batch silently produces wrong results for every subsequent one.

Immutable defaults like integers, strings, booleans, and tuples are always safe. When you need a mutable default, always use None and initialize inside the function body.
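The batch-processing risk described above can be sketched as follows. The function names and the rule "negative values are errors" are assumptions made up for this illustration, not part of any real pipeline.

```python
def collect_errors_bad(batch, errors=[]):
    # BUG: the same list object is reused by every call that omits 'errors'.
    for row in batch:
        if row < 0:
            errors.append(row)
    return errors

def collect_errors_good(batch, errors=None):
    # A fresh list is created for each call that omits 'errors'.
    if errors is None:
        errors = []
    for row in batch:
        if row < 0:
            errors.append(row)
    return errors

# Copy the results so the comparison shows what each call returned.
first_bad = list(collect_errors_bad([1, -2, 3]))
second_bad = list(collect_errors_bad([4, 5, 6]))  # carries a stale error

first_good = collect_errors_good([1, -2, 3])
second_good = collect_errors_good([4, 5, 6])

print("Shared default:", first_bad, second_bad)
print("None sentinel: ", first_good, second_good)
```

The second batch contains no negative values, yet the shared-default version still reports the error left over from the first batch.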

Multiple Return Values


Return and unpack multiple results

Python functions can return multiple values by returning a tuple. The caller can then unpack these values into separate variables. This pattern is cleaner than returning a dictionary or list when you have a fixed number of related values to compute and return together.
In data engineering, you often need to compute several related metrics from the same data in a single pass. Rather than calling separate functions (which would iterate over the data multiple times), you compute everything in one function and return all the results. This is more efficient and keeps related logic together.

Returning Multiple Values

Separate return values with commas. Python automatically wraps them in a tuple:

def get_stats(numbers):
    total = sum(numbers)
    count = len(numbers)
    average = total / count if count > 0 else 0
    return total, count, average

# Unpack into separate variables
data = [10, 20, 30, 40, 50]
total, count, avg = get_stats(data)

print("Total:", total)
print("Count:", count)
print("Average:", avg)
>>>Output
Total: 150
Count: 5
Average: 30.0

The line return total, count, average creates and returns a tuple (150, 5, 30.0). The unpacking total, count, avg = get_stats(data) splits it back into variables.

Multi-Return Patterns

Multiple returns are common when processing data and computing related metrics:
def analyze_text(text):
    words = text.split()
    word_count = len(words)
    char_count = len(text)  # includes spaces
    # Average over the words themselves so spaces do not inflate the result
    letter_count = sum(len(word) for word in words)
    avg_word_length = letter_count / word_count if word_count > 0 else 0
    return word_count, char_count, avg_word_length

sample = "Python is a powerful programming language"
words, chars, avg_len = analyze_text(sample)

print("Words:", words)
print("Characters:", chars)
print("Avg word length:", round(avg_len, 1))
>>>Output
Words: 6
Characters: 41
Avg word length: 6.0

Ignoring Unwanted Values

Use underscore _ to ignore values you do not need:

def get_user_info():
    return "Alice", 28, "Seattle", "Engineer"

# Only need name and city
name, _, city, _ = get_user_info()
print(name + " lives in " + city)

# Only need the first value
name, *_ = get_user_info()
print("User:", name)
>>>Output
Alice lives in Seattle
User: Alice

The underscore is a convention meaning "I do not care about this value." The *_ syntax captures all remaining values into a throwaway list.

Keeping as Tuple

You can also receive all values as a single tuple if needed:
def min_max(numbers):
    return min(numbers), max(numbers)

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Keep as tuple
result = min_max(data)
print("Result tuple:", result)
print("Min:", result[0])
print("Max:", result[1])

# Or unpack immediately
low, high = min_max(data)
print("Range:", low, "to", high)
>>>Output
Result tuple: (1, 9)
Min: 1
Max: 9
Range: 1 to 9
TIP
When a function returns more than 3-4 values, consider using a named tuple or dataclass instead. It makes the code more readable and self-documenting.
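As a rough illustration of that advice, the sketch below returns five related values through typing.NamedTuple. The Stats class and summarize function are hypothetical names chosen for this example.

```python
from typing import NamedTuple

class Stats(NamedTuple):
    total: float
    count: int
    average: float
    minimum: float
    maximum: float

def summarize(numbers):
    # Assumes a non-empty list; min() and max() raise ValueError on empty input.
    total = sum(numbers)
    count = len(numbers)
    return Stats(total, count, total / count, min(numbers), max(numbers))

stats = summarize([10, 20, 30, 40, 50])
print(stats.average, stats.maximum)  # attribute access by name

total, count, *_ = stats             # tuple unpacking still works
print(total, count)
```

Callers that only need one field can read it by name, so adding another field later is less likely to break existing unpacking code than it would be with a plain tuple.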

Extended Unpacking

Python 3 introduced extended unpacking with * to capture multiple values into a list. This is useful when functions return variable-length results:

def get_scores():
    return 95, 87, 92, 78, 88, 91

# Get first, last, and middle
first, *middle, last = get_scores()
print("First:", first)
print("Middle:", middle)
print("Last:", last)

# Get first two and rest
top1, top2, *others = get_scores()
print()
print("Top two:", top1, top2)
print("Others:", others)
>>>Output
First: 95
Middle: [87, 92, 78, 88]
Last: 91
 
Top two: 95 87
Others: [92, 78, 88, 91]
The starred variable captures all values not assigned to other variables. This is particularly useful when parsing data where you know the structure of the first and last elements but the middle can vary in length.
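A small parsing sketch shows the same idea on text data. The log line format (date, level, then a free-form message) and the function name parse_log_line are assumptions for illustration only.

```python
def parse_log_line(line):
    # The first two fields are fixed; the message can contain any number of words.
    date, level, *message_words = line.split()
    return date, level, " ".join(message_words)

date, level, message = parse_log_line(
    "2024-01-15 ERROR Connection failed after 3 retries"
)
print(date)
print(level)
print(message)
```

When the line has no message at all, the starred name simply receives an empty list, so the function still returns three values.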

Data Validation Example

A common pattern is returning both a result and a status indicator:
def validate_record(record):
    errors = []

    if "name" not in record:
        errors.append("Missing name")
    if "email" not in record:
        errors.append("Missing email")
    if "age" in record and record["age"] < 0:
        errors.append("Invalid age")

    is_valid = len(errors) == 0
    return is_valid, errors

# Test with valid record
record1 = {"name": "Alice", "email": "a@b.com", "age": 25}
valid, errs = validate_record(record1)
print("Valid:", valid, "Errors:", errs)

# Test with invalid record
record2 = {"email": "test@test.com", "age": -5}
valid, errs = validate_record(record2)
print("Valid:", valid, "Errors:", errs)
>>>Output
Valid: True Errors: []
Valid: False Errors: ['Missing name', 'Invalid age']
This validate-and-return pattern is extremely common in data pipelines. Rather than raising exceptions for invalid data (which disrupts batch processing), you return validation status and error details. The caller can then decide how to handle invalid records: skip them, log them, or fix them.
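One way a caller might use the returned status is to split a batch into clean and rejected rows. This is a sketch under assumptions: partition_records is a hypothetical helper, and the validator here is a shortened variant that only checks for missing fields.

```python
def validate_record(record):
    # Shortened validator: returns (is_valid, errors) like the one above.
    errors = [
        f"Missing {field}" for field in ("name", "email") if field not in record
    ]
    return len(errors) == 0, errors

def partition_records(records):
    # Split a batch into clean rows and rejected rows with their errors.
    valid, rejected = [], []
    for record in records:
        is_valid, errors = validate_record(record)
        if is_valid:
            valid.append(record)
        else:
            rejected.append((record, errors))
    return valid, rejected

batch = [
    {"name": "Alice", "email": "a@b.com"},
    {"email": "test@test.com"},
    {"name": "Carol"},
]
valid, rejected = partition_records(batch)
print("Valid rows:", len(valid))
for record, errors in rejected:
    print("Rejected:", record, errors)
```

The whole batch is processed even though two records are invalid, which is the behavior the pattern is meant to provide.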
Common uses for multiple return values:
  • Related statistics: compute total, count, and average in one pass over the data.
  • Status plus result: return a success flag alongside the computed data for error handling.
  • String splitting: separate text into parts like (before, separator, after) tuples.
  • Validation results: return is_valid boolean and a list of error messages together.
  • Data plus metadata: return transformed records along with metadata like row counts.
Python Quiz

> A function returns the smallest and largest values from a list. Pick the built-in that finds the minimum and the one that finds the maximum.

def min_max(nums):
    return ___(nums), ___(nums)

lo, hi = min_max([4, 1, 7, 2])
print(lo)
print(hi)
Options: sorted, min, len, max, sum
Returning multiple values is a Python idiom that keeps related results together without requiring a class or a dictionary. The caller can unpack them in a single assignment, making the code concise and readable.
Tuple unpacking on the left side of an assignment is one of Python's most expressive features. It communicates the expected structure of the return value directly at the call site.
When a function must return a large number of related values, consider a named tuple or a dataclass. These provide both the convenience of tuple unpacking and the clarity of attribute access by name.

Local vs Global Scope


Control variable visibility in code

Scope determines where a variable is visible and accessible. The two scopes you will work with most are local (inside a function) and global (module level); enclosing and built-in scopes complete the lookup order described later in this section. Understanding scope prevents bugs where variables unexpectedly share or shadow each other. Scope bugs are especially tricky because the code often looks correct but behaves differently than expected.
Every variable in Python lives in a specific scope. When you reference a variable name, Python searches scopes in a specific order to find it. Understanding this lookup order is crucial for predicting how your code will behave, especially in larger programs with many functions.

Local Scope

Variables created inside a function exist only within that function. They are created when the function runs and destroyed when it returns:
def calculate():
    x = 10
    y = 20
    result = x + y
    print("Inside function:", result)
    return result

answer = calculate()
print("Returned:", answer)

# x is not accessible outside
# print(x)  # NameError
>>>Output
Inside function: 30
Returned: 30

The variables x, y, and result only exist inside calculate(). Once the function returns, they are gone. This isolation is a feature: it prevents functions from accidentally interfering with each other.

Local scope is created fresh for each function call. If you call calculate() twice, each call gets its own independent x, y, and result variables. This means recursive functions work correctly, and concurrent calls do not interfere with each other. The isolation makes functions predictable and testable.

TIP
Local variables are your friends. They make functions self-contained and predictable. You can understand what a function does by looking at just that function, without tracking global state.

Global Scope

Variables defined outside all functions are global. They can be read from anywhere in the module:
# Global variable
CONFIG_TIMEOUT = 30

def get_timeout():
    # Can READ global variables
    return CONFIG_TIMEOUT

def show_config():
    # Can also READ globals
    print("Timeout setting:", CONFIG_TIMEOUT)

print("Direct access:", CONFIG_TIMEOUT)
print("Via function:", get_timeout())
show_config()
>>>Output
Direct access: 30
Via function: 30
Timeout setting: 30
Global variables are useful for configuration constants, database connections, and other values that should be shared across the entire module. However, reading globals is very different from modifying them. Python allows reading by default but requires explicit declaration to modify.

Variable Shadowing

When a local variable has the same name as a global, the local shadows (hides) the global inside that function:
value = "global"

def show_shadowing():
    # Creates new LOCAL variable, not modifying global
    value = "local"
    print("Inside function:", value)

def show_global():
    print("Reading global:", value)

show_shadowing()
show_global()
print("Outside:", value)
>>>Output
Inside function: local
Reading global: global
Outside: global

The assignment value = "local" creates a new local variable that hides the global one. The global value is never modified. This behavior protects global state from accidental modification.

Shadowing can cause confusion when you expect to read a global but accidentally create a local with the same name. Python determines scope at compile time, not runtime. If a variable is assigned anywhere in a function, Python treats it as local throughout that entire function, even before the assignment.

status = "ready"

def check_status():
    # This FAILS - Python sees 'status =' later and treats status as local
    # print(status)  # UnboundLocalError
    # This assignment makes 'status' local to entire function
    status = "running"
    return status

# The global is never touched
print("Global:", status)
print("Function:", check_status())
print("Global still:", status)
>>>Output
Global: ready
Function: running
Global still: ready
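To see the error actually raised rather than commented out, the following sketch reads the name before the assignment and catches the exception. The function name check_status_broken is hypothetical.

```python
status = "ready"

def check_status_broken():
    # 'status' is assigned below, so it is local for the whole function body.
    current = status  # raises UnboundLocalError when executed
    status = "running"
    return current

try:
    check_status_broken()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print("Raised:", error_name)
print("Global still:", status)
```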

The global Keyword

To modify a global variable from inside a function, declare it with the global keyword:

counter = 0

def increment():
    global counter
    counter = counter + 1
    print("Counter inside:", counter)

print("Before:", counter)
increment()
increment()
increment()
print("After:", counter)
>>>Output
Before: 0
Counter inside: 1
Counter inside: 2
Counter inside: 3
After: 3
Do
  • Pass values as parameters and return results
  • Use globals only for true constants like CONFIG
  • Keep functions pure and self-contained when possible
Don't
  • Modify global variables inside functions
  • Rely on global state that changes between calls
  • Use the global keyword when you can use parameters

Scope in Nested Functions

When functions are nested, inner functions can read from enclosing scopes:
def outer():
    message = "Hello from outer"

    def inner():
        print(message)

    inner()
    return inner

outer()
>>>Output
Hello from outer

To modify a variable from an enclosing (non-global) scope, use the nonlocal keyword. This is useful for closures and factories.
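A counter factory is a common way to illustrate nonlocal. The sketch below uses the hypothetical names make_counter and increment; each call to make_counter creates its own independent count variable.

```python
def make_counter(start=0):
    count = start

    def increment():
        # Without 'nonlocal', the assignment would create a new local 'count'.
        nonlocal count
        count += 1
        return count

    return increment

counter_a = make_counter()
counter_b = make_counter(10)

print(counter_a(), counter_a(), counter_a())
print(counter_b())
```

The state lives in the enclosing function's scope, so it is shared by repeated calls to the same inner function but invisible to the rest of the module.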

global
  • Module-level variables
  • Visible everywhere
  • Use sparingly
  • For true constants only
nonlocal
  • Enclosing function scope
  • For nested functions
  • Enables closures
  • More controlled than global
Python follows the LEGB rule when looking up variable names. It searches four scopes in this exact order, stopping as soon as it finds a match.
  1. Local: variables defined inside the current function body.
  2. Enclosing: variables in any enclosing (outer) function scope.
  3. Global: variables defined at the module (file) level.
  4. Built-in: names pre-defined by Python like print, len, and range.
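The lookup order can be traced with one name defined at several levels. This is a minimal sketch; the variable label and all function names are made up for the example.

```python
label = "global"

def outer():
    label = "enclosing"

    def uses_local():
        label = "local"
        return label   # found in the Local scope

    def uses_enclosing():
        return label   # not local, found in the Enclosing scope

    return uses_local(), uses_enclosing()

def uses_global():
    return label       # not local or enclosing, found in the Global scope

def uses_builtin():
    return len("abc")  # 'len' is defined only in the Built-in scope

print(outer())
print(uses_global())
print(uses_builtin())
```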
Python Quiz

> A function mutates a list by adding an element, then returns its new size. Pick the method that adds to the end, and the built-in that measures the length after the change.

def add_to(items, value):
    items.___(value)
    return ___(items)

data = [10, 20]
result = add_to(data, 30)
print(result)
print(len(data))
Options: sum, append, len, extend, insert
Scope determines which variables a function can read and modify. Local variables are created fresh each call and disappear when the function returns. They cannot accidentally overwrite values in other parts of the program.
Python passes arguments as references to objects, so a function and its caller can refer to the same list. When you mutate that list inside the function, the change is visible outside; rebinding the parameter name to a new object is not.
The LEGB lookup order means Python resolves names starting from the innermost scope outward. Understanding this order lets you predict exactly which variable a name refers to, even when the same name appears in multiple scopes.
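The difference between mutating an argument and rebinding a local name can be sketched in a few lines. The function names mutate and rebind are hypothetical.

```python
def mutate(items):
    # Changes the object the caller also refers to.
    items.append(99)

def rebind(items):
    # Points the local name at a new list; the caller's list is untouched.
    items = items + [99]
    return items

data = [1, 2]
mutate(data)
print(data)

other = [1, 2]
result = rebind(other)
print(other, result)
```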

*args for Variadics


Accept any number of arguments

The *args syntax lets a function accept any number of positional arguments. The arguments are collected into a tuple. This is invaluable when you do not know in advance how many values will be passed.

Think of a logging function that needs to print any number of messages, or a calculation function that sums any count of numbers, or a path builder that joins any number of directory components. Without *args, you would need to accept a list and require callers to wrap their arguments in brackets. With *args, the function call looks natural and clean.

Basic *args Usage

The asterisk * before a parameter name collects extra positional arguments:

def sum_all(*numbers):
    print("Received:", numbers)
    total = 0
    for num in numbers:
        total = total + num
    return total

print("Sum:", sum_all(1, 2))
print("Sum:", sum_all(1, 2, 3, 4, 5))
print("Sum:", sum_all(10))
print("Sum:", sum_all())
>>>Output
Received: (1, 2)
Sum: 3
Received: (1, 2, 3, 4, 5)
Sum: 15
Received: (10,)
Sum: 10
Received: ()
Sum: 0

The name args is a convention. You could use *values or *items. The asterisk is what matters.

Mixing Regular and *args

You can have regular parameters before *args. Regular parameters are filled first, and *args captures the rest:

def log_message(level, *messages):
    prefix = "[" + level.upper() + "]"
    for msg in messages:
        print(prefix, msg)

log_message("info", "Server started")
print()

log_message("error", "Connection failed", "Retrying...", "Timeout reached")
print()

log_message("debug", "x = 10", "y = 20", "sum = 30")
>>>Output
[INFO] Server started
 
[ERROR] Connection failed
[ERROR] Retrying...
[ERROR] Timeout reached
 
[DEBUG] x = 10
[DEBUG] y = 20
[DEBUG] sum = 30

Unpacking with *

The * operator also works in reverse. You can unpack a list or tuple when calling a function:

def add_three(a, b, c):
    return a + b + c

# Without unpacking
print(add_three(1, 2, 3))

# With unpacking
numbers = [10, 20, 30]
print(add_three(*numbers))

# Also works with tuples
coords = (5, 10, 15)
print(add_three(*coords))
>>>Output
6
60
30

The *numbers unpacks the list [10, 20, 30] into three separate arguments, as if you had written add_three(10, 20, 30).

Path Builder Example

Many utility functions in data pipelines use *args for flexibility:

def build_path(*parts):
    """Join path components with /"""
    return "/".join(parts)

# Build file paths with any number of components
print(build_path("data"))
print(build_path("data", "raw"))
print(build_path("data", "raw", "2024", "01", "sales.csv"))

# Useful for S3 paths, URLs, etc.
bucket = "my-bucket"
prefix = "etl-output"
date = "2024-01-15"
filename = "report.parquet"
print(build_path(bucket, prefix, date, filename))
>>>Output
data
data/raw
data/raw/2024/01/sales.csv
my-bucket/etl-output/2024-01-15/report.parquet
Common uses of *args:
  • sum_all (aggregate): sum any count of values
  • log (logging): log multiple items at once
  • join_path (path building): build file paths on the fly
  • format (string formatting): inject values into strings
  • chain (combining lists): merge sequences together

The *args pattern is used extensively in Python's standard library. Functions like print() accept any number of arguments to display. The max() and min() functions accept either a single iterable or multiple positional arguments. Understanding *args helps you use these built-in functions more effectively.
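To show how a single function can support both call styles, here is a rough sketch of a max-like helper. The name largest is hypothetical, and this is not how the built-in max() is implemented; it only imitates the calling convention described above.

```python
def largest(*values):
    # Like max(): a single argument is treated as an iterable of candidates.
    if len(values) == 1:
        values = tuple(values[0])
    if not values:
        raise ValueError("largest() needs at least one value")
    result = values[0]
    for value in values[1:]:
        if value > result:
            result = value
    return result

print(largest(3, 7, 5))      # several positional arguments
print(largest([3, 7, 5]))    # one iterable
print(max(3, 7, 5), max([3, 7, 5]))  # the built-in accepts both styles too
```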

**kwargs for Keywords


Handle named options as dictionaries

The **kwargs syntax captures keyword arguments into a dictionary. This allows functions to accept any named parameters, making them highly configurable without defining every possible option upfront.

Consider a function that connects to a database. Different database systems need different options: PostgreSQL might need an SSL certificate, MySQL might need a character set, and Redis might need a connection pool size. Rather than defining every possible parameter, you accept **kwargs and let callers pass whatever options their database needs.

This pattern is central to Python web frameworks. When you define a Django model or a Flask route, you pass keyword arguments that configure behavior. The framework collects these into a dictionary and processes them. Understanding **kwargs helps you both use and build such APIs.

Basic **kwargs Usage

Double asterisk ** collects keyword arguments into a dictionary:

def print_info(**kwargs):
    print("Received:", kwargs)
    for key, value in kwargs.items():
        print(" " + key + " = " + str(value))

print_info(name="Alice", age=25)
print()
print_info(city="Seattle", country="USA", zip_code="98101")
print()
print_info()
>>>Output
Received: {'name': 'Alice', 'age': 25}
 name = Alice
 age = 25
 
Received: {'city': 'Seattle', 'country': 'USA', 'zip_code': '98101'}
 city = Seattle
 country = USA
 zip_code = 98101
 
Received: {}

Like args, the name kwargs is just convention. You could use **options or **config.

Using *args and **kwargs

You can use all parameter types together. They must appear in this order: regular, *args, keyword-only, **kwargs:

def flexible_func(required, *args, **kwargs):
    print("Required:", required)
    print("Extra args:", args)
    print("Keyword args:", kwargs)

flexible_func("first")
print()

flexible_func("first", "extra1", "extra2")
print()

flexible_func("first", "extra1", name="Alice", debug=True)
>>>Output
Required: first
Extra args: ()
Keyword args: {}
 
Required: first
Extra args: ('extra1', 'extra2')
Keyword args: {}
 
Required: first
Extra args: ('extra1',)
Keyword args: {'name': 'Alice', 'debug': True}

Unpacking Dictionaries

Use ** to unpack a dictionary into keyword arguments when calling a function:

def create_user(name, email, role="user"):
    return {"name": name, "email": email, "role": role}

# Without unpacking
user1 = create_user("Alice", "alice@test.com", "admin")
print(user1)

# With dictionary unpacking
config = {"name": "Bob", "email": "bob@test.com"}
user2 = create_user(**config)
print(user2)

# Override some values
user3 = create_user(**config, role="moderator")
print(user3)
>>>Output
{'name': 'Alice', 'email': 'alice@test.com', 'role': 'admin'}
{'name': 'Bob', 'email': 'bob@test.com', 'role': 'user'}
{'name': 'Bob', 'email': 'bob@test.com', 'role': 'moderator'}

Configuration Example

The **kwargs pattern is everywhere in data engineering for configuration and options:

def connect_database(host, port, **options):
    config = {
        "host": host,
        "port": port,
        "timeout": options.get("timeout", 30),
        "ssl": options.get("ssl", True),
        "retries": options.get("retries", 3),
    }
    return config

# Minimal call
print(connect_database("localhost", 5432))

# With some options
print(connect_database("prod.db.com", 5432, timeout=60, ssl=True))

# From a config dict
db_settings = {"timeout": 120, "retries": 5, "pool_size": 10}
print(connect_database("prod.db.com", 5432, **db_settings))
>>>Output
{'host': 'localhost', 'port': 5432, 'timeout': 30, 'ssl': True, 'retries': 3}
{'host': 'prod.db.com', 'port': 5432, 'timeout': 60, 'ssl': True, 'retries': 3}
{'host': 'prod.db.com', 'port': 5432, 'timeout': 120, 'ssl': True, 'retries': 5}

Note how options.get("key", default) provides fallback values for missing keys. This is the standard pattern for handling optional configuration.
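One side effect of the example above is that an unrecognized key such as pool_size is silently dropped. If that is not what you want, a stricter variant could reject unknown names. The sketch below is one possible approach; ALLOWED_OPTIONS and connect_database_strict are hypothetical names, not part of any library.

```python
# Hypothetical table of supported options and their fallback values.
ALLOWED_OPTIONS = {"timeout": 30, "ssl": True, "retries": 3}

def connect_database_strict(host, port, **options):
    # Reject misspelled or unsupported option names instead of ignoring them.
    unknown = sorted(set(options) - set(ALLOWED_OPTIONS))
    if unknown:
        raise TypeError("Unknown options: " + ", ".join(unknown))
    return {"host": host, "port": port, **ALLOWED_OPTIONS, **options}

print(connect_database_strict("localhost", 5432, timeout=60))

try:
    connect_database_strict("localhost", 5432, pool_size=10)
except TypeError as exc:
    print("Rejected:", exc)
```

Raising TypeError mirrors what Python itself does when a function without **kwargs receives an unexpected keyword argument.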

Forwarding Arguments

A powerful pattern is using *args and **kwargs to forward all arguments to another function:

def log_call(func):
    def wrapper(*args, **kwargs):
        print("Calling:", func.__name__)
        print(" args:", args)
        print(" kwargs:", kwargs)
        result = func(*args, **kwargs)
        print(" result:", result)
        return result
    return wrapper

def add(a, b):
    return a + b

logged_add = log_call(add)
logged_add(3, 5)
print()
logged_add(a=10, b=20)
>>>Output
Calling: add
 args: (3, 5)
 kwargs: {}
 result: 8
 
Calling: add
 args: ()
 kwargs: {'a': 10, 'b': 20}
 result: 30
TIP
The *args/**kwargs forwarding pattern is the foundation of Python decorators. You will see this constantly in frameworks like Flask, Django, and data tools like pandas.
This forwarding pattern is what lets a decorator wrap functions of any signature. The wrapper function accepts any arguments and passes them through unchanged. The decorated function receives exactly what was passed to the wrapper, regardless of its parameter structure. This makes decorators universally applicable.
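Forwarding the arguments does not by itself keep the wrapped function's name or docstring. The standard library's functools.wraps copies that metadata onto the wrapper. The sketch below applies it with the @ decorator syntax; the docstring on add is added only for this example.

```python
import functools

def log_call(func):
    @functools.wraps(func)  # copies __name__, __doc__, and related metadata
    def wrapper(*args, **kwargs):
        print("Calling:", func.__name__, args, kwargs)
        return func(*args, **kwargs)
    return wrapper

@log_call
def add(a, b):
    """Return the sum of a and b."""
    return a + b

print(add(3, 5))
print(add.__name__)
print(add.__doc__)
```

Without functools.wraps, add.__name__ would report the wrapper's name, which makes logs and debugging output harder to read.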

Merging Dicts with **

The ** operator can merge dictionaries in function calls and dictionary literals:

# Default configuration
defaults = {"timeout": 30, "retries": 3, "ssl": True}

# User overrides
user_config = {"timeout": 60, "debug": True}

# Merge: user_config values override defaults
final_config = {**defaults, **user_config}
print("Merged:", final_config)

# Later dicts override earlier ones
print("timeout:", final_config["timeout"])
>>>Output
Merged: {'timeout': 60, 'retries': 3, 'ssl': True, 'debug': True}
timeout: 60
This dictionary merging technique is common in configuration management. You start with default values, merge environment-specific overrides, then merge user-specified values. Each layer can override keys from previous layers while preserving keys that were not overridden.
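The layering described above might look like the following sketch. The three dictionaries and their keys are invented for illustration.

```python
defaults = {"timeout": 30, "retries": 3, "ssl": True}
environment = {"timeout": 60, "host": "staging-db"}  # hypothetical overrides
user = {"retries": 5}

# Later layers override earlier ones, key by key.
layered_config = {**defaults, **environment, **user}
print(layered_config)

# Python 3.9+ also offers the | operator for the same merge:
# layered_config = defaults | environment | user

# The inputs are left unchanged, because ** builds a new dict.
print(defaults)
```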
*args
  • Collects positional args
  • Results in a tuple
  • Order matters
  • Good for variable-length data
**kwargs
  • Collects keyword args
  • Results in a dictionary
  • Named parameters
  • Good for configuration

Common Mistakes

These are the most frequent errors when working with intermediate function features. Each of these mistakes appears regularly in interview questions and code reviews. Understanding why they are wrong helps you avoid them in your own code and spot them in others' code.

Mistake 1: Mutable Defaults

# WRONG - list shared between calls
def append_bad(item, items=[]):
    items.append(item)
    return items

# Default list is shared between calls!
print(append_bad(1))
print(append_bad(2))

def append_good(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

print(append_good(1))
print(append_good(2))
>>>Output
[1]
[1, 2]
[1]
[2]

This is the most common intermediate Python pitfall. The list default is created once, when Python executes the def statement. Every call that uses the default shares that same list object. The fix is always the same: use None as the default and create a fresh mutable object inside the function.

Mistake 2: Implicit Global

counter = 0

# WRONG - creates local, not global
def increment_wrong():
    # UnboundLocalError: assignment makes it local
    counter = counter + 1
    return counter

# CORRECT - declare global explicitly
def increment_right():
    global counter
    counter = counter + 1
    return counter

# Even better - avoid globals entirely
def increment_best(current):
    return current + 1

value = 0
value = increment_best(value)
print("Best approach:", value)
>>>Output
Best approach: 1

The UnboundLocalError is confusing because the message reports that a local variable was used before it was given a value, while you thought you were reading a global. The key insight is that Python determines scope at compile time based on assignments anywhere in the function, not at runtime based on execution order.
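
The compile-time decision can be observed without crashing the program. The sketch below catches the error and inspects the function's compiled code object; the variable error_name is introduced only for this illustration.

```python
counter = 0


def increment_wrong():
    # The assignment makes `counter` local for the whole function body,
    # so the read on the right-hand side fails before anything is assigned.
    counter = counter + 1
    return counter


try:
    increment_wrong()
except UnboundLocalError as exc:
    error_name = type(exc).__name__
    print("Caught:", error_name)

# `counter` was registered as a local name when the function was compiled
print("counter" in increment_wrong.__code__.co_varnames)  # True

# The global was never modified
print("global counter:", counter)  # global counter: 0
```
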

Mistake 3: Arg Order

# WRONG order - defaults before non-defaults
# def bad_func(a=1, b): # SyntaxError!

# CORRECT order: required, *args, defaults, **kwargs
def good_func(required, *args, optional=10, **kwargs):
    print("required:", required)
    print("args:", args)
    print("optional:", optional)
    print("kwargs:", kwargs)

good_func("first", "extra", optional=20, debug=True)
>>>Output
required: first
args: ('extra',)
optional: 20
kwargs: {'debug': True}

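Remember the order: required positional parameters, then *args, then keyword-only parameters (usually with defaults), then **kwargs. Any parameter declared after *args can only be passed by name. Python enforces this order because any other arrangement would make it ambiguous which argument belongs to which parameter.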
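
A consequence of this ordering is that a parameter placed after *args cannot be filled positionally. The sketch below uses a variant of good_func that returns its arguments instead of printing them, so the difference is easy to compare.

```python
def good_func(required, *args, optional=10, **kwargs):
    return required, args, optional, kwargs


# A third positional value lands in args, not in `optional`
print(good_func("first", "extra", 20))
# ('first', ('extra', 20), 10, {})

# `optional` can only be set by name
print(good_func("first", "extra", optional=20))
# ('first', ('extra',), 20, {})
```
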

Mistake 4: Unpack First

def greet(name, greeting):
    return greeting + ", " + name

args = ["Alice", "Hello"]

# WRONG - passes list as single argument
# greet(args) # TypeError

# CORRECT - unpack list into arguments
print(greet(*args))

config = {"name": "Bob", "greeting": "Hi"}
# WRONG - passes dict as single argument
# greet(config) # TypeError

# CORRECT - unpack dict into keyword arguments
print(greet(**config))
>>>Output
Hello, Alice
Hi, Bob

This mistake is especially common when reading configuration from files or environment variables. You load a dictionary of settings and need to pass them to a function. Without the ** unpack operator, Python passes the entire dictionary as a single argument rather than expanding it into keyword arguments.

TIP
When you see a TypeError about missing arguments but you think you passed them, check whether you forgot to unpack. The error message names the parameters that were left unfilled, which tells you which values never arrived.
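
To see what a forgotten unpack looks like at runtime, the sketch below triggers the error on purpose and prints the message. It reuses greet from the example above; the variable message is introduced only for illustration, and the exact wording of the message may differ between Python versions.

```python
def greet(name, greeting):
    return greeting + ", " + name


args = ["Alice", "Hello"]

try:
    greet(args)  # the whole list is bound to `name`; `greeting` gets nothing
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)

# Unpacking spreads the list across both parameters
print(greet(*args))  # Hello, Alice
```
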
Debug Challenge

> This function tries to return both the min and max of a list, but it uses two return keywords instead of one. Python only needs a single return with comma-separated values.

SyntaxError: only one return statement is needed to return multiple values.
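
The challenge code itself is not reproduced here. As a hedged sketch of what a fixed version might look like, assuming a function named min_max (a hypothetical name), a single return with comma-separated values is enough:

```python
def min_max(values):
    # One return keyword; the comma builds a tuple of both results
    return min(values), max(values)


low, high = min_max([4, 9, 1, 7])
print(low, high)  # 1 9
```
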

The *args and **kwargs syntax enables genuinely flexible interfaces. Wrapper functions, decorators, and logging utilities all rely on forwarding arbitrary arguments without knowing their names or count in advance.
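
A minimal sketch of this forwarding pattern is a logging decorator. The names log_calls and connect are hypothetical; functools.wraps is the standard-library helper that copies the wrapped function's name and docstring onto the wrapper.

```python
import functools


def log_calls(func):
    """Hypothetical decorator that prints each call before forwarding it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print("calling", func.__name__, "with", args, kwargs)
        # Forward everything unchanged to the wrapped function
        return func(*args, **kwargs)
    return wrapper


@log_calls
def connect(host, port=5432, **options):
    return host + ":" + str(port)


print(connect("db.local", port=6543, ssl=True))
```

The wrapper never needs to know that connect takes a host, a port, or extra options; it passes along whatever it receives.
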

When combining *args and **kwargs with regular parameters, the order matters: required positional parameters come first, then *args, then keyword-only parameters (usually with defaults), then **kwargs. Python enforces this order because any other arrangement would make argument assignment ambiguous.

The unpacking operators work in both directions. Just as * collects arguments in a function definition, it also unpacks a list or tuple at a call site. Similarly, ** collects keyword arguments in a definition and unpacks a dictionary at a call site.
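
Both directions can be shown side by side in a short sketch. The functions total and describe, and the sample data, are hypothetical names used only for this illustration.

```python
def total(*numbers):        # * collects positional arguments into a tuple
    return sum(numbers)


def describe(**fields):     # ** collects keyword arguments into a dict
    return sorted(fields)


values = [1, 2, 3]
record = {"name": "Alice", "role": "analyst"}

print(total(*values))       # * unpacks the list at the call site: 6
print(describe(**record))   # ** unpacks the dict: ['name', 'role']
```
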

PUTTING IT ALL TOGETHER

> You are a data engineer at Amplitude building a flexible transformation library that non-technical analysts can configure by passing custom rules without modifying the underlying pipeline code.

Default parameters give each transformation a sensible fallback so analysts get correct output without specifying every option.
Multiple return values hand back both the transformed record and a status flag so callers handle success and errors in one call.
Local vs global scope ensures analyst-supplied config variables never accidentally overwrite shared pipeline state between runs.
*args and **kwargs let analysts pass any number of filter rules or keyword overrides into a single generalized transform function.
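
As a rough sketch of how the four ideas above might combine in one function, consider the following. Every name here (transform, has_user, uppercase_keys, extra_fields) is hypothetical and not taken from any real pipeline.

```python
def transform(record, *rules, uppercase_keys=False, extra_fields=None, **overrides):
    """Hypothetical transform: merge overrides, then check every rule."""
    if extra_fields is None:              # avoid a shared mutable default
        extra_fields = {}
    # `result` is local, so no shared pipeline state is modified
    result = {**record, **extra_fields, **overrides}
    for rule in rules:                    # any number of positional rules
        if not rule(result):
            return result, False          # record plus failure flag
    if uppercase_keys:
        result = {key.upper(): value for key, value in result.items()}
    return result, True                   # record plus success flag


def has_user(rec):
    return "user" in rec


row, ok = transform({"user": "alice", "plan": "free"}, has_user, plan="pro")
print(row, ok)  # {'user': 'alice', 'plan': 'pro'} True
```
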
KEY TAKEAWAYS
Default parameters use param=value syntax; required params must come first
Never use mutable defaults (lists, dicts); use None and create inside the function
Return multiple values with return a, b, c; unpack with x, y, z = func()
Local variables are isolated to their function; use global sparingly to modify globals
*args collects extra positional arguments into a tuple
**kwargs collects extra keyword arguments into a dictionary
Use *list to unpack lists and **dict to unpack dicts when calling functions
Parameter order: required, *args, keyword-only with defaults, **kwargs
The *args, **kwargs pattern enables powerful argument forwarding
Python scope follows LEGB: Local, Enclosing, Global, Built-in

Flexible, production-ready functions

Category: Python
Difficulty: intermediate
Duration: 47 minutes
Challenges: 0 hands-on challenges

Topics covered: Default Parameters, Multiple Return Values, Local vs Global Scope, *args for Variadics, **kwargs for Keywords
