Python Practice Problems
388 free Python practice problems with worked solutions, scored in a Python 3.11 sandbox against real test cases. Sourced from verified data engineering interview reports: parsing, dedup, sessionization, and the pandas patterns that show up on take-homes. 8 topic catalogs, 4 difficulty bands.
Know the patterns before the interviewer asks them.
Practice problems by topic
8 topic catalogs covering the patterns that show up in real interviews.
- 01 Data transformation: 140 problems, Easy → Hard
  Group, aggregate, reshape, normalize. The single largest bucket in the bank because it's the largest bucket in the job.
- 02 Dict and set operations: 60 problems, Easy → Medium
  Merging, filtering, grouping, walking a nested structure. Reach for the standard library before pandas.
- 03 ETL logic and pipelines: 50 problems, Medium → Hard
  Validators, parsers, retry handlers, schema enforcers. The kind of code that doesn't crash on the one bad row.
- 04 PySpark and Spark: 45 problems, Medium → Hard
  DataFrame operations, group-by aggregates, broadcast joins, skew handling. Expect at least one if you're interviewing at Databricks, Netflix, or Uber.
- 05 File parsing and I/O: 45 problems, Easy → Medium
  CSV, JSON, gzipped logs. Line-by-line vs load-into-memory. The malformed row in a file of ten million.
- 06 String and regex: 35 problems, Easy → Medium
  Most parsing problems are fundamentally string problems. Apache logs, snake_case to camelCase, key=value pairs with embedded quotes.
- 07 Error handling and debugging: 30 problems, Medium
  Try/except patterns, custom exceptions with chained context, partial-success paths. Production-ready defensive code.
- 08 Generators and memory: 28 problems, Medium → Hard
  Streaming reads, chunked iteration, merge-sorted iterators. Once the dataset stops fitting in memory, generators stop being optional.
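To make the generators catalog concrete, here is a minimal sketch of two of the patterns it names: chunked iteration and merging already-sorted iterators. The helper names `read_sorted` and `chunked` are made up for illustration, and the in-memory lists stand in for sorted files that would not fit in memory.

```python
import heapq
from itertools import islice


def read_sorted(values):
    """Stand-in for a sorted file reader: yields one item at a time."""
    for value in values:
        yield value


def chunked(iterable, size):
    """Yield lists of at most `size` items without materializing the input."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# heapq.merge is lazy: it keeps one pending item per input stream,
# so memory use depends on the number of streams, not their length.
merged = heapq.merge(read_sorted([1, 4, 9]), read_sorted([2, 3, 10]))
chunks = list(chunked(merged, 4))
print(chunks)  # [[1, 2, 3, 4], [9, 10]]
```

`heapq.merge` only works if every input is already sorted on the same key. If that cannot be guaranteed, the inputs need an external sort first.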
What a worked answer looks like
    # Group order rows by customer_id, return total spend and order count.
    # No pandas. Standard library only.
    from collections import defaultdict

    def group_orders(orders):
        # Each new customer_id gets a fresh zeroed accumulator on first access.
        agg = defaultdict(lambda: {"total": 0, "count": 0})
        for row in orders:
            cid = row["customer_id"]
            agg[cid]["total"] += row["amount"]
            agg[cid]["count"] += 1
        # Return a plain dict so later lookups don't silently create keys.
        return dict(agg)

It sounds trivial, yet it is the question most candidates underestimate and then answer with twenty lines of nested conditionals. defaultdict skips the 'if key not in dict' dance; the lambda creates a fresh accumulator the first time a new customer_id shows up. This is the standard-library answer to a SQL GROUP BY with SUM and COUNT. If you can write it from a blank file in under three minutes, the dictionary section of a Python round is solved.
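For comparison, the sketch below runs the defaultdict version next to a plain-dict version that performs the membership check explicitly. The sample rows and the name `group_orders_plain` are made up for illustration; the point is only that both produce the same result.

```python
from collections import defaultdict


def group_orders(orders):
    agg = defaultdict(lambda: {"total": 0, "count": 0})
    for row in orders:
        cid = row["customer_id"]
        agg[cid]["total"] += row["amount"]
        agg[cid]["count"] += 1
    return dict(agg)


def group_orders_plain(orders):
    """Same aggregation with a plain dict and an explicit key check."""
    agg = {}
    for row in orders:
        cid = row["customer_id"]
        if cid not in agg:
            agg[cid] = {"total": 0, "count": 0}
        agg[cid]["total"] += row["amount"]
        agg[cid]["count"] += 1
    return agg


# Hypothetical sample rows, not taken from the problem bank.
orders = [
    {"customer_id": "a", "amount": 10},
    {"customer_id": "b", "amount": 5},
    {"customer_id": "a", "amount": 7},
]
result = group_orders(orders)
print(result)
# {'a': {'total': 17, 'count': 2}, 'b': {'total': 5, 'count': 1}}
```

Both versions raise `KeyError` on a row that lacks `customer_id` or `amount`. Whether to skip, log, or fail on such rows is worth asking the interviewer before writing the loop.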
DE Python rounds vs SWE Python rounds
If you've been prepping with LeetCode, you've been practicing for a different exam. The vocabulary overlaps; the questions and rubric don't.
| What's tested | Software engineering round | Data engineering round |
|---|---|---|
| Typical prompt | Reverse a linked list, traverse a binary tree, dynamic programming on intervals | Parse this JSON, dedupe this stream, sessionize these events |
| Libraries expected | collections, heapq, functools, asyncio | csv, json, collections, itertools, occasionally pandas |
| What 'right' looks like | Optimal time and space complexity, clean recursion or DP table | Handles malformed input, streams files instead of loading them, names the failure mode |
| Pandas | Almost never expected | Fine if the prompt is genuinely tabular and you ask first; reaching for it on a five-row dedup reads as overkill |
| Failure mode that hurts most | Couldn't find the optimal recursion | Code crashes on the one malformed row in the test fixture |
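The data engineering column can be illustrated with a short sketch: read JSON Lines one row at a time, record malformed rows instead of crashing, and dedupe what survives. The function names, the `event_id` key, and the sample rows are assumptions made for the example, not part of any problem in the bank.

```python
import io
import json


def parse_jsonl(lines, errors, required=("event_id",)):
    """Yield valid records lazily; append (line_number, reason) to errors."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((lineno, "expected a JSON object"))
            continue
        missing = [key for key in required if key not in record]
        if missing:
            errors.append((lineno, f"missing keys: {missing}"))
            continue
        yield record


def dedupe(records, key="event_id"):
    """Keep the first record seen for each key value."""
    seen = set()
    for record in records:
        if record[key] in seen:
            continue
        seen.add(record[key])
        yield record


# StringIO stands in for an open file; rows are hypothetical.
raw = io.StringIO(
    '{"event_id": 1, "user": "a"}\n'
    '{"event_id": 1, "user": "a"}\n'
    '{"event_id": 2, "user": "b"\n'  # truncated row
    '{"user": "c"}\n'                # missing event_id
    '{"event_id": 3, "user": "c"}\n'
)
errors = []
clean = list(dedupe(parse_jsonl(raw, errors)))
print([r["event_id"] for r in clean])      # [1, 3]
print([lineno for lineno, _ in errors])    # [3, 4]
```

The `seen` set grows with the number of distinct keys, so this dedup is only streaming with respect to the rows, not the key space. Naming that limit out loud is part of what the rubric rewards.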
2 modes, 2 stages of prep
Both run the same Python 3.11 sandbox. The difference is how much structure the prompt gives you.
Common questions
Are these Python practice problems really free?
What Python topics come up most in data engineering interviews?
Do I need pandas to practice?
What about PySpark?
How many problems should I solve before a real interview?
What if I'm targeting a non-DE role?
Open the bank, solve one
- 01 Active recall beats re-reading by 50%
  Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom.
- 02 76% of hiring managers reject on the coding task, not the resume
  From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice.
- 03 Five problem shapes cover 80% of data engineer loops
  Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition.
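As one example of a problem shape written by hand, here is a minimal sessionization sketch using only the standard library. It assumes events are dicts with a `user_id` and an epoch-seconds `ts`, and that a gap of more than 30 minutes starts a new session; both the field names and the cutoff are assumptions, and a real prompt will specify its own.

```python
from itertools import groupby
from operator import itemgetter

SESSION_GAP_SECONDS = 30 * 60  # assumed cutoff; prompts usually state their own


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Return one summary dict per session, ordered by user and start time."""
    sessions = []
    # groupby needs its input sorted on the grouping key.
    ordered = sorted(events, key=itemgetter("user_id", "ts"))
    for user_id, user_events in groupby(ordered, key=itemgetter("user_id")):
        current = None
        for event in user_events:
            # Start a new session on the first event or after a long gap.
            if current is None or event["ts"] - current["end"] > gap:
                current = {
                    "user_id": user_id,
                    "start": event["ts"],
                    "end": event["ts"],
                    "events": 0,
                }
                sessions.append(current)
            current["end"] = event["ts"]
            current["events"] += 1
    return sessions


# Hypothetical events, deliberately out of order.
events = [
    {"user_id": "u1", "ts": 0},
    {"user_id": "u2", "ts": 100},
    {"user_id": "u1", "ts": 600},
    {"user_id": "u1", "ts": 4000},
]
sessions = sessionize(events)
print(len(sessions))  # 3
```

This version sorts everything in memory, which is fine for a whiteboard-sized input. For a stream that is already ordered by user and time, the sort can be dropped and the loop turned into a generator.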