Python Practice Problems

388 free Python practice problems with worked solutions, scored in a Python 3.11 sandbox against real test cases. Sourced from verified data engineering interview reports: parsing, dedup, sessionization, pandas patterns that show up on take-homes. 8 topic catalogs, 4 difficulty bands.

Prepare for the interview

Know the patterns before the interviewer asks them.


Practice problems by topic

8 topic catalogs covering the patterns that show up in real interviews.

  1. Data transformation
    140 problems, Easy → Hard

    Group, aggregate, reshape, normalize. The single largest bucket in the bank because it's the largest bucket in the job.

  2. Dict and set operations
    60 problems, Easy → Medium

    Merging, filtering, grouping, walking a nested structure. Reach for the standard library before pandas.

  3. ETL logic and pipelines
    50 problems, Medium → Hard

    Validators, parsers, retry handlers, schema enforcers. The kind of code that doesn't crash on the one bad row.

  4. PySpark and Spark
    45 problems, Medium → Hard

    DataFrame operations, group-by aggregates, broadcast joins, skew handling. Expect at least one if you're interviewing at Databricks, Netflix, or Uber.

  5. File parsing and I/O
    45 problems, Easy → Medium

    CSV, JSON, gzipped logs. Line-by-line vs load-into-memory. The malformed row in a file of ten million.

  6. String and regex
    35 problems, Easy → Medium

    Most parsing problems are fundamentally string problems. Apache logs, snake_case to camelCase, key=value pairs with embedded quotes.

  7. Error handling and debugging
    30 problems, Medium

    Try/except patterns, custom exceptions with chained context, partial-success paths. Production-ready defensive code.

  8. Generators and memory
    28 problems, Medium → Hard

    Streaming reads, chunked iteration, merge-sorted iterators. Once the dataset stops fitting in memory, generators stop being optional.
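
To make the generators topic concrete, here is a minimal sketch of two of the patterns named in the list, merge-sorted iterators and chunked iteration. It is not taken from the problem bank: the helper names (read_events, merge_sorted, chunked), the 'timestamp,payload' line format, and the sample shards are assumptions for illustration. Only heapq.merge and itertools.islice are real standard-library calls.

```python
import heapq
from itertools import islice


def read_events(lines):
    """Lazily parse 'timestamp,payload' lines, one at a time (assumed format)."""
    for line in lines:
        ts, _, payload = line.rstrip("\n").partition(",")
        yield int(ts), payload


def merge_sorted(*streams):
    """Merge already-sorted streams without materializing any of them."""
    # heapq.merge keeps only one pending item per input stream in memory.
    return heapq.merge(*streams)


def chunked(iterable, size):
    """Yield lists of at most `size` items until the iterable is exhausted."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


# Hypothetical sample data: two log shards, each already sorted by timestamp.
shard_a = ["1,a1\n", "4,a2\n", "9,a3\n"]
shard_b = ["2,b1\n", "3,b2\n", "10,b3\n"]

merged = merge_sorted(read_events(shard_a), read_events(shard_b))
batches = list(chunked(merged, 4))
```

In a real pipeline the shards would be open file handles rather than lists; the functions would not change, because they only ever ask for the next line.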

What a worked answer looks like

# Group order rows by customer_id, return total spend and order count.
# No pandas. Standard library only.

from collections import defaultdict

def group_orders(orders):
    agg = defaultdict(lambda: {"total": 0, "count": 0})
    for row in orders:
        cid = row["customer_id"]
        agg[cid]["total"] += row["amount"]
        agg[cid]["count"] += 1
    return dict(agg)

It sounds trivial, yet it's the question most candidates underestimate and then write twenty lines of nested conditionals for. defaultdict skips the 'if key not in dict' dance; the lambda creates a fresh accumulator the first time a new customer_id shows up. This is the standard-library answer to a SQL GROUP BY with SUM and COUNT. If you can write it from a blank file in under three minutes, the dictionary section of a Python round is solved.
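
For comparison, here is a sketch of the same aggregation with a plain dict, so the 'if key not in dict' dance is visible. The function name group_orders_plain and the sample rows are hypothetical, added only for illustration; the input shape (dicts with customer_id and amount) is assumed from the worked answer.

```python
def group_orders_plain(orders):
    """Same aggregation as the defaultdict version, using a plain dict."""
    agg = {}
    for row in orders:
        cid = row["customer_id"]
        # The explicit membership check that defaultdict makes unnecessary.
        if cid not in agg:
            agg[cid] = {"total": 0, "count": 0}
        agg[cid]["total"] += row["amount"]
        agg[cid]["count"] += 1
    return agg


# Hypothetical sample rows, for illustration only.
orders = [
    {"customer_id": "c1", "amount": 20},
    {"customer_id": "c2", "amount": 5},
    {"customer_id": "c1", "amount": 15},
]
result = group_orders_plain(orders)
# result == {"c1": {"total": 35, "count": 2}, "c2": {"total": 5, "count": 1}}
```

Both versions return the same mapping. The plain-dict one is not wrong, it just spends two lines on bookkeeping that defaultdict does for you.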

DE Python rounds vs SWE Python rounds

If you've been prepping with LeetCode, you've been practicing for a different exam. The vocabulary overlaps; the questions and rubric don't.

| What's tested | Software engineering round | Data engineering round |
| --- | --- | --- |
| Typical prompt | Reverse a linked list, traverse a binary tree, dynamic programming on intervals | Parse this JSON, dedupe this stream, sessionize these events |
| Libraries expected | collections, heapq, functools, asyncio | csv, json, collections, itertools, occasionally pandas |
| What 'right' looks like | Optimal time and space complexity, clean recursion or DP table | Handles malformed input, streams files instead of loading them, names the failure mode |
| Pandas | Almost never expected | Fine if the prompt is genuinely tabular and you ask first; reaching for it on a five-row dedup reads as overkill |
| Failure mode that hurts most | Couldn't find the optimal recursion | Code crashes on the one malformed row in the test fixture |
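
As a rough illustration of the data engineering column, the sketch below streams a CSV row by row, keeps going past malformed rows, and names each failure instead of crashing. The function name sum_amounts, the column names, and the sample data are assumptions, not part of any problem in the bank.

```python
import csv
import io


def sum_amounts(lines):
    """Stream CSV rows, total the 'amount' column, and report rejected rows."""
    reader = csv.DictReader(lines)
    total = 0.0
    rejected = []  # (line_number, reason) pairs, so the failure mode is named
    for row in reader:
        try:
            total += float(row["amount"])
        except (KeyError, TypeError, ValueError) as exc:
            # KeyError: no 'amount' column; TypeError: short row (value is None);
            # ValueError: value is present but not numeric.
            rejected.append((reader.line_num, f"{type(exc).__name__}: {exc}"))
    return total, rejected


# Hypothetical input: line 3 has a non-numeric amount, line 4 is a short row.
data = io.StringIO("order_id,amount\n1,10.5\n2,oops\n3\n4,4.5\n")
total, rejected = sum_amounts(data)
```

Passing an open file handle instead of the StringIO would keep memory use flat regardless of file size, since csv.DictReader pulls one line at a time.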

2 modes, 2 stages of prep

Both run the same Python 3.11 sandbox. The difference is how much structure the prompt gives you.

Problem mode
Self-paced with a clear prompt, partial test visibility, and instant evaluator feedback.
Use this while you're learning a new pattern. About 30 problems is when most candidates stop being surprised by the test cases.
Interview mode
Timed, deliberately vague prompt, AI interviewer asking follow-ups while you write and again after you submit.
Verdict at the end names the specific exchanges that decided it. Use this in the last week or two before a real onsite.

Common questions

Are these Python practice problems really free?
Yes. The catalog isn't paywalled and never has been. Sign-in is optional and only saves your progress across devices.
What Python topics come up most in data engineering interviews?
Data transformation is the largest bucket (around a third of questions), dict and set operations next, then file parsing, then ETL flow control. Generators and memory efficiency come up at senior levels. Tree traversals and dynamic programming almost never show up; something involving a messy file almost always does.
Do I need pandas to practice?
Usually not, and reaching for it without asking can hurt you in interviews. Most rounds want to see you handle the problem with the standard library: dict, list, set, csv, json, itertools, collections. Some shops are fine with pandas if you ask first. Some roles, especially ML-adjacent ones, expect it. The job description is the tell.
What about PySpark?
Practice it if you're targeting Databricks, Netflix, Uber, Airbnb, or any team running Spark in production. DataFrame operations, broadcast joins, partitioning, and the data-skew question are the patterns. If you're targeting a Snowflake-on-dbt shop, PySpark almost never comes up.
How many problems should I solve before a real interview?
30 to 50 if you're solving them properly. The 4 buckets that pay off most are data transformation, dict operations, file parsing, and error handling. Those 4 together cover almost 70% of what gets asked in a real DE Python round.
What if I'm targeting a non-DE role?
Most of these problems still apply to data scientist and analytics engineer interviews, which test similar Python patterns. The exception is PySpark, which is DE-specific. If you're applying for software engineering roles, the problem bank here won't match the round shape and LeetCode is the right tool instead.
02 / Why practice

Open the bank, solve one

  1. Active recall beats re-reading by 50%

    Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom.

  2. 76% of hiring managers reject on the coding task, not the resume

    From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice.

  3. Five problem shapes cover 80% of data engineer loops

    Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition.
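
For a sense of what writing one of these shapes by hand involves, here is a minimal sessionization sketch. The (user_id, timestamp) event format and the 30-minute inactivity gap are assumptions chosen for illustration, not the bank's reference solution; real prompts often leave the gap for you to ask about.

```python
from datetime import datetime, timedelta


def sessionize(events, gap=timedelta(minutes=30)):
    """Split (user_id, timestamp) events into per-user sessions.

    A new session starts when a user's next event arrives more than `gap`
    after their previous one.
    """
    sessions = {}  # user_id -> list of sessions, each a list of timestamps
    for user_id, ts in sorted(events):  # order by user, then by time
        user_sessions = sessions.setdefault(user_id, [])
        if user_sessions and ts - user_sessions[-1][-1] <= gap:
            user_sessions[-1].append(ts)  # within the gap: same session
        else:
            user_sessions.append([ts])  # first event or gap exceeded
    return sessions


# Hypothetical sample events, deliberately out of order.
t0 = datetime(2024, 1, 1, 9, 0)
events = [
    ("u1", t0),
    ("u2", t0 + timedelta(minutes=5)),
    ("u1", t0 + timedelta(minutes=50)),
    ("u1", t0 + timedelta(minutes=10)),
]
sessions = sessionize(events)
```

Sorting first keeps the loop simple at the cost of holding all events in memory; a streaming variant would need the input to arrive already ordered by time.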

Where to go next