Python Coding Practice for Data Engineering Interviews
Python for data engineering looks different from Python for software engineering. A DE interview wants to see you parse a malformed CSV without crashing, deduplicate an event stream by composite key, validate records with field-level errors, and walk a nested JSON document. The 388 problems in this catalog run in a real Python 3.11 sandbox in the browser; each ships with public and hidden test cases.
Know the patterns before the interviewer asks them.
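Two of the shapes named above, parsing a malformed CSV with field-level errors and walking nested JSON, can be sketched with the standard library alone. This is a minimal illustration, not a catalog solution: the column names `order_id` and `amount`, the helper names `parse_orders` and `flatten`, and the error-record layout are all assumptions made for the example.

```python
import csv
import io
import json


def parse_orders(text: str) -> tuple[list[dict], list[dict]]:
    """Split CSV text into clean rows and field-level error records.

    Hypothetical schema: columns order_id (required) and amount (float).
    """
    good, bad = [], []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        errors = {}
        # DictReader stores surplus fields under the None key and fills
        # missing trailing fields with None instead of raising.
        if None in row:
            errors["_row"] = "too many columns"
        order_id = (row.get("order_id") or "").strip()
        if not order_id:
            errors["order_id"] = "missing"
        raw_amount = (row.get("amount") or "").strip()
        try:
            amount = float(raw_amount)
        except ValueError:
            errors["amount"] = f"not a number: {raw_amount!r}"
        if errors:
            # reader.line_num is the number of source lines read so far.
            bad.append({"line": reader.line_num, "errors": errors})
        else:
            good.append({"order_id": order_id, "amount": amount})
    return good, bad


def flatten(obj, prefix=""):
    """Walk nested dicts and lists, yielding (path, leaf_value) pairs."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from flatten(value, f"{prefix}[{i}]")
    else:
        yield prefix, obj


sample = "order_id,amount\nA1,12.50\n,3.00\nA3,abc\nA4\nA5,1.0,extra\n"
good, bad = parse_orders(sample)

doc = json.loads('{"user": {"id": 7, "tags": ["a", "b"]}, "ok": true}')
paths = dict(flatten(doc))
```

In this sample only the `A1` row lands in `good`; every other row produces one entry in `bad` that names the offending field and the source line, so a bad record never aborts the whole file. Note that `flatten` yields nothing for empty dicts or lists, which may or may not be what a given problem wants.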
The shape of DE Python vs. SWE Python
Two problems with similar surface area that test very different things.
def longest_substring_without_repeat(s: str) -> int:
    seen = {}
    start = best = 0
    for i, ch in enumerate(s):
        if ch in seen and seen[ch] >= start:
            start = seen[ch] + 1
        seen[ch] = i
        best = max(best, i - start + 1)
    return best

Algorithm on a toy input. Pointer state, integer counting, complexity reasoning. Necessary for SWE; rarely shows up in DE rounds.
def dedup_events(events: list[dict]) -> list[dict]:
    # Keep one event per (user_id, event_type): the most recent one.
    latest = {}
    for ev in events:
        key = (ev["user_id"], ev["event_type"])
        cur = latest.get(key)
        # Newer event_time wins; on equal event_time, higher event_id wins.
        if (cur is None or
                ev["event_time"] > cur["event_time"] or
                (ev["event_time"] == cur["event_time"]
                 and ev["event_id"] > cur["event_id"])):
            latest[key] = ev
    return list(latest.values())

Dict as keyed store, composite key, event_id tiebreaker on equal timestamps, list-of-dicts in and out. The same skill lands in production code.
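The comparison in `dedup_events` can also be written as a single tuple comparison, since Python compares tuples element by element. The sketch below is an equivalent variant for illustration, not the catalog's reference solution; the name `latest_per_key` and the sample events are made up, and the ISO-8601 strings are assumed to share one format and time zone so that string comparison matches chronological order.

```python
def latest_per_key(events: list[dict]) -> list[dict]:
    """Keep the most recent event per (user_id, event_type)."""
    latest: dict[tuple, dict] = {}
    for ev in events:
        key = (ev["user_id"], ev["event_type"])
        cur = latest.get(key)
        # Tuples compare left to right, so event_id is consulted only
        # when the two event_time values are equal.
        if cur is None or (
            (ev["event_time"], ev["event_id"])
            > (cur["event_time"], cur["event_id"])
        ):
            latest[key] = ev
    return list(latest.values())


# Hypothetical sample data: two click events tie on event_time.
events = [
    {"user_id": "u1", "event_type": "click",
     "event_time": "2026-05-01T10:00:00Z", "event_id": 881},
    {"user_id": "u1", "event_type": "click",
     "event_time": "2026-05-01T10:00:00Z", "event_id": 882},
    {"user_id": "u1", "event_type": "view",
     "event_time": "2026-05-01T09:00:00Z", "event_id": 880},
    {"user_id": "u1", "event_type": "click",
     "event_time": "2026-05-01T08:00:00Z", "event_id": 900},
]
result = latest_per_key(events)
kept_ids = [ev["event_id"] for ev in result]  # [882, 880]
```

The late-arriving click with `event_id` 900 loses despite its higher id, because `event_time` is compared first. Output order follows the order in which each key was first seen, which is a property of Python dicts rather than anything the problem guarantees.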
Catalog topic distribution
388 problems sorted by what they teach. Algorithm DSA is the smallest slice on purpose.
How the test runner shows up after submit
# Schema for the prompt:
# events: list[dict] with keys user_id, event_type, event_time, event_id
# returns: list[dict], 1 per (user_id, event_type), most recent
#
# Public tests (visible to you):
# test_empty_input
# test_single_event
# test_no_duplicates
# test_basic_dedup
#
# Hidden tests (revealed only after passing public):
# test_timestamp_ties_use_event_id
# test_late_arriving_event
# test_100k_events_under_1_second
# test_unicode_user_ids
# test_event_time_at_dst_boundary
# Submission output format:
submit @ 2026-05-26T16:42:18Z
  test_empty_input            PASS  0.4 ms
  test_single_event           PASS  0.5 ms
  test_no_duplicates          PASS  0.8 ms
  test_basic_dedup            PASS  1.2 ms
  test_timestamp_ties...      FAIL  expected event_id 882, got 881
                                    (your code returns the first event seen on a tie)
  test_late_arriving_event    PASS  2.1 ms
  test_100k_events...         FAIL  exceeded 1000 ms (your impl is O(n^2))
  test_unicode_user_ids       PASS  0.9 ms
  test_event_time_at_dst...   PASS  1.4 ms
verdict: 7/9 pass. fix the tie tiebreaker and the quadratic loop. resubmit.

Public tests visible upfront; hidden tests revealed after the public ones pass. Performance budgets are explicit.
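The two failures in the sample output, the tie-break and the time budget, are easy to reproduce locally before submitting. The sketch below shows one way to do that. The site's actual hidden tests are not shown on this page, so the check functions, their inputs, and the sort-based candidate `dedup_sorted` are all hypothetical stand-ins.

```python
import time
from itertools import permutations


def dedup_sorted(events: list[dict]) -> list[dict]:
    """Candidate under test: sort ascending, let later events overwrite."""
    latest = {}
    for ev in sorted(events, key=lambda e: (e["event_time"], e["event_id"])):
        latest[(ev["user_id"], ev["event_type"])] = ev
    return list(latest.values())


def dedup_first_seen(events: list[dict]) -> list[dict]:
    """Deliberately buggy: on equal event_time, keeps whichever came first."""
    latest = {}
    for ev in events:
        key = (ev["user_id"], ev["event_type"])
        if key not in latest or ev["event_time"] > latest[key]["event_time"]:
            latest[key] = ev
    return list(latest.values())


def check_tie_uses_event_id(dedup) -> bool:
    """Equal timestamps must resolve to the higher event_id in any order."""
    pair = [
        {"user_id": "u1", "event_type": "click", "event_time": 100, "event_id": 881},
        {"user_id": "u1", "event_type": "click", "event_time": 100, "event_id": 882},
    ]
    for ordering in permutations(pair):
        result = dedup(list(ordering))
        if len(result) != 1 or result[0]["event_id"] != 882:
            return False
    return True


def check_time_budget(dedup, n: int = 100_000, budget_s: float = 1.0) -> bool:
    """Time one call on n synthetic events spread over at most 1000 users."""
    events = [
        {"user_id": f"u{i % 1000}", "event_type": "click",
         "event_time": i, "event_id": i}
        for i in range(n)
    ]
    start = time.perf_counter()
    result = dedup(events)
    elapsed = time.perf_counter() - start
    return len(result) == min(n, 1000) and elapsed < budget_s
```

Calling `check_tie_uses_event_id(dedup_first_seen)` returns `False`, mirroring the "expected event_id 882, got 881" line above, while `dedup_sorted` passes. Wall-clock checks like `check_time_budget` depend on the machine, so treat a local pass as a hint rather than a guarantee for the sandbox.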
Where to actually practice Python for DE interviews
Pricing reflects May 2026 public tiers. "DE-shaped share" is a rough estimate of how much of each catalog matches the interview shapes the topic chart describes.
| Site | Catalog | Test runner | DE-shaped share | Performance tests | Free tier |
|---|---|---|---|---|---|
| DataDriven (this site) | 388 problems, all free | Real Python 3.11, 5-15 tests per problem | 100% DE-shaped | Yes, with time budgets | Yes, no signup |
| LeetCode | ~2400 problems, ~30% free | Real Python, fixed test cases | Maybe 5% DE-shaped | Yes | Easy + slice of Medium |
| HackerRank Python | ~125 problems | Real Python, fixed tests | Maybe 10% | Limited | Most free |
| PYnative | 630+ exercises | Self-check via solution | Maybe 15% (some pandas) | No | Free |
| Exercism Python | 146 exercises | Mentor review + tests | ~10% | No | Free |
Python coding practice FAQ
What kind of Python do data engineers actually write?
Should I prep with LeetCode for a data engineer interview?
Do I need to know pandas or PySpark?
How many Python problems should I solve before an interview?
How does the test runner handle hidden test cases?
Can I run my own scratch code against the test inputs?
Open the editor and write a function
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom.
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice.
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition.
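Two of the shapes listed above, sessionization and top-N-per-group, fit in a few lines each. The sketch below is one plausible hand-written form, not a catalog solution: the 30-minute (1800-second) inactivity gap, the function names, and the sample rows are assumptions chosen for the example.

```python
import heapq
from collections import defaultdict


def sessionize(timestamps: list[int], gap: int = 1800) -> list[list[int]]:
    """Group epoch-second timestamps into sessions.

    A new session starts when the pause since the previous event
    exceeds `gap` seconds (30 minutes here, an assumed threshold).
    """
    sessions: list[list[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def top_n_per_group(rows: list[dict], group_key: str,
                    value_key: str, n: int = 2) -> dict:
    """Return the n rows with the largest value_key for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        group: heapq.nlargest(n, items, key=lambda r: r[value_key])
        for group, items in groups.items()
    }


sessions = sessionize([0, 600, 4000, 4100, 10000])
# [[0, 600], [4000, 4100], [10000]]

# Hypothetical sample rows.
rows = [
    {"dept": "a", "amt": 5},
    {"dept": "a", "amt": 9},
    {"dept": "a", "amt": 7},
    {"dept": "b", "amt": 1},
]
top = top_n_per_group(rows, "dept", "amt", n=2)
```

`heapq.nlargest` returns each group's rows in descending order of `amt`, so group `a` comes back as the rows with 9 and then 7. For a single user the sessionizer is enough as written; a per-user version would group by user first and apply the same loop to each group.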