SQL and Python Practice Problems for Data Engineers
Data Engineering Interview Practice Problems
1475+ data engineering interview practice problems with real code execution. Write SQL queries, build Python solutions, and design schemas against live databases with instant grading. Filter by domain, difficulty, seniority level, and target company.
Domains: SQL (905), Pipeline Architecture (123), Data Modeling (57), Python (390). Difficulty breakdown: easy (534), medium (684), hard (257).
SQL Practice Problems (905)
- 10 Lowest Uptime Services - medium - Ten services at the bottom of the reliability chart.
- 2FA Confirmation Rate - medium - Two-factor sent. How many confirmed?
- 2nd Most Common Content Type - hard - Everyone talks about number one. What about number two?
- 30-Day Page View Counts - easy - Thirty days of engagement. Quick snapshot.
- 7-Check Rolling Average - medium - Seven entries hold the trend.
- 7-Day Onboarding Conversion - hard - Signed up Monday. Still here by Sunday?
- 7-Day Token Retention - medium - Premium tokens, day by day.
- 80th Percentile API Latency - medium - The 80th percentile tells the real story.
- 90th Pctl Model Accuracy Gap - medium - Most models are fine. The bottom 10% are not.
- Above Average - easy - Products beating the catalog average.
Pipeline Architecture Practice Problems (123)
- 45 Minutes Turned Into 3.5 Hours - medium - Spark jobs are running. Just not fast enough.
- 600 Million Events a Day - hard - 600 million events a day. Two years of retention.
- A Clean Number for Every Merchant - hard - Raw payment logs in. Clean merchant summaries out.
- A Million Cars Phoning Home - hard - Every vehicle is a sensor. Deploy the pipeline to catch it all.
- Analysts Are Slowing the Store Down - medium - Orders placed. Data warehouse hungry.
- A New Column on a Billion Rows - hard - Add a new column to a billion-row production table and backfill it with zero downtime.
- A Shared Drive Full of Contracts - medium - Buried in PDFs. The data is in there somewhere.
- A Stream All Day and a File at Midnight - hard - Real-time and batch. Same pipeline. No compromises.
- Badging Items That Already Sold Out - hard - Same-day delivery. The features have to be faster.
- Basel, CCAR, and Monday Morning - medium - The regulator does not accept 'eventually consistent.'
Data Modeling Practice Problems (57)
- A/B Experiment Assignment Schema - medium - One user, one experiment, one variant. No exceptions.
- Where They Used to Live - medium - They moved. The data stayed behind.
- Airline Flight Operations Schema - medium - Flights, passengers, and routes. Before you draw a single table, tell me the grain.
- A Number for the Seller - easy - They want a total. Give them the right schema first.
- B2B Invoicing Data Model - easy - Invoices go out, partial payments trickle in, and some customers are three months overdue.
- Clickstream and Session Schema - medium - Millions of clicks, mostly anonymous.
- Cloud File Storage Metadata Schema - hard - A file is also a folder. A folder is also a file.
- Content Engagement Data Model - hard - Post published. Now measure everything that happens next.
- Content Search and Discovery Schema - hard - Searchable from every angle. Design it so nothing gets lost.
- Customer Address History - easy - People move. Sometimes twice in a month. How do you remember where everyone was, and when?
Python Practice Problems (390)
- Activity Time Ledger - easy - Matching activities. One runtime.
- Batch Partitioner - medium - One pile becomes many. Split wisely.
- Batch Records - medium - Too many at once. Break them into groups.
- Batch With Metadata - easy - The list gets chopped.
- Caesar Shift Check - easy - The key turns. Does it open?
- Character Occurrence Map - easy - Character frequency as a map.
- Char Profile - medium - Every character in the string tells a story.
- Coalesce Fields - easy - Nulls are hiding. Fill them in.
- Column Max - easy - One value rules the column.
- Column Range - easy - From minimum to maximum. What is the spread?