Data Engineering Mock Interview Practice
Scope
Ask scoping questions before implementing your solution. Question your interviewer about data characteristics, constraints, and edge cases to clarify requirements.
Code
Build your solution in a live editor; perfect syntax isn’t necessary. Apply what you learned while scoping, since those findings carry directly into this phase.
Discuss
Answer the interviewer’s follow-up questions and reason through tradeoffs, scaling, and production concerns. Modify your solution to fit new constraints.
Feedback
Get a clear signal from the interviewer on your strengths and weaknesses, along with a critical analysis of your performance and targeted recommendations for improvement.
About DataDriven Mock Interviews
DataDriven is a free web application that simulates all four rounds of a data engineering interview: SQL, Python, Data Modeling, and Pipeline Architecture. Each domain can be practiced in two modes: Problem mode (self-paced with instant grading) and Interview mode (timed AI mock interview simulation).
Interview mode has four phases:
- Phase 1 (Think): You receive a deliberately vague prompt and ask clarifying questions to an AI interviewer, who responds like a real hiring manager.
- Phase 2 (Code/Design): You write and execute real SQL or Python, or build schemas and pipelines on an interactive canvas.
- Phase 3 (Discuss): The AI interviewer asks follow-up questions about your solution, one question at a time. You respond, and it asks another. This continues for up to 8 exchanges. The interviewer probes edge cases, optimization, and alternative approaches, and may introduce curveball requirements that change the problem mid-interview.
- Phase 4 (Verdict): You receive a hire/no-hire decision with specific feedback on what you did well, where your reasoning had gaps, and what to study next.
Features: adaptive difficulty (problems scale to your performance), spaced repetition (weak concepts resurface at optimal intervals), readiness score (per-topic gap tracker), company-specific filtering (Google, Amazon, Meta, Stripe, Databricks, weighted by real interview data), and seniority calibration (Junior through Staff). 100% free, no trial, no credit card, no paywall.
Data Engineering Mock Interview Questions
1487+ data engineering mock interview questions with AI-powered feedback. Pick your domain, target company tier, and seniority level to start a timed interview simulation. Write real code, ask clarifying questions, and get graded instantly.
Available domains: Spark (12 questions), Python (390 questions), SQL (905 questions), Data Modeling (57 questions), Architecture (123 questions). Difficulty levels: easy (537), medium (689), hard (261). Seniority levels: Junior, Mid, Senior, Staff, Sr. Staff.
Spark Interview Questions (12)
- The Word Count Shuffle Trap - easy - groupByKey works. Your cluster disagrees.
- Too Many Small Files - easy - Two thousand files. One megabyte each. Athena says no.
- Read the Plan - easy - 30 MB table. 80 GB shuffle. Read the plan.
- Push It Down - medium - You renamed the column. Catalyst forgot how to prune.
- The Cache That Ate the Cluster - medium - You cached 200 GB and forgot to let go.
- Let AQE Handle It - medium - Five tasks take 35 minutes. The other 195 take 30 seconds.
- Size the Executors - medium - Too big: GC kills you. Too small: broadcast kills you.
- Three Hours for Yesterday's Numbers - medium - 18 terabytes scanned. 50 megabytes needed.
- Fix Skewed Viewing Events Pipeline - hard - Your nightly Spark job just paged you. One task has 40% of the data.
- Salt the Hot Merchant - hard - One merchant owns 38% of your rows. Salt or suffer.
Python Interview Questions (390)
- The Dominant Signal - easy - Hottest items in the transaction log. Ties included.
- The Original Keeper - easy - Clean up duplicate events without losing the timeline.
- The Forward Fill - easy - Patch the gaps in a noisy sensor stream.
- The Word Mismatch - easy - Some text does not match.
- The Social Graph - easy - Everyone knows someone.
- The Sequel Spotter - easy - Spot the sequels hiding in the catalog.
- The Numbered Chair - easy - A standing list. Position n holds one entry.
- The Character Encoder - easy - Squeeze a string down to its tightest form.
- The One-Way Street - easy - Monotonic time-series. Direction only.
- The IP Validator - easy - Real and fake, mixed together.
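To give a feel for this category, here is a minimal sketch of a forward fill, the technique named in "The Forward Fill" above. The actual problem statement is not shown on this page, so the input shape (a list of readings with None marking a gap) and the function name forward_fill are assumptions made purely for illustration.

```python
from typing import List, Optional, Sequence


def forward_fill(readings: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace each None with the most recent non-None value seen so far.

    Leading gaps stay None because there is no earlier value to carry forward.
    The input format is a hypothetical one chosen for this sketch.
    """
    filled: List[Optional[float]] = []
    last_seen: Optional[float] = None
    for value in readings:
        if value is not None:
            last_seen = value  # remember the latest real reading
        filled.append(last_seen)
    return filled


if __name__ == "__main__":
    print(forward_fill([None, 1.0, None, None, 2.5, None]))
```

In an interview setting, the scoping phase is where you would ask how leading gaps should be handled and whether there is a limit on how far a value may be carried.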
SQL Interview Questions (905)
- Buyers Who Never Browsed - easy - They bought without ever loading a page.
- The Duplicate Detection Sprint - easy - Same email, different rows. Spot the repeats.
- Weekend Warriors - easy - Weekdays vs. weekends. When does the action really happen?
- The Dormant Accounts - easy - They are still paying. They stopped showing up.
- 30-Day Page View Counts - easy - Thirty days of engagement. Quick snapshot.
- Above Average Interactions - easy - The average user is boring. Who is above?
- Above Category Average - easy - The category average is one thing. These beat it.
- Active API Tokens - easy - Tokens that have actually been used.
- Active Campaigns - easy - Which campaigns are earning their keep?
- Active Token Owners in 2026 - easy - Active token owners this year.
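As an illustration of the style of question in this list, "Buyers Who Never Browsed" suggests an anti-join: rows in one table with no match in another. The sketch below runs one possible query against an in-memory SQLite database from Python. The table names (purchases, page_views), their columns, and the sample rows are hypothetical; the real problem's schema is not given on this page.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE purchases (user_id INTEGER, amount REAL);
        CREATE TABLE page_views (user_id INTEGER, url TEXT);
        INSERT INTO purchases VALUES (1, 9.99), (2, 20.00), (2, 5.00), (3, 42.00);
        INSERT INTO page_views VALUES (1, '/home'), (3, '/checkout');
        """
    )
    return conn


def buyers_who_never_browsed(conn: sqlite3.Connection) -> List[int]:
    """Return user_ids that appear in purchases but never in page_views."""
    query = """
        SELECT DISTINCT p.user_id
        FROM purchases AS p
        WHERE NOT EXISTS (
            SELECT 1 FROM page_views AS v WHERE v.user_id = p.user_id
        )
        ORDER BY p.user_id
    """
    return [row[0] for row in conn.execute(query)]


if __name__ == "__main__":
    print(buyers_who_never_browsed(build_demo_db()))
```

NOT EXISTS is used here rather than NOT IN because NOT IN returns no rows when the subquery contains a NULL, which is the kind of edge case worth raising with the interviewer.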
Data Modeling Interview Questions (57)
- Customer Address History - easy - People move. Sometimes twice in a month. How do you remember where everyone was, and when?
- B2B Invoicing Data Model - easy - Invoices go out, partial payments trickle in, and some customers are three months overdue.
- Fitness Studio Membership Schema - easy - Classes fill up. Members no-show. Billing continues.
- A Number for the Seller - easy - They want a total. Give them the right schema first.
- Event Ticketing System Data Model - easy - JSON in. Reporting warehouse out. Design both ends.
- Loan Management Schema - easy - Money out, payments back. The balance has to be exact.
- Toll Road Sensor Analytics - easy - Cars enter, cars exit. Except when they don't.
- Fitness App Data Model - easy - Reps, sets, streaks, and personal bests. Gym rats love their stats.
- Ride-Sharing Platform Schema - medium - Riders, drivers, and fares. Everyone takes a cut.
- Employee Transfer Tracking System - medium - People switch teams. HR loses track.
Architecture Interview Questions (123)
- Sixty Minutes, Every Hour - medium - Every hour, on the hour. No excuses.
- Six Million Rows Before the Market Opens - medium - One massive CSV. Millions of timestamps.
- The Meal Kit That Knows You - medium - What they ordered says a lot about what they want next.
- End of Day Is Too Late - medium - Every swipe tells a story.
- Store, Site, and Distributor - medium - Sales data is piling up. Someone has to make sense of it.
- Nested Docs, Flat Reports - medium - Two databases. One direction. No data left behind.
- The Whiteboard Exercise - medium - Marker in hand. Draw the whole thing.
- Where Is Every Truck, Right Now - medium - Trucks are moving. Every ping counts.
- The Living Table - medium - Data lands continuously. History must survive every update.
- SaaS API Connector with Incremental Sync - medium - The API has rate limits. You have deadlines.
How Interview Mode Works (Four Phases)
- Phase 1 (Think): Choose a domain (SQL, Python, Data Modeling, or Pipeline Architecture), then select your seniority level (Junior through Staff) and target company tier. You receive a deliberately vague prompt. Ask clarifying questions to the AI interviewer, who responds like a real hiring manager.
- Phase 2 (Code/Design): Write and execute your solution. SQL runs against a real database. Python executes for real. Data Modeling uses an interactive schema canvas. Pipeline Architecture uses an interactive design canvas.
- Phase 3 (Discuss): The AI interviewer asks follow-up questions about your solution, one question at a time. You respond, and it asks another. This continues for up to 8 exchanges. The interviewer probes edge cases, tests optimization awareness, challenges alternative approaches, and may introduce curveball requirements that change the problem mid-interview.
- Phase 4 (Verdict): Receive a hire/no-hire decision with specific feedback on what you did well, where your reasoning had gaps, what the interviewer was testing, and what to study next.
How Problem Mode Works
Problem mode is self-paced practice with clear problem statements and instant grading. No AI interviewer, no timer, no discussion phase. Focus on building skill before testing it under interview pressure.
Practice by Domain and Mode
- SQL Mock Interview Practice
- Python Mock Interview Practice
- Data Modeling Mock Interview Practice
- Pipeline Architecture Mock Interview Practice
- SQL Practice Problems (self-paced)
- Python Practice Problems (self-paced)
- Data Modeling Practice Problems (self-paced)
- Pipeline Architecture Practice Problems (self-paced)
Interview Guides
- SQL Interview Questions Guide
- Python Interview Questions Guide
- Data Modeling Interview Questions
- Pipeline Architecture Interview Questions
- Data Engineering Interview Prep
- System Design Interview Questions