How does the DataDriven mock interview work?

DataDriven mock interviews have four phases. Phase 1 (Think): pick a domain (SQL, Python, Data Modeling, or Pipeline Architecture), your seniority level (Junior through Staff), and a target company tier. You receive a deliberately vague prompt and ask clarifying questions of an AI interviewer that responds like a real hiring manager. Phase 2 (Code/Design): write and execute your solution. SQL runs against a real database. Python executes for real. Data Modeling and Pipeline Architecture use interactive canvases. Phase 3 (Discuss): the AI interviewer asks follow-up questions about your solution one at a time. You respond, and it asks another. This continues for up to 8 exchanges. The interviewer probes edge cases and optimization, and may introduce curveball requirements that change the problem mid-interview. Phase 4 (Verdict): receive a hire/no-hire decision with specific feedback on strengths, gaps, and what to study next.

What domains are covered in the mock interviews?

Four domains, each with both Problem mode and Interview mode. SQL: 850+ questions with real SQL execution covering joins, window functions, GROUP BY, CTEs, subqueries, COALESCE, CASE WHEN, pivot, rank, and partition by. Python: 388+ questions with real code execution covering data transformation, dictionary operations, file parsing, ETL logic, PySpark, and debugging. Data Modeling: interactive schema design canvas covering star schema, snowflake schema, dimensional modeling, slowly changing dimensions, data vault, and grain definition. Pipeline Architecture: interactive pipeline design canvas covering ETL vs ELT, batch vs streaming, Spark, Kafka, Airflow, dbt, storage architecture, fault tolerance, and incremental loading.

Is DataDriven free?

Yes. DataDriven is 100% free. No trial, no credit card, no catch. Every feature is unlocked for all users.

Data Engineering Mock Interview Practice

Scope

Ask scoping questions before implementing your solution. Question your interviewer about data characteristics, constraints, and edge cases to clarify requirements.

Code

Build your solution in a live editor; perfect syntax isn’t necessary. Apply what you learned during scoping; your discoveries carry directly into this phase.

Discuss

Answer the interviewer’s follow-up questions and reason through tradeoffs, scaling, and production concerns. Modify your solution to fit new constraints.

Feedback

Get a clear signal from the interviewer on your strengths and weaknesses, plus a critical analysis of your performance with targeted recommendations for improvement.

About DataDriven Mock Interviews

DataDriven is a free web application that simulates all four rounds of a data engineering interview: SQL, Python, Data Modeling, and Pipeline Architecture. Each domain can be practiced in two modes: Problem mode (self-paced with instant grading) and Interview mode (timed AI mock interview simulation).

Interview mode has four phases. Phase 1 (Think): you receive a deliberately vague prompt and ask clarifying questions of an AI interviewer, who responds like a real hiring manager. Phase 2 (Code/Design): you write and execute real SQL or real Python, or build schemas and pipelines on an interactive canvas. Phase 3 (Discuss): the AI interviewer asks follow-up questions about your solution, one question at a time. You respond, and it asks another. This continues for up to 8 exchanges. The interviewer probes edge cases, optimization, and alternative approaches, and may introduce curveball requirements that change the problem mid-interview. Phase 4 (Verdict): you receive a hire/no-hire decision with specific feedback on what you did well, where your reasoning had gaps, and what to study next.

Features: adaptive difficulty (problems scale to your performance), spaced repetition (weak concepts resurface at optimal intervals), readiness score (per-topic gap tracker), company-specific filtering (Google, Amazon, Meta, Stripe, Databricks, weighted by real interview data), and seniority calibration (Junior through Staff). 100% free, no trial, no credit card, no paywall.

Data Engineering Mock Interview Questions

1535+ data engineering mock interview questions with AI-powered feedback. Pick your domain, target company tier, and seniority level to start a timed interview simulation. Write real code, ask clarifying questions, and get graded instantly.

Available domains: Spark (14 questions), Python (401 questions), SQL (927 questions), Data Modeling (60 questions), Architecture (133 questions). Difficulty levels: easy (543), medium (728), hard (264). Seniority levels: Junior, Mid, Senior, Staff, Sr. Staff.

Spark Interview Questions (14)

The Word Count Shuffle Trap - easy - groupByKey works. Your cluster disagrees.
Too Many Small Files - easy - Two thousand files. One megabyte each. Athena says no.
Read the Plan - easy - 30 MB table. 80 GB shuffle. Read the plan.
Push It Down - medium - You renamed the column. Catalyst forgot how to prune.
The Cache That Ate the Cluster - medium - You cached 200 GB and forgot to let go.
Let AQE Handle It - medium - Five tasks take 35 minutes. The other 195 take 30 seconds.
Size the Executors - medium - Too big: GC kills you. Too small: broadcast kills you.
The Hours They Stayed - medium
Three Hours for Yesterday's Numbers - medium - 18 terabytes scanned. 50 megabytes needed.
Fix Skewed Viewing Events Pipeline - hard - Your nightly Spark job just paged you. One task has 40% of the data.

Python Interview Questions (401)

The Dominant Signal - easy - Hottest items in the transaction log. Ties included.
The Original Keeper - easy - Clean up duplicate events without losing the timeline.
The Forward Fill - easy - Patch the gaps in a noisy sensor stream.
The Word Mismatch - easy - Some text does not match.
The Social Graph - easy - Everyone knows someone.
The Sequel Spotter - easy - Spot the sequels hiding in the catalog.
The Numbered Chair - easy - A standing list. Position n holds one entry.
The Squeeze - easy - Long stretches of sameness collapse into almost nothing.
The One-Way Street - easy - Monotonic time-series. Direction only.
The IP Validator - easy - Real and fake, mixed together.
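To give a feel for the kind of task behind a title like "The Forward Fill", here is a minimal Python sketch. The actual problem statement is not shown on this page, so the input shape (a list of readings where a gap is None) and the function name forward_fill are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence


def forward_fill(readings: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace each None with the most recent non-None value seen so far.

    Leading gaps stay None because there is nothing earlier to carry forward.
    """
    filled: List[Optional[float]] = []
    last_seen: Optional[float] = None
    for value in readings:
        if value is not None:
            last_seen = value  # remember the latest real reading
        filled.append(last_seen)
    return filled


# Hypothetical sensor stream with gaps (None marks a missing reading).
sensor_stream = [None, 20.5, None, None, 21.0, None]
print(forward_fill(sensor_stream))  # [None, 20.5, 20.5, 20.5, 21.0, 21.0]
```

A real interview version would likely add follow-ups, such as how long a gap may be before carrying a value forward stops being trustworthy.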

SQL Interview Questions (927)

Buyers Who Never Browsed - easy - They bought without ever loading a page.
Double Vision - easy - Before the records move, the ones wearing the same name twice have to surface.
Weekend Warriors - easy - Weekdays vs. weekends. When does the action really happen?
The Dormant Accounts - easy - They are still paying. They stopped showing up.
30-Day Page View Counts - easy - Thirty days of engagement. Quick snapshot.
Above Average Interactions - easy - The average user is boring. Who is above?
Above Category Average - easy - The category average is one thing. These beat it.
Active API Tokens - easy - Tokens that have actually been used.
Active Campaigns - easy - Which campaigns are earning their keep?
Active Token Owners in 2026 - easy - Active token owners this year.
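A tagline like "They bought without ever loading a page" suggests an anti-join. The sketch below shows one way that pattern might look, run against an in-memory SQLite database from Python. The table names (purchases, page_views), their columns, and the sample rows are hypothetical; the real problem's schema and SQL dialect are not shown on this page.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE purchases (user_id INTEGER, order_id INTEGER);
    CREATE TABLE page_views (user_id INTEGER, url TEXT);

    INSERT INTO purchases VALUES (1, 100), (2, 101), (3, 102), (3, 103);
    INSERT INTO page_views VALUES (1, '/home'), (1, '/cart'), (4, '/home');
    """
)

# Anti-join: keep buyers for whom no page_views row exists.
# NOT EXISTS sidesteps the NULL pitfall of NOT IN when user_id can be NULL.
query = """
    SELECT DISTINCT p.user_id
    FROM purchases AS p
    WHERE NOT EXISTS (
        SELECT 1 FROM page_views AS v WHERE v.user_id = p.user_id
    )
    ORDER BY p.user_id
"""
buyers_without_views = [row[0] for row in conn.execute(query)]
print(buyers_without_views)  # [2, 3]
conn.close()
```

A LEFT JOIN with an IS NULL filter would return the same rows here; which form reads better, and how each behaves on large tables, is the sort of tradeoff a discussion phase tends to probe.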

Data Modeling Interview Questions (60)

Customer Address History - easy - People move. Sometimes twice in a month. How do you remember where everyone was, and when?
B2B Invoicing Data Model - easy - Invoices go out, partial payments trickle in, and some customers are three months overdue.
A Number for the Seller - easy - They want a total. Give them the right schema first.
Event Ticketing System Data Model - easy - JSON in. Reporting warehouse out. Design both ends.
The Balance Always Reconciles - easy - Money out, payments back. The balance has to be exact.
The No-Show - easy - Every reserved seat ends one of five ways. Build the model that can tell them apart.
Toll Road Sensor Analytics - easy - Cars enter, cars exit. Except when they don't.
Fitness App Data Model - easy - Reps, sets, streaks, and personal bests. Gym rats love their stats.
Ride-Sharing Platform Schema - medium - Riders, drivers, and fares. Everyone takes a cut.
Employee Transfer Tracking System - medium - People switch teams. HR loses track.

Architecture Interview Questions (133)

Sixty Minutes, Every Hour - medium - Every hour, on the hour. No excuses.
Six Million Rows Before the Market Opens - medium - One massive CSV. Millions of timestamps.
The Meal Kit That Knows You - medium - What they ordered says a lot about what they want next.
End of Day Is Too Late - medium - Every swipe tells a story.
Store, Site, and Distributor - medium - Sales data is piling up. Someone has to make sense of it.
Nested Docs, Flat Reports - medium - Two databases. One direction. No data left behind.
The Whiteboard Exercise - medium - Marker in hand. Draw the whole thing.
Where Is Every Truck, Right Now - medium - Trucks are moving. Every ping counts.
The Living Table - medium - Data lands continuously. History must survive every update.
SaaS API Connector with Incremental Sync - medium - The API has rate limits. You have deadlines.

How Interview Mode Works (Four Phases)

Phase 1 (Think): Choose a domain (SQL, Python, Data Modeling, or Pipeline Architecture), your seniority level (Junior through Staff), and a target company tier. You receive a deliberately vague prompt. Ask clarifying questions of the AI interviewer, who responds like a real hiring manager.
Phase 2 (Code/Design): Write and execute your solution. SQL runs against a real database. Python executes for real. Data Modeling uses an interactive schema canvas. Pipeline Architecture uses an interactive design canvas.
Phase 3 (Discuss): The AI interviewer asks follow-up questions about your solution, one question at a time. You respond, and it asks another. This continues for up to 8 exchanges. The interviewer probes edge cases, tests optimization awareness, challenges alternative approaches, and may introduce curveball requirements that change the problem mid-interview.
Phase 4 (Verdict): Receive a hire/no-hire decision with specific feedback on what you did well, where your reasoning had gaps, what the interviewer was testing, and what to study next.
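The four phases above, and the one-question-at-a-time discussion capped at 8 exchanges, can be pictured as a small loop. This is only an illustrative sketch, not DataDriven's implementation: the names Phase, run_discussion, next_question, and answer are all hypothetical, and the only detail taken from the text is the cap of 8.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple

MAX_DISCUSS_EXCHANGES = 8  # cap stated in the phase description


class Phase(Enum):
    THINK = 1
    CODE_DESIGN = 2
    DISCUSS = 3
    VERDICT = 4


def run_discussion(
    next_question: Callable[[int], Optional[str]],
    answer: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Ask one follow-up at a time until the interviewer stops or the cap is hit."""
    transcript: List[Tuple[str, str]] = []
    for turn in range(MAX_DISCUSS_EXCHANGES):
        question = next_question(turn)
        if question is None:  # interviewer has nothing more to ask
            break
        transcript.append((question, answer(question)))
    return transcript


# Stand-in interviewer that would keep asking forever; the cap ends the phase.
transcript = run_discussion(
    next_question=lambda turn: f"Follow-up {turn + 1}?",
    answer=lambda question: f"Answer to: {question}",
)
print([phase.name for phase in Phase])  # ['THINK', 'CODE_DESIGN', 'DISCUSS', 'VERDICT']
print(len(transcript))  # 8
```

Each question is produced only after the previous answer is recorded, which mirrors the "you respond, and it asks another" rhythm described above.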

How Problem Mode Works

Problem mode is self-paced practice with clear problem statements and instant grading. No AI interviewer, no timer, no discussion phase. Focus on building skill before testing it under interview pressure.