8-Week Data Engineer Mock Interview Practice Plan

78 problems across 5 domains, plus 4 full mock interview loops. This plan assumes 1.5 to 3 hours per day (depending on the week) and some prior SQL and Python experience. By week 8, you should be solving Medium problems in under 15 minutes and delivering structured system design walkthroughs in 30 minutes.

Why You Need a Structured Practice Plan

Most candidates prepare for data engineering interviews the same way: they open a question list, solve random problems, and hope for the best. This approach has two problems. First, you spend too much time on domains you already know (usually SQL) and too little on domains you don't (usually data modeling and pipeline architecture). Second, you never build the stamina to perform under the time pressure of a real interview loop.

A structured plan solves both problems. It allocates your time proportionally to interview frequency: SQL gets 2 weeks because it appears in 41% of DE interview questions. Python gets 2 weeks because it appears in 35%. Data modeling gets 1 week (18%). Pipeline architecture gets 1 week (3%, but it appears in nearly all senior interviews). Spark gets 1 week for roles that require it. Week 8 is exclusively full mock interviews to build stamina and time management skills.

The plan is organized around weekly milestones, not just problem counts. Solving 15 SQL problems means nothing if you can't solve them under time pressure. Each week includes specific benchmarks that tell you whether you are ready to move on or need more practice in the current domain.

Prerequisites: What You Should Know Before Starting

This plan is designed for candidates with at least 6 months of professional experience with SQL and Python. You should be able to write a basic SELECT with WHERE and GROUP BY without looking up the syntax. You should know what a dictionary is in Python and how to iterate over a list.

If you are starting from zero, add 4 weeks of fundamentals before this plan: 2 weeks of SQL basics (SELECT, WHERE, GROUP BY, JOINs) and 2 weeks of Python basics (data types, loops, functions, file I/O). DataDriven's Learn section covers these fundamentals with interactive lessons.

You do not need prior experience with Spark, data modeling, or pipeline architecture. Weeks 5 to 7 build these skills from scratch. But if you have experience in these areas, you can accelerate those weeks and spend more time on mock interviews.

Week 1: 1.5 hours/day, 15 problems

SQL Fundamentals: JOINs, GROUP BY, Subqueries

Topics

INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN
Self-joins (find pairs, hierarchical data)
GROUP BY with HAVING, COUNT, SUM, AVG
Subqueries in WHERE, FROM, and SELECT clauses
NULL handling: COALESCE, NULLIF, IS NULL in JOINs

Daily Breakdown

Day 1 to 2: JOINs (5 problems). Day 3 to 4: GROUP BY and aggregation (5 problems). Day 5 to 7: Subqueries and combined patterns (5 problems). Run every query on DataDriven. Don't skip to the solution. If you're stuck for more than 10 minutes, look at the hint, not the answer.

End-of-Week Milestones

Solve any INNER/LEFT/RIGHT JOIN problem in under 10 minutes
Write GROUP BY with HAVING from memory without syntax errors
Handle NULLs in JOINs and aggregations correctly on first attempt
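The milestone patterns above can be drilled end to end with Python's built-in sqlite3 driver. This is an illustrative sketch (the customers/orders tables and their data are invented): a LEFT JOIN that keeps customers with no orders, NULL-safe aggregation with COALESCE, and a GROUP BY with a HAVING placeholder.

```python
import sqlite3

# Hypothetical customers/orders tables for week-1 practice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 70.0), (3, 2, 20.0);
""")

rows = conn.execute("""
    SELECT c.name,
           COUNT(o.id)                AS order_count,  -- 0 for unmatched customers
           COALESCE(SUM(o.amount), 0) AS total_spent   -- NULL-safe aggregation
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    HAVING COUNT(o.id) >= 0   -- trivially true here; tighten to >= 2 to filter
    ORDER BY total_spent DESC
""").fetchall()

print(rows)  # [('Ada', 2, 120.0), ('Grace', 1, 20.0), ('Edsger', 0, 0)]
```

Note that COUNT(o.id) counts only matched order rows, so Edsger (no orders) shows 0 rather than 1 — a classic NULL-in-JOIN edge case from the milestone list.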

Week 2: 1.5 hours/day, 15 problems

SQL Advanced: Window Functions, CTEs, Recursive Queries

Topics

ROW_NUMBER, RANK, DENSE_RANK (and when to use which)
LAG, LEAD for sequential comparisons
Running totals with SUM OVER, moving averages with AVG OVER
ROWS BETWEEN vs RANGE BETWEEN
Recursive CTEs for org charts, BOM explosions, graph traversal
Multi-level CTEs with intermediate steps

Daily Breakdown

Day 1 to 2: ranking functions (4 problems). Day 3 to 4: LAG/LEAD and running calculations (5 problems). Day 5: recursive CTEs (3 problems). Day 6 to 7: mixed window function problems at Medium difficulty (3 problems). By end of week 2, you should solve Medium SQL problems in under 15 minutes consistently.

End-of-Week Milestones

Write ROW_NUMBER, RANK, DENSE_RANK with correct PARTITION BY and ORDER BY on first attempt
Use LAG/LEAD for gap detection and period-over-period comparisons
Write a recursive CTE for hierarchical data traversal
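LAG-based gap detection, one of the milestone skills above, can be practiced locally with sqlite3 (SQLite supports window functions from version 3.25 onward; the logins table here is made up):

```python
import sqlite3

# Invented logins table: find the gap in days between consecutive logins
# per user, a standard LAG interview pattern.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logins (user_id INTEGER, login_day INTEGER);
    INSERT INTO logins VALUES (1, 1), (1, 2), (1, 5), (2, 3), (2, 4);
""")

rows = conn.execute("""
    SELECT user_id,
           login_day,
           login_day - LAG(login_day) OVER (
               PARTITION BY user_id ORDER BY login_day
           ) AS gap   -- NULL for each user's first login
    FROM logins
    ORDER BY user_id, login_day
""").fetchall()

print(rows)  # [(1, 1, None), (1, 2, 1), (1, 5, 3), (2, 3, None), (2, 4, 1)]
```

The PARTITION BY restarts LAG for each user, which is why each user's first row has a NULL gap — exactly the edge case to call out in an interview.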

Week 3: 1.5 hours/day, 10 problems

Python Data Manipulation: Files, JSON, Collections

Topics

JSON parsing: json.loads, nested traversal, flattening
CSV processing: csv.reader, DictReader, chunked reading
File I/O: generators for memory-efficient processing
collections: Counter, defaultdict, OrderedDict, deque
Dictionary comprehensions and set operations
Error handling: specific exceptions, logging, retry patterns

Daily Breakdown

Day 1 to 2: JSON parsing and flattening (3 problems). Day 3 to 4: file processing with generators (3 problems). Day 5 to 7: collections and mixed patterns (4 problems). Write all code in the DataDriven editor. The AI grader catches style issues that you won't notice yourself.

End-of-Week Milestones

Flatten nested JSON of any depth without looking up the approach
Process a large file using generators without loading it into memory
Use collections.Counter, defaultdict, and deque fluently
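The "flatten nested JSON of any depth" milestone can be sketched with a short recursive helper. Dotted paths with list indices are one possible convention; an interviewer may ask for a different separator or for lists to be kept intact:

```python
def flatten(obj, prefix=""):
    """Recursively flatten nested dicts and lists into dotted-path keys."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {prefix: obj}  # scalar leaf: emit the accumulated path
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat

event = {"user": {"id": 7, "tags": ["a", "b"]}, "ok": True}
print(flatten(event))
# {'user.id': 7, 'user.tags.0': 'a', 'user.tags.1': 'b', 'ok': True}
```

Being able to produce this shape from memory, then adapt the separator or list handling on request, is the level of fluency the milestone is asking for.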

Week 4: 1.5 hours/day, 10 problems

Python Pipeline Patterns: ETL Logic, Testing, Production Code

Topics

Sessionization with inactivity gaps
Retry logic with exponential backoff and jitter
Schema validation and dead letter queues
Hash-based change detection (checksums for DataFrames)
Decorators for logging, timing, and retry
Type hints and dataclasses for clean pipeline code

Daily Breakdown

Day 1 to 2: sessionization and time-window logic (3 problems). Day 3 to 4: retry and error handling patterns (3 problems). Day 5 to 7: validation, change detection, and production patterns (4 problems). These problems map directly to what you do on the job. Interviewers love candidates who write production-quality code in interview settings.

End-of-Week Milestones

Implement sessionization from scratch in under 15 minutes
Write retry with exponential backoff and jitter from memory
Build a schema validation function that routes bad records to a DLQ
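The retry milestone can be sketched as below. This uses the "full jitter" variant (sleep a uniform random amount up to the capped exponential delay), which is one of several common jitter strategies; the flaky function is a stand-in for a transient network call:

```python
import random
import time

def retry(func, max_attempts=5, base_delay=0.1, max_delay=10.0):
    """Call func(), retrying on exception with exponential backoff + full jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Cap the exponential delay, then sleep a random fraction of it.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))

# Simulated transient failure: succeeds on the third call.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(retry(flaky))  # ok
```

In an interview, name the jitter strategy you chose and why it avoids thundering-herd retries; that trade-off discussion is worth as much as the code.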

Week 5: 2 hours/day, 10 problems

Data Modeling: Star Schemas, SCDs, Design Trade-offs

Topics

Star schema design: identifying grain, facts, dimensions
Dimension types: conformed, degenerate, junk, role-playing
SCD Type 1 (overwrite), Type 2 (history), Type 3 (columns)
Data vault: hubs, links, satellites
Medallion architecture: bronze, silver, gold
One Big Table (OBT) vs star schema trade-offs
Bridge tables for many-to-many relationships

Daily Breakdown

Day 1 to 2: star schema design for e-commerce and ride-sharing (3 exercises). Day 3 to 4: SCD handling and history tracking (3 exercises). Day 5 to 6: data vault and medallion architecture (2 exercises). Day 7: mixed modeling exercises with trade-off discussions (2 exercises). Practice explaining your design choices out loud. Modeling interviews are 60% communication.

End-of-Week Milestones

Design a star schema for any domain in under 20 minutes with correct grain, facts, and dimensions
Explain SCD Types 1, 2, and 3 with concrete examples and trade-offs
Articulate when to use star schema vs data vault vs OBT
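SCD Type 2 is easier to explain once you have traced the mechanics by hand. Here is a plain-Python sketch of the close-out-and-insert logic (column names are illustrative, and it assumes at most one incoming record per key per batch — a simplification you would relax in a real MERGE):

```python
def scd2_apply(dim_rows, incoming, key, tracked, today):
    """SCD Type 2 sketch: when a tracked attribute changes, close out the
    current dimension row and insert a new current row with open validity."""
    current = {r[key]: r for r in dim_rows if r["is_current"]}
    for rec in incoming:
        old = current.get(rec[key])
        if old is None:
            # Brand-new key: insert as the current version.
            dim_rows.append({**rec, "valid_from": today,
                             "valid_to": None, "is_current": True})
        elif any(old[c] != rec[c] for c in tracked):
            # Tracked change: expire the old row, append the new version.
            old["valid_to"] = today
            old["is_current"] = False
            dim_rows.append({**rec, "valid_from": today,
                             "valid_to": None, "is_current": True})
    return dim_rows

dim = [{"customer_id": 1, "city": "Austin", "valid_from": "2024-01-01",
        "valid_to": None, "is_current": True}]
scd2_apply(dim, [{"customer_id": 1, "city": "Denver"}],
           "customer_id", ["city"], today="2024-06-01")
print(len(dim))  # 2: the Austin row is closed out, Denver is now current
```

Walking through this in a modeling round, contrast it with Type 1 (overwrite the city, lose history) and Type 3 (add a previous_city column, keep one prior value).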

Week 6: 2 hours/day, 8 problems

Pipeline Architecture: System Design for Data Engineers

Topics

Ingestion patterns: batch, micro-batch, streaming
Message queues: Kafka, SQS, Pub/Sub (when to use which)
Orchestration: Airflow, Dagster, Prefect trade-offs
Idempotency: DELETE-INSERT, MERGE/UPSERT, atomic swap
Schema evolution and schema registries
Monitoring: data freshness, volume anomalies, schema drift
Failure handling: retries, dead letter queues, circuit breakers
Pipeline optimization: parallelization, incremental loads, partitioning

Daily Breakdown

Day 1 to 2: design a clickstream pipeline (1 design) and a multi-source API ingestion pipeline (1 design). Day 3 to 4: migration planning and optimization (2 designs). Day 5 to 6: monitoring and failure handling (2 designs). Day 7: mixed architecture designs (2 designs). Use a whiteboard or drawing tool. Practice structuring your answer: requirements, high-level design, deep dive, trade-offs.

End-of-Week Milestones

Design an end-to-end pipeline from source to warehouse in a structured 15-minute walkthrough
Explain batch vs streaming trade-offs with specific latency, cost, and complexity numbers
Describe idempotency strategies (DELETE-INSERT, MERGE, atomic swap) and when to use each
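The DELETE-INSERT idempotency pattern from the milestone above can be sketched with sqlite3 (the sales table and partition-by-day scheme are illustrative). The key detail is the transaction: delete and insert commit together, so a rerun can never leave duplicates or a half-deleted partition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")

def load_partition(conn, day, amounts):
    """Idempotent partition reload: wipe the day's rows, then reinsert."""
    with conn:  # one transaction: DELETE + INSERT commit (or roll back) together
        conn.execute("DELETE FROM sales WHERE day = ?", (day,))
        conn.executemany("INSERT INTO sales VALUES (?, ?)",
                         [(day, a) for a in amounts])

load_partition(conn, "2024-06-01", [10.0, 20.0])
load_partition(conn, "2024-06-01", [10.0, 20.0])  # rerun: no duplicates
count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(count)  # 2
```

In a design round, contrast this with MERGE/UPSERT (row-level, keeps untouched rows) and atomic swap (rebuild a staging table, then rename), and say when each fits.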

Week 7: 1.5 hours/day, 10 problems

Spark: Distributed Processing, Performance, Streaming

Topics

DataFrame API: select, filter, groupBy, agg, join
Window functions in PySpark
Join strategies: broadcast, sort-merge, shuffle hash
Repartition vs coalesce (when and why)
Data skew: detection, salting, AQE
Caching and persistence strategies
Delta Lake: MERGE, OPTIMIZE, time travel
Structured Streaming: watermarks, output modes, triggers

Daily Breakdown

Day 1 to 2: DataFrame transformations and window functions (3 problems). Day 3 to 4: join optimization and skew handling (3 problems). Day 5 to 6: Delta Lake operations (2 problems). Day 7: Structured Streaming (2 problems). Run all code on DataDriven's PySpark environment. Reading about Spark is not the same as writing Spark code.

End-of-Week Milestones

Write a PySpark transformation with correct partitioning and no unnecessary shuffles
Debug a Spark OOM error by identifying skew vs insufficient memory
Explain the Catalyst optimizer stages and what each one does
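Salting, the skew fix named in the topics above, is easy to reason about outside Spark. This plain-Python sketch shows the core idea: appending a random suffix spreads one hot key across several buckets, the same way a salted Spark join breaks a skewed partition into smaller ones:

```python
import random
from collections import Counter

# Invented skewed workload: one hot key dominates the distribution.
NUM_SALTS = 4
keys = ["hot_user"] * 1000 + ["user_a", "user_b"] * 10

# Salting: append a random suffix so the hot key hashes to NUM_SALTS
# different buckets instead of landing on a single partition/worker.
salted = [f"{k}#{random.randrange(NUM_SALTS)}" for k in keys]
buckets = Counter(salted)

hot_buckets = [c for k, c in buckets.items() if k.startswith("hot_user#")]
print(len(hot_buckets), max(hot_buckets))
```

The trade-off to mention in an interview: the other side of a salted join must be replicated NUM_SALTS times so every salted key still finds its match.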

Week 8: 3 hours/day (including breaks)

Full Mock Interviews: Combine All Domains Under Pressure

Topics

Monday and Thursday: Full mock interview loops (4 rounds each)
Tuesday and Friday: Timed individual rounds on weak areas
Wednesday and Saturday: Review mistakes from mock interviews
Sunday: Rest. Your brain consolidates learning during rest.

Daily Breakdown

Mock interview structure: Round 1 (SQL, 45 min), 15-min break, Round 2 (Python, 45 min), 15-min break, Round 3 (System Design, 45 min), 15-min break, Round 4 (Behavioral, 30 min). Use DataDriven's mock interview simulator to automate question selection, timing, and AI feedback. Run 2 full loops per week. Review your weakest round each day between loops.

End-of-Week Milestones

Complete a full mock interview loop (SQL + Python + System Design + Behavioral) in under 3 hours
Solve Medium SQL and Python problems in under 15 minutes each while explaining your approach
Deliver a structured system design walkthrough in 30 minutes

When and How to Adjust the Plan

No plan survives contact with reality. Here are the three most common adjustments and when to make them.

You are missing milestones. If you can't hit the weekly milestones by the end of the week, spend 2 to 3 extra days on that domain before moving on. It's better to finish the plan in 9 to 10 weeks with solid skills than to rush through in 8 weeks with gaps. Data engineering interviews test depth, not breadth. A shallow understanding of 5 domains is less valuable than deep mastery of 3.

You are ahead of schedule. If you hit all milestones by day 5 of a week, use the remaining 2 days to start Hard problems in that domain. Hard problems appear in senior and staff-level interviews. If you are targeting L5+ roles, you need to be comfortable with Hard difficulty.

Your target company emphasizes a specific domain. If the job description mentions Spark heavily, consider swapping weeks 6 and 7 so you spend more time on Spark. If the company is known for behavioral interviews (Amazon, for example), add a behavioral prep component to week 8. If the role is data modeling heavy (analytics engineer positions), double the time on week 5 and reduce Python.

Recommended Daily Routine

Consistency beats intensity. Here is a daily routine that fits into a working schedule.

First 15 minutes

Review yesterday's mistakes. Read the AI feedback on your previous submissions. Identify one pattern you got wrong and resolve to watch for it today.

Next 45 minutes

Solve 2 to 3 new problems from the current week's domain. Set a timer for each problem. If you're stuck after 10 minutes, read the hint. If you're stuck after 20 minutes, study the solution and solve it again from scratch tomorrow.

Next 15 minutes

Review the AI grader feedback on today's submissions. Write a short summary (five sentences or fewer) of one thing you learned in a notebook, physical or digital. Writing the summary down cements the pattern in long-term memory.

Final 15 minutes

Speed drill: solve one Easy problem from a previous week's domain as fast as possible. This maintains skills you've already built while you learn new ones. Track your solve time and try to beat yesterday's record.

Progress Checkpoints: Are You on Track?

After Week 2: SQL proficiency

You can solve any Medium SQL problem (JOINs, window functions, CTEs) in under 15 minutes with correct handling of NULLs and edge cases. If not, spend week 3 on SQL instead of Python.

After Week 4: Python proficiency

You can write a Python function that processes a file lazily, handles errors gracefully, and includes type hints. You can explain your code while writing it. If not, extend Python by 3 to 4 days.

After Week 6: Design proficiency

You can design a star schema for an unfamiliar domain in 20 minutes and draw a pipeline architecture diagram in 15 minutes, explaining trade-offs at each layer. If not, extend by 3 to 4 days.

After Week 8: Interview ready

You complete a full 4-round mock interview and receive 'Strong Hire' on 3+ rounds. Your SQL and Python solve times are under 15 minutes for Medium problems. You explain your approach clearly without long pauses.

5 Mistakes That Derail Interview Prep Plans

1. Spending all your time on SQL. SQL is the most tested domain (41%), but candidates who only practice SQL fail Python and modeling rounds. Stick to the plan's time allocation.

2. Skipping mock interviews. Individual problem practice builds skills. Mock interviews build performance ability. Many candidates skip week 8 because they feel "not ready." You will never feel ready. Do the mock interviews anyway. That's how you become ready.

3. Reading solutions without writing code. If you read a solution and think "I would have gotten that," you are fooling yourself. The only way to verify that you can solve a problem is to solve it. On a blank screen. With a timer. Every time.

4. Not practicing communication. Data engineering interviews are not just coding tests. Interviewers evaluate how you explain your approach, ask clarifying questions, and discuss trade-offs. Practice thinking out loud while you code. It feels awkward at first. It becomes natural by week 4.

5. Ignoring your weakest domain. Your brain wants to practice what you are already good at. It feels productive. It isn't. Your weakest domain is your highest-impact area for improvement. If you dread data modeling, that's exactly where you need to spend more time.

Practice Plan FAQ

Can I compress this 8-week plan into 4 weeks?
Yes, if you can commit 3+ hours per day. Double the daily problem count and combine weeks 1 to 2 into one week (SQL), weeks 3 to 4 into one week (Python), week 5 stays (modeling), and combine weeks 6 to 7 (architecture + Spark). Keep week 8 (mock interviews) as a full week. Compressed timelines work, but the risk is shallow understanding. If you find yourself memorizing solutions instead of understanding patterns, slow down.
What if I don't need Spark for my target role?
Skip week 7 and add that time to your weakest domain. If your target company's job description does not mention Spark, Databricks, or distributed processing, spend week 7 on extra SQL and data modeling practice instead. Check the job description carefully: some companies test Spark even when it's not listed because they want to see if you can think at scale.
How do I know if I'm ready for real interviews?
Three benchmarks: (1) You solve Medium SQL problems in under 15 minutes on the first attempt with correct edge case handling. (2) You can design a star schema for an unfamiliar domain in under 20 minutes while explaining your choices out loud. (3) You complete a full mock interview loop on DataDriven and the AI grades you as 'Strong Hire' on at least 3 of 4 rounds. If you meet all three, start scheduling real interviews.
Should I study every day or take weekends off?
Study 6 days per week and rest on Sunday. Consistency matters more than intensity. Three months of 1 hour per day outperforms two weeks of 8 hours per day. Your brain needs sleep to consolidate pattern recognition. If you skip rest days, you'll burn out around week 5 and stop retaining new material.
What if I'm already strong in SQL but weak in data modeling?
Spend 1 week on SQL (review only, focus on Hard problems) instead of 2 weeks. Add the extra week to data modeling. Customize the plan based on your starting point. The domain time allocation should reflect your personal gaps, not just interview frequency. Take a diagnostic test on DataDriven to identify your weakest domain before starting the plan.

8 Weeks. 78 Problems. 4 Mock Loops. One Plan.

Every problem runs on real infrastructure with AI grading. Start week 1 today.