8-Week Data Engineer Mock Interview Practice Plan (2026)

78 problems across 5 domains, plus 4 full mock interview loops. This plan assumes 1.5 to 3 hours per day (depending on the week) and some prior SQL and Python experience. By week 8, you should be solving Medium problems in under 15 minutes and delivering structured system design walkthroughs in 30 minutes.

Why You Need a Structured Practice Plan

Most candidates prepare for data engineering interviews the same way: they open a question list, solve random problems, and hope for the best. This approach has two problems. First, you spend too much time on domains you already know (usually SQL) and too little on domains you don't (usually data modeling and pipeline architecture). Second, you never build the stamina to perform under the time pressure of a real interview loop.

A structured plan solves both problems. It allocates your time roughly in proportion to interview frequency: SQL gets 2 weeks because it appears in 41% of data engineering interview questions. Python gets 2 weeks because it appears in 35%. Data modeling gets 1 week (18%). Pipeline architecture gets 1 week despite its 3% share, because it appears in nearly all senior interviews. Spark gets 1 week for roles that require it. Week 8 is exclusively full mock interviews to build stamina and time management skills.

The plan is organized around weekly milestones, not just problem counts. Solving 15 SQL problems means nothing if you can't solve them under time pressure. Each week includes specific benchmarks that tell you whether you are ready to move on or need more practice in the current domain.

Prerequisites: What You Should Know Before Starting

This plan is designed for candidates with at least 6 months of professional experience with SQL and Python. You should be able to write a basic SELECT with WHERE and GROUP BY without looking up the syntax. You should know what a dictionary is in Python and how to iterate over a list.

If you are starting from zero, add 4 weeks of fundamentals before this plan: 2 weeks of SQL basics (SELECT, WHERE, GROUP BY, JOINs) and 2 weeks of Python basics (data types, loops, functions, file I/O). DataDriven's Learn section covers these fundamentals with interactive lessons.

You do not need prior experience with Spark, data modeling, or pipeline architecture. Weeks 5 to 7 build these skills from scratch. But if you have experience in these areas, you can accelerate those weeks and spend more time on mock interviews.

The 8-Week Plan, Week by Week

  1. Week 1: SQL Fundamentals (1.5 hrs/day, 15 problems)

    Topics: INNER/LEFT/RIGHT/FULL OUTER JOIN, self-joins, GROUP BY with HAVING, subqueries in WHERE/FROM/SELECT, NULL handling with COALESCE/NULLIF/IS NULL.

    Daily breakdown: Day 1-2 JOINs (5 problems). Day 3-4 GROUP BY and aggregation (5 problems). Day 5-7 subqueries and combined patterns (5 problems). Run every query on DataDriven.

    End-of-week milestones: Solve any JOIN problem in under 10 minutes. Write GROUP BY with HAVING from memory without syntax errors. Handle NULLs in JOINs and aggregations correctly on first attempt.

  2. Week 2: SQL Advanced (1.5 hrs/day, 15 problems)

    Topics: ROW_NUMBER/RANK/DENSE_RANK, LAG/LEAD for sequential comparisons, running totals with SUM OVER, moving averages with AVG OVER, ROWS BETWEEN vs RANGE BETWEEN, recursive CTEs, multi-level CTEs.

    Daily breakdown: Day 1-2 ranking functions (4 problems). Day 3-4 LAG/LEAD and running calculations (5 problems). Day 5 recursive CTEs (3 problems). Day 6-7 mixed window function problems (3 problems).

    End-of-week milestones: Write ROW_NUMBER/RANK/DENSE_RANK with correct PARTITION BY and ORDER BY on first attempt. Use LAG/LEAD for period-over-period comparisons. Write a recursive CTE for hierarchical data traversal.

  3. Week 3: Python Data Manipulation (1.5 hrs/day, 10 problems)

    Topics: JSON parsing and nested traversal, CSV processing with generators, file I/O and memory-efficient reading, collections (Counter, defaultdict, OrderedDict, deque), dictionary comprehensions, error handling with specific exceptions.

    Daily breakdown: Day 1-2 JSON parsing and flattening (3 problems). Day 3-4 file processing with generators (3 problems). Day 5-7 collections and mixed patterns (4 problems).

    End-of-week milestones: Flatten nested JSON of any depth without looking up the approach. Process a large file using generators without loading it into memory. Use collections.Counter, defaultdict, and deque fluently.

  4. Week 4: Python Pipeline Patterns (1.5 hrs/day, 10 problems)

    Topics: Sessionization with inactivity gaps, retry logic with exponential backoff and jitter, schema validation and dead letter queues (DLQs), hash-based change detection, decorators for logging and timing, type hints and dataclasses.

    Daily breakdown: Day 1-2 sessionization and time-window logic (3 problems). Day 3-4 retry and error handling patterns (3 problems). Day 5-7 validation, change detection, and production patterns (4 problems).

    End-of-week milestones: Implement sessionization from scratch in under 15 minutes. Write retry with exponential backoff and jitter from memory. Build a schema validation function that routes bad records to a DLQ.

  5. Week 5: Data Modeling (2 hrs/day, 10 problems)

    Topics: Star schema design (grain, facts, dimensions), dimension types (conformed, degenerate, junk, role-playing), SCD Types 1/2/3, data vault (hubs, links, satellites), medallion architecture, OBT vs star schema trade-offs, bridge tables for many-to-many.

    Daily breakdown: Day 1-2 star schema design for e-commerce and ride-sharing (3 exercises). Day 3-4 SCD handling and history tracking (3 exercises). Day 5-6 data vault and medallion architecture (2 exercises). Day 7 mixed modeling with trade-off discussions (2 exercises).

    End-of-week milestones: Design a star schema for any domain in under 20 minutes with correct grain, facts, and dimensions. Explain SCD Types 1, 2, and 3 with concrete examples and trade-offs. Articulate when to use star schema vs data vault vs OBT.

  6. Week 6: Pipeline Architecture (2 hrs/day, 8 designs)

    Topics: Ingestion patterns (batch, micro-batch, streaming), message queues (Kafka, SQS, Pub/Sub), orchestration (Airflow, Dagster, Prefect), idempotency strategies (DELETE-INSERT, MERGE/UPSERT, atomic swap), schema evolution, monitoring (freshness, volume anomalies, schema drift), failure handling (retries, DLQ, circuit breakers), pipeline optimization.

    Daily breakdown: Day 1-2 clickstream pipeline and multi-source API ingestion (2 designs). Day 3-4 migration planning and optimization (2 designs). Day 5-6 monitoring and failure handling (2 designs). Day 7 mixed architecture designs (2 designs).

    End-of-week milestones: Design an end-to-end pipeline in a structured 15-minute walkthrough. Explain batch vs streaming trade-offs with specific latency, cost, and complexity numbers. Describe idempotency strategies and when to use each.

  7. Week 7: Spark (1.5 hrs/day, 10 problems)

    Topics: DataFrame API (select, filter, groupBy, agg, join), window functions in PySpark, join strategies (broadcast, sort-merge, shuffle hash), repartition vs coalesce, data skew detection and salting, caching and persistence, Delta Lake (MERGE, OPTIMIZE, time travel), Structured Streaming (watermarks, output modes, triggers).

    Daily breakdown: Day 1-2 DataFrame transformations and window functions (3 problems). Day 3-4 join optimization and skew handling (3 problems). Day 5-6 Delta Lake operations (2 problems). Day 7 Structured Streaming (2 problems).

    End-of-week milestones: Write a PySpark transformation with correct partitioning and no unnecessary shuffles. Debug a Spark OOM error by identifying skew vs insufficient memory. Explain the Catalyst optimizer stages and what each one does.

  8. Week 8: Full Mock Interviews (3 hrs/day excluding breaks)

    Schedule: Monday and Thursday are full mock interview loops (4 rounds each). Tuesday and Friday are timed individual rounds on weak areas. Wednesday and Saturday are for reviewing mistakes from mock interviews. Sunday: rest.

    Mock loop structure: Round 1 SQL (45 min), 15-min break, Round 2 Python (45 min), 15-min break, Round 3 System Design (45 min), 15-min break, Round 4 Behavioral (30 min). Use DataDriven's mock interview simulator to automate question selection, timing, and AI feedback.

    End-of-week milestones: Complete a full mock interview loop in under 3 hours of round time (the four rounds total 2 hours 45 minutes; the three breaks add another 45 minutes). Solve Medium SQL and Python problems in under 15 minutes each while explaining your approach. Deliver a structured system design walkthrough in 30 minutes.
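One of the Week 4 milestones above is writing retry with exponential backoff and jitter from memory. The following is a minimal sketch of that pattern, not a reference implementation. The function name `retry`, its parameters, the choice of `ConnectionError` as the retryable exception, and the "full jitter" variant (sleeping a random amount between zero and the current ceiling) are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests do not wait
) -> T:
    """Call func until it succeeds or max_attempts is exhausted (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ConnectionError:  # assumption: only this error is treated as transient
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: the ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random amount between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


# Usage: a hypothetical call that fails twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays: list[float] = []
outcome = retry(flaky, sleep=delays.append)  # record delays instead of sleeping
```

Passing `sleep` as a parameter is one way to keep the function testable under a timer; in an interview it is also worth saying out loud which exceptions you would and would not retry.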

When and How to Adjust the Plan

No plan survives contact with reality. Here are the three most common adjustments and when to make them.

You are missing milestones. If you can't hit the weekly milestones by the end of the week, spend 2 to 3 extra days on that domain before moving on. It's better to finish the plan in 9 to 10 weeks with solid skills than to rush through in 8 weeks with gaps. Data engineering interviews test depth, not breadth. A shallow understanding of 5 domains is less valuable than deep mastery of 3.

You are ahead of schedule. If you hit all milestones by day 5 of a week, use the remaining 2 days to start Hard problems in that domain. Hard problems appear in senior and staff-level interviews. If you are targeting L5+ roles, you need to be comfortable with Hard difficulty.

Your target company emphasizes a specific domain. If the job description mentions Spark heavily, consider swapping weeks 6 and 7 so that Spark comes first and can absorb extra days if you need them. If the company is known for behavioral interviews (Amazon, for example), add a behavioral prep component to week 8. If the role is data modeling heavy (analytics engineer positions), double the time on week 5 and reduce Python.

Recommended Daily Routine

First 15 minutes

Review yesterday's mistakes. Read the AI feedback on your previous submissions. Identify one pattern you got wrong and resolve to watch for it today.

Next 45 minutes

Solve 2 to 3 new problems from the current week's domain. Set a timer for each problem. If you're stuck after 10 minutes, read the hint. If you're stuck after 20 minutes, study the solution and solve it again from scratch tomorrow.

Next 15 minutes

Review the AI evaluation feedback on today's submissions. Write down one thing you learned, in about five sentences, in a notebook (physical or digital). This short summary helps cement the pattern in long-term memory.

Final 15 minutes

Speed drill: solve one Easy problem from a previous week's domain as fast as possible. This maintains skills you've already built while you learn new ones. Track your solve time and try to beat yesterday's record.

Progress Checkpoints: Are You on Track?

After Week 2: SQL proficiency

You can solve any Medium SQL problem (JOINs, window functions, CTEs) in under 15 minutes with correct handling of NULLs and edge cases. If not, spend week 3 on SQL instead of Python.
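As an illustration of the kind of Medium problem this checkpoint describes, here is a minimal sketch that combines a LEFT JOIN, a CTE, a window function, and NULL handling. It runs through Python's built-in `sqlite3` module and assumes a SQLite build with window function support (3.25 or later). The `users` and `orders` tables, their columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
INSERT INTO users  VALUES (1, 'US'), (2, 'US'), (3, 'DE'), (4, NULL);
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 90.0), (13, 3, NULL);
""")

# Top spender per country. Edge cases covered:
#   - user 4 has no orders (LEFT JOIN keeps the row, SUM is NULL)
#   - user 3 has only a NULL amount (SUM over NULLs is NULL)
#   - user 4 has a NULL country (bucketed as 'unknown')
query = """
WITH spend AS (
    SELECT u.user_id,
           COALESCE(u.country, 'unknown') AS country,
           COALESCE(SUM(o.amount), 0)     AS total
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.user_id
    GROUP BY u.user_id, u.country
),
ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY country
               ORDER BY total DESC, user_id  -- user_id breaks ties deterministically
           ) AS rn
    FROM spend
)
SELECT country, user_id, total
FROM ranked
WHERE rn = 1
ORDER BY country;
"""
top_spenders = conn.execute(query).fetchall()
```

The explicit tie-breaker in the window's ORDER BY is a deliberate choice: without it, ROW_NUMBER can pick either of two users with equal totals, which is a common source of mismatches against expected output.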

After Week 4: Python proficiency

You can write a Python function that processes a file lazily, handles errors gracefully, and includes type hints. You can explain your code while writing it. If not, extend Python by 3 to 4 days.
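A minimal sketch of what such a function might look like, using a generator so that only one row is held in memory at a time. The function names, the `user_id` and `amount` columns, and the policy of skipping bad rows are assumptions for illustration; a production pipeline would more likely route rejected rows to a log or dead letter queue.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO


def read_valid_rows(handle: TextIO, required: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield CSV rows one at a time, skipping rows with an empty required field."""
    required_fields = tuple(required)
    reader = csv.DictReader(handle)  # pulls lines from the handle lazily
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        missing = [name for name in required_fields if not row.get(name)]
        if missing:
            # Assumption: bad rows are reported and skipped rather than raised.
            print(f"line {line_no}: missing {missing}, skipped")
            continue
        yield row


def total_amount(rows: Iterable[dict[str, str]]) -> float:
    """Sum the 'amount' column, ignoring values that are not numeric."""
    total = 0.0
    for row in rows:
        try:
            total += float(row["amount"])
        except ValueError:  # catch the specific error, not a bare except
            continue
    return total


# Usage with an in-memory file; a real file opened with open() works the same way.
sample = io.StringIO("user_id,amount\nu1,10.5\n,3.0\nu2,abc\nu3,4.5\n")
result = total_amount(read_valid_rows(sample, ["user_id", "amount"]))
```

Because `read_valid_rows` is a generator, nothing is read until `total_amount` starts iterating, which is the property to call out when explaining the code aloud.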

After Week 6: Design proficiency

You can design a star schema for an unfamiliar domain in 20 minutes and draw a pipeline architecture diagram in 15 minutes, explaining trade-offs at each layer. If not, extend by 3 to 4 days.
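To make the star schema half of this checkpoint concrete, here is a small sketch for a ride-sharing domain, written as DDL and run through Python's built-in `sqlite3` module so it can be checked. Every table name, column name, and sample row is hypothetical, and a real design would carry more dimensions and attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: descriptive context, one row per entity.
CREATE TABLE dim_date   (date_key INTEGER PRIMARY KEY, full_date TEXT, day_of_week TEXT);
CREATE TABLE dim_rider  (rider_key INTEGER PRIMARY KEY, rider_id TEXT, signup_city TEXT);
CREATE TABLE dim_driver (driver_key INTEGER PRIMARY KEY, driver_id TEXT, vehicle_type TEXT);

-- Fact table. Grain: one row per completed trip.
CREATE TABLE fact_trip (
    trip_id     TEXT PRIMARY KEY,                          -- degenerate dimension
    date_key    INTEGER REFERENCES dim_date(date_key),
    rider_key   INTEGER REFERENCES dim_rider(rider_key),
    driver_key  INTEGER REFERENCES dim_driver(driver_key),
    distance_km REAL,                                      -- additive measure
    fare_usd    REAL                                       -- additive measure
);

INSERT INTO dim_date   VALUES (20260105, '2026-01-05', 'Monday');
INSERT INTO dim_rider  VALUES (1, 'r-100', 'Austin');
INSERT INTO dim_driver VALUES (1, 'd-7', 'sedan'), (2, 'd-9', 'suv');
INSERT INTO fact_trip  VALUES ('t1', 20260105, 1, 1, 4.0, 12.0),
                              ('t2', 20260105, 1, 2, 10.0, 30.0);
""")

# A typical star-schema query: join the fact to one dimension and aggregate a measure.
fare_by_vehicle = conn.execute("""
    SELECT d.vehicle_type, SUM(f.fare_usd)
    FROM fact_trip f
    JOIN dim_driver d ON d.driver_key = f.driver_key
    GROUP BY d.vehicle_type
    ORDER BY d.vehicle_type
""").fetchall()
```

Stating the grain in a comment before listing any columns mirrors the order you would follow in a whiteboard walkthrough: grain first, then measures, then dimensions.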

After Week 8: Interview ready

You complete a full 4-round mock interview and receive 'Strong Hire' on 3+ rounds. Your SQL and Python solve times are under 15 minutes for Medium problems. You explain your approach clearly without long pauses.

5 Mistakes That Derail Interview Prep Plans

1. Spending all your time on SQL. SQL is the most tested domain (41%), but candidates who only practice SQL fail Python and modeling rounds. Stick to the plan's time allocation.

2. Skipping mock interviews. Individual problem practice builds skills. Mock interviews build performance ability. Many candidates skip week 8 because they feel 'not ready.' You will never feel ready. Do the mock interviews anyway. That's how you become ready.

3. Reading solutions without writing code. If you read a solution and think 'I would have gotten that,' you are fooling yourself. The only way to verify that you can solve a problem is to solve it. On a blank screen. With a timer. Every time.

4. Not practicing communication. Data engineering interviews are not just coding tests. Interviewers evaluate how you explain your approach, ask clarifying questions, and discuss trade-offs. Practice thinking out loud while you code. It feels awkward at first. It becomes natural by week 4.

5. Ignoring your weakest domain. Your brain wants to practice what you are already good at. It feels productive. It isn't. Your weakest domain is your highest-impact area for improvement. If you dread data modeling, that's exactly where you need to spend more time.

Practice Plan FAQ

Can I compress this 8-week plan into 4 weeks?
Nearly, if you can commit 3+ hours per day. Double the daily problem count and combine weeks 1 to 2 into one week (SQL), weeks 3 to 4 into one week (Python), week 5 stays (modeling), and combine weeks 6 to 7 (architecture + Spark). Keep week 8 (mock interviews) as a full week. As written, that comes to 5 weeks, not 4; reaching 4 means also dropping or halving one domain, such as Spark if your target role does not require it. Compressed timelines work, but the risk is shallow understanding. If you find yourself memorizing solutions instead of understanding patterns, slow down.
What if I don't need Spark for my target role?
Skip week 7 and add that time to your weakest domain. If your target company's job description does not mention Spark, Databricks, or distributed processing, spend week 7 on extra SQL and data modeling practice instead. Check the job description carefully: some companies test Spark even when it's not listed because they want to see if you can think at scale.
How do I know if I'm ready for real interviews?
Three benchmarks: (1) You solve Medium SQL problems in under 15 minutes on the first attempt with correct edge case handling. (2) You can design a star schema for an unfamiliar domain in under 20 minutes while explaining your choices out loud. (3) You complete a full mock interview loop on DataDriven and the AI evaluates you as 'Strong Hire' on at least 3 of 4 rounds. If you meet all three, start scheduling real interviews.
Should I study every day or take weekends off?
Study 6 days per week and rest on Sunday. Consistency matters more than intensity. Three months of 1 hour per day outperforms two weeks of 8 hours per day. Your brain needs sleep to consolidate pattern recognition. If you skip rest days, you'll burn out around week 5 and stop retaining new material.
What if I'm already strong in SQL but weak in data modeling?
Spend 1 week on SQL (review only, focus on Hard problems) instead of 2 weeks. Add the extra week to data modeling. Customize the plan based on your starting point. The domain time allocation should reflect your personal gaps, not just interview frequency. Take a diagnostic test on DataDriven to identify your weakest domain before starting the plan.

8 Weeks. 78 Problems. 4 Mock Loops. One Plan.

  1. Active recall beats re-reading by 50%

    Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom.

  2. 76% of hiring managers reject on the coding task, not the resume

    From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice.

  3. Five problem shapes cover 80% of data engineer loops

    Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition.
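As an example of writing one of these shapes by hand, here is a minimal sessionization sketch in plain Python. It assumes events arrive as `(user, timestamp)` pairs, that a session ends after 30 minutes of inactivity, and that a gap of exactly 30 minutes stays in the same session; the function name and these choices are illustrative, not a standard definition.

```python
from datetime import datetime, timedelta
from typing import Iterable


def sessionize(
    events: Iterable[tuple[str, datetime]],
    gap: timedelta = timedelta(minutes=30),
) -> list[tuple[str, int, datetime]]:
    """Return (user, session_number, timestamp), numbering sessions per user from 1."""
    out: list[tuple[str, int, datetime]] = []
    last_seen: dict[str, datetime] = {}
    session_no: dict[str, int] = {}
    # Sort by user, then time, so each user's events are processed in order.
    for user, ts in sorted(events, key=lambda e: (e[0], e[1])):
        # A user's first event, or a gap strictly longer than `gap`, opens a new session.
        if user not in last_seen or ts - last_seen[user] > gap:
            session_no[user] = session_no.get(user, 0) + 1
        last_seen[user] = ts
        out.append((user, session_no[user], ts))
    return out


# Usage with hypothetical, deliberately unsorted events.
day = datetime(2026, 1, 5)
events = [
    ("a", day.replace(hour=9, minute=50)),
    ("b", day.replace(hour=9, minute=5)),
    ("a", day.replace(hour=9, minute=0)),
    ("a", day.replace(hour=9, minute=10)),
]
sessions = sessionize(events)
```

The same shape maps onto SQL as a LAG over each user's timestamps followed by a running SUM of "new session" flags, which is worth mentioning if the interviewer asks for the set-based version.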
