Data Engineering Study Plan (2, 8, or 16 Weeks)

Study time should be allocated in proportion to where interviews concentrate. From the verified rounds in the dataset, 32.7% are phone-screen SQL, 20.7% are technical screens, and 11.7% are onsite SQL, which puts SQL alone above 44% of all rounds. The plans below start with SQL, then layer Python, data modeling, and system design in that order.

44%: SQL rounds
26%: Technical + Python
~50%: Study time for SQL
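
The SQL share quoted above is the sum of the two SQL round types. A minimal sketch of that arithmetic, plus a rough weekly budget under the ~50% allocation; the `sql_hours` helper and the choice of the 8-week plan's hours as the example are illustrative assumptions, not part of the dataset:

```python
# Shares of verified rounds quoted in the text above (percent).
round_shares = {
    "phone-screen SQL": 32.7,
    "technical screen": 20.7,
    "onsite SQL": 11.7,
}

# SQL share = phone-screen SQL + onsite SQL, rounded to one decimal place.
sql_share = round(round_shares["phone-screen SQL"] + round_shares["onsite SQL"], 1)


def sql_hours(weekly_hours, sql_fraction=0.5):
    """Hypothetical helper: weekly hours for SQL under a ~50% allocation."""
    return weekly_hours * sql_fraction


# 8-week plan: 1.5 to 2 hours a day on six days per week.
low, high = sql_hours(1.5 * 6), sql_hours(2 * 6)
print(sql_share, low, high)
```

Averaged over the 8-week plan, that comes to roughly 4.5 to 6 hours of SQL per week, which matches the plan spending four of its eight weeks on SQL.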

2-Week Sprint

For an interview scheduled within two weeks. Front-loads SQL, then covers data modeling, pipeline design, and behavioral. Targets 3 to 4 hours of daily practice.

  1. Week 1: SQL Intensive

    Monday: Window functions. Solve 5 ROW_NUMBER and RANK problems. Time yourself at 15 minutes each. Review frame clauses (ROWS BETWEEN).
    Tuesday: Window functions continued. 5 problems using LAG, LEAD, and running totals. Practice writing PARTITION BY clauses from scratch.
    Wednesday: JOINs. 5 problems mixing INNER and LEFT JOIN. Focus on NULL behavior in LEFT JOINs. Then 3 self-join problems.
    Thursday: Aggregation. GROUP BY with HAVING, conditional aggregation using CASE WHEN inside SUM/COUNT. 5 problems.
    Friday: CTEs and subqueries. Chain 3+ CTEs in a single query. Practice correlated subqueries and EXISTS. 5 problems.
    Saturday: Mixed SQL. 8 problems across all topics, timed. Simulate interview conditions: no looking up syntax.
    Sunday: Rest day. Review your wrong answers from the week. Rewrite the hardest query from memory.

  2. Week 2: Modeling, Design, Behavioral

    Monday: Data modeling. Normalization vs denormalization. Design 2 schemas: an e-commerce system and an event tracking system.
    Tuesday: Star schemas and SCDs. Design a fact/dimension model for a subscription business. Practice explaining trade-offs out loud.
    Wednesday: Pipeline design. Study idempotency, orchestration, and error handling. Sketch 2 pipeline architectures on paper.
    Thursday: Behavioral prep. Write 6 STAR stories covering: a production incident, a cross-team project, pushing back on a requirement, debugging under pressure, a project you led, a failure you learned from.
    Friday: Mock interview 1. Full timed SQL round (3 problems in 45 minutes) + 1 pipeline design (30 minutes).
    Saturday: Mock interview 2. Full timed round. Address weak spots from Friday. Review and revise behavioral stories.
    Sunday: Rest day with light review only. Read through the STAR stories once and prioritize sleep over additional drilling.
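
The window function drills from Week 1 (ROW_NUMBER, RANK, LAG, running totals, and an explicit ROWS BETWEEN frame) can be practiced locally. The following is a minimal sketch, assuming Python's built-in sqlite3 module is linked against SQLite 3.25 or later, since older builds lack window functions. The `orders` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, order_day INTEGER, amount INTEGER);
INSERT INTO orders VALUES
  ('ann', 1, 50), ('ann', 2, 50), ('ann', 3, 20),
  ('bob', 1, 10), ('bob', 2, 40);
""")

query = """
SELECT
  customer,
  order_day,
  amount,
  -- ROW_NUMBER breaks ties with order_day, so every row gets a distinct number.
  ROW_NUMBER() OVER (PARTITION BY customer ORDER BY amount DESC, order_day) AS rn,
  -- RANK gives tied amounts the same rank and then skips ahead.
  RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk,
  -- LAG returns NULL for the first row of each partition.
  LAG(amount) OVER (PARTITION BY customer ORDER BY order_day) AS prev_amount,
  -- Running total with an explicit frame clause.
  SUM(amount) OVER (
    PARTITION BY customer ORDER BY order_day
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM orders
ORDER BY customer, order_day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Ann's two 50-unit orders share rank 1 and her third order gets rank 3, while ROW_NUMBER still numbers them 1, 2, 3. That difference is a common follow-up question.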

8-Week Standard

For candidates with a technical background moving into data engineering. Targets 1.5 to 2 hours of daily practice on six days per week.

  1. Weeks 1-2: SQL Foundations

    Week 1 daily schedule: Mon/Wed/Fri = 5 new problems (SELECT, WHERE, ORDER BY, JOINs). Tue/Thu = review and redo problems you got wrong. Sat = 8 mixed problems timed.
    Week 2 daily schedule: Mon/Wed/Fri = 5 JOIN problems (INNER, LEFT, FULL OUTER, CROSS, self-joins). Tue/Thu = aggregation practice (GROUP BY, HAVING, conditional logic). Sat = 8 mixed problems timed.
    Target by end of week 2: You can write a 3-table JOIN with GROUP BY and HAVING from scratch in under 10 minutes.
    Sunday: Rest day. No SQL practice.

  2. Weeks 3-4: Intermediate SQL

    Week 3 focus: Window functions only. Mon = ROW_NUMBER and RANK. Tue = DENSE_RANK and NTILE. Wed = LAG and LEAD. Thu = running totals and averages. Fri = frame clauses (ROWS BETWEEN). Sat = 8 window function problems timed.
    Week 4 focus: CTEs, subqueries, NULLs. Mon/Tue = CTEs and recursive queries. Wed/Thu = scalar and correlated subqueries, EXISTS. Fri = COALESCE, NULLIF, three-valued logic. Sat = mixed intermediate problems.
    Target by end of week 4: You can chain CTEs with window functions and handle NULLs correctly on the first try most of the time.
    Sunday: Rest.

  3. Weeks 5-6: Python for Data Engineering

    Week 5 daily schedule: Mon = lists and list comprehensions (5 problems). Tue = dicts and dict operations (5 problems). Wed = sets and string manipulation. Thu = file I/O and JSON parsing. Fri = CSV processing and error handling. Sat = build a small ETL script from scratch.
    Week 6 daily schedule: Mon/Tue = write functions that parse nested JSON and deduplicate records. Wed/Thu = write code that joins two datasets by key using only built-in Python. Fri = edge case practice (empty inputs, missing keys, type mismatches). Sat = timed Python problems.
    Target by end of week 6: You can write a data transformation function in Python with error handling in under 20 minutes.
    Sunday: Rest.

  4. Weeks 7-8: Modeling, Design, Behavioral, Mocks

    Week 7: Mon/Tue = data modeling (normalization, star schemas, SCDs). Design 4 schemas for different business domains. Wed/Thu = pipeline system design. Sketch 3 pipelines on paper with failure handling. Fri = behavioral prep. Write 8 STAR stories. Sat = rehearse stories, time each at under 3 minutes.
    Week 8: Mon = mock SQL round (timed). Tue = mock pipeline design (timed). Wed = drill weak spots from mocks. Thu = second mock round. Fri = final review of all topics. Sat = full mock interview loop (SQL + design + behavioral).
    Target by end of week 8: You can complete a full mock interview loop and feel confident in every round.
    Sunday: Rest.
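
The Week 6 exercises (deduplicating records and joining two datasets by key with only built-in Python) might look like the sketch below. The function names `dedupe_by_key` and `left_join` and the sample records are hypothetical. The policies chosen here, keeping the last record per key and passing unmatched left rows through unchanged, are assumptions an interviewer may want stated out loud.

```python
def dedupe_by_key(records, key):
    """Keep the last record seen for each key value; skip records missing the key."""
    latest = {}
    for record in records:
        if key not in record:
            continue  # edge case: missing key
        latest[record[key]] = record
    return list(latest.values())


def left_join(left, right, key):
    """Join two lists of dicts on `key`, keeping every left row."""
    # Build a lookup from key value to all matching right rows.
    index = {}
    for row in right:
        if key in row:
            index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        matches = index.get(row[key], []) if key in row else []
        if not matches:
            joined.append(dict(row))  # unmatched: right-side columns are absent
            continue
        for match in matches:
            joined.append({**match, **row})  # left values win on name collisions
    return joined


users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"name": "NoId"}]
orders = [{"id": 1, "total": 30}, {"id": 1, "total": 5}]
print(left_join(users, orders, "id"))
print(dedupe_by_key([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}], "id"))
```

Building the lookup dict once keeps the join roughly linear in the combined input size, compared with a nested loop over both lists.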

16-Week Thorough

For candidates starting from scratch. Targets 1 to 1.5 hours of daily practice on five weekdays with weekends off.

  1. Weeks 1-3: SQL Basics

    Week 1: SELECT, FROM, WHERE, ORDER BY, LIMIT. Understand what a relational database is. 3 problems per day, Mon-Fri.
    Week 2: JOINs from first principles. Draw Venn diagrams. Solve 4 JOIN problems per day. Focus on understanding when LEFT JOIN produces NULLs.
    Week 3: GROUP BY, HAVING, aggregate functions. Conditional aggregation with CASE WHEN inside SUM and COUNT. 4 problems per day.
    Weekends: Rest. Optional review: rewrite one query from the week from memory and stop there.

  2. Weeks 4-6: Intermediate SQL

    Week 4: Subqueries (scalar, correlated, EXISTS). 3 problems per day. Practice rewriting subqueries as JOINs and back.
    Week 5: Window functions. Spend the entire week here. Mon = ROW_NUMBER/RANK. Tue = LAG/LEAD. Wed = running totals. Thu = NTILE and percentiles. Fri = mixed window problems.
    Week 6: CTEs, recursive queries, NULL handling, date functions. Round out your SQL toolkit. 4 problems per day covering all of these.
    Weekends: Rest.

  3. Weeks 7-9: Python Fundamentals

    Week 7: Variables, types, control flow, functions. Write one small script per day (e.g., a function that validates email formats, a function that counts word frequencies).
    Week 8: Lists, dicts, sets, comprehensions, string manipulation. 3 problems per day.
    Week 9: File I/O, JSON, CSV, error handling with try/except. Build a small ETL script that reads a JSON file, transforms the data, and writes CSV output.
    Weekends: Rest.

  4. Weeks 10-11: Python for Data Engineering

    Week 10: Working with APIs, HTTP requests, pagination, rate limit handling. Build a script that fetches paginated data from a public API.
    Week 11: Testing, logging, and writing maintainable pipeline code. Add tests and logging to the scripts you wrote in weeks 9-10.
    Weekends: Rest.

  5. Weeks 12-13: Data Modeling + Pipeline Design

    Week 12: Mon/Tue = 1NF through 3NF with examples. Wed/Thu = star schemas and fact/dimension tables. Fri = SCDs (Type 1, 2, 3). Design 2 schemas from scratch.
    Week 13: Mon/Tue = pipeline architecture and orchestration basics. Wed = backfill strategies. Thu = schema evolution. Fri = design a complete pipeline on paper.
    Weekends: Rest.

  6. Weeks 14-16: System Design + Behavioral + Full Mock Loops

    Week 14: System design deep practice. Design 5 pipelines end-to-end on paper (CDC ingestion, event streaming, daily batch ETL, reverse ETL, feature serving). Practice explaining each in 20 minutes.
    Week 15: Behavioral prep. Write and rehearse 10 STAR stories. Record yourself delivering each one and review the recordings. Time each at under 3 minutes.
    Week 16: Full mock interview loops. Mon = SQL mock. Tue = pipeline design mock. Wed = fix weak spots. Thu = behavioral mock. Fri = full end-to-end mock loop. Do at least 3 complete rounds this week.
    Week 16 Saturday: Light review of your notes. Get sleep before the real thing.
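
The Week 9 exercise (read JSON, transform it, write CSV, handle errors with try/except) could be sketched as follows. To keep the example self-contained it works on in-memory strings rather than files. The `user_id` and `email` fields, the `json_to_csv` name, and the skip-and-count policy for bad records are illustrative assumptions.

```python
import csv
import io
import json


def json_to_csv(json_text):
    """Convert a JSON array of user objects to CSV text.

    Returns (csv_text, skipped_count). Malformed records are skipped and
    counted instead of aborting the whole run.
    """
    try:
        records = json.loads(json_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"input is not valid JSON: {exc}") from exc
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of objects")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["user_id", "email"], lineterminator="\n")
    writer.writeheader()

    skipped = 0
    for record in records:
        try:
            # Transform: cast the id to int and normalize the email.
            row = {
                "user_id": int(record["user_id"]),
                "email": record["email"].strip().lower(),
            }
        except (KeyError, TypeError, ValueError, AttributeError):
            skipped += 1  # missing key, wrong type, or unparseable id
            continue
        writer.writerow(row)
    return out.getvalue(), skipped


sample = '[{"user_id": "1", "email": " A@X.com "}, {"user_id": "x", "email": "b@x.com"}, {"email": "c@x.com"}]'
csv_text, skipped = json_to_csv(sample)
print(csv_text, skipped)
```

Swapping `io.StringIO` for `open(path, "w", newline="")` turns this into the file-based version the plan describes. Returning the skipped count gives the Week 11 logging and testing work something concrete to assert on.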

How to Know When You Are Ready

SQL: 12 minutes

You can solve a medium-difficulty window function problem in under 12 minutes without looking anything up.

Pipeline design: 25 minutes

You can sketch a pipeline architecture on a whiteboard in 25 minutes that includes ingestion, transformation, loading, error handling, and monitoring.
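
Error handling in a pipeline design usually leads to retries, and retries are only safe when the load step is idempotent. A minimal sketch of an idempotent load, assuming a SQLite target and a hypothetical `daily_revenue` table keyed by day; a real warehouse would use its own MERGE or upsert syntax:

```python
import sqlite3


def load_batch(conn, rows):
    """Load (day, revenue) rows so that re-running the same batch is harmless."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS daily_revenue "
            "(day TEXT PRIMARY KEY, revenue INTEGER)"
        )
        # The primary key plus INSERT OR REPLACE overwrites rows instead of duplicating them.
        conn.executemany(
            "INSERT OR REPLACE INTO daily_revenue (day, revenue) VALUES (?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
batch = [("2024-01-01", 100), ("2024-01-02", 250)]
load_batch(conn, batch)
load_batch(conn, batch)  # simulated retry after a failure downstream

result = conn.execute("SELECT day, revenue FROM daily_revenue ORDER BY day").fetchall()
print(result)
```

The table holds the same two rows after the second run as after the first, which is the property to name when describing retries and backfills on the whiteboard.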

Behavioral: 3 minutes

You can tell 5 different stories from memory, each under 3 minutes, covering collaboration, failure, debugging, trade-offs, and initiative.

Mock interviews: 2+

You have completed at least two timed mock interview rounds with feedback, and the mock interviewer gave you a passing verdict on the most recent attempt.

Common Study Plan Mistakes

Reading instead of writing. Tutorials and solution videos produce a feeling of progress that does not transfer to interview performance. A practical ratio is two hours of writing code for every hour of reading. A problem solved with the answer in view has not yet been learned.

Deferring system design until the final week. System design requires a different kind of preparation than SQL or Python. The skill is structuring a discussion, drawing the architecture, and reasoning about trade-offs while talking. It develops over weeks of practice, not days, so it should appear no later than the halfway point of any plan.

Skipping behavioral preparation. Behavioral rounds get treated as conversation rather than something to prepare for. The result tends to be rambling, vague examples, or recall failures under pressure. Stories should be written, rehearsed, and timed before the loop.

Spreading too thin across topics. Deep knowledge of window functions beats passing familiarity with twenty topics. Most interview rounds reward depth on the topics they test rather than breadth. The plan lists a focused topic set for a reason; adding more before mastering those tends to dilute the result.

Study Plan FAQ

Which study plan should I choose?
If you have an interview scheduled within 3 weeks, use the 2-week sprint. If you have 2-3 months and some technical background, the 8-week standard is right. If you are new to data engineering or want to be thorough, the 16-week plan builds everything from the ground up. You can always start with the 8-week plan and extend it if you need more time on specific topics.
How many hours per day should I study?
The 2-week sprint assumes 3-4 hours per day. The 8-week standard works with 1.5-2 hours per day. The 16-week thorough plan needs about 1-1.5 hours per day. Consistency matters more than volume. Studying 1 hour every day for 8 weeks beats cramming 8 hours on weekends.
Should I study SQL or Python first?
SQL first, always. It is the single most-tested skill in data engineering interviews, and phone-screen SQL is the most common round type. If you are short on time, SQL proficiency gives you the highest return. The top SQL concepts by frequency: aggregation, JOINs, and window functions. Once your SQL is solid, layer in Python.
How do I know when I am ready for interviews?
You are ready when you can solve a medium-difficulty SQL window function problem in under 12 minutes, design a basic pipeline architecture in 25 minutes with failure handling, and tell 5 behavioral stories without reading notes. If any of those feel shaky, drill that specific area for another week.
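
Two of the most frequently tested concepts named in the FAQ, JOINs and aggregation, often appear together in one problem. The sketch below combines a LEFT JOIN with conditional aggregation (CASE WHEN inside SUM), using Python's built-in sqlite3 module. The `customers` and `orders` tables are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, amount INTEGER);
INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO orders VALUES
  (1, 1, 'paid', 50), (2, 1, 'refunded', 20), (3, 2, 'paid', 10);
""")

query = """
SELECT
  c.name,
  COUNT(*) AS row_count,      -- counts the NULL-padded row for customers with no orders
  COUNT(o.id) AS order_count, -- ignores NULLs, so customers with no orders get 0
  SUM(CASE WHEN o.status = 'paid' THEN o.amount ELSE 0 END) AS paid_amount,
  SUM(CASE WHEN o.status = 'refunded' THEN 1 ELSE 0 END) AS refund_count
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY c.name
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```

Customer `cy` has no orders, so the LEFT JOIN produces one row with NULL order columns. `COUNT(*)` reports 1 for that customer while `COUNT(o.id)` reports 0, which is the NULL behavior the plans ask you to drill.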
Why Practice

Begin with a SQL problem

  1. Active recall beats re-reading by 50%

    Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom.

  2. 76% of hiring managers reject on the coding task, not the resume

    From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice.

  3. System design is graded on the calls you defend out loud

    Ingestion, batch vs streaming, the bronze/silver/gold layers, idempotency, backfill and replay. Sketching the pipeline and naming the failure modes is the signal, not the boxes.
