Data Engineering Study Plan (2, 8, or 16 Weeks)
Allocate study time in proportion to where interviews concentrate. Of the verified rounds in the dataset, 32.7% are phone-screen SQL, 20.7% are technical screens, and 11.7% are onsite SQL. The two SQL-specific categories alone sum to 44.4% of all rounds (32.7% + 11.7%). The plans below therefore start with SQL, then layer in Python, data modeling, and system design, in that order.
2-Week Sprint
For an interview scheduled within two weeks. Front-loads SQL, then covers data modeling, pipeline design, and behavioral. Targets 3 to 4 hours of daily practice.
- 01
Week 1: SQL Intensive
Monday: Window functions. Solve 5 ROW_NUMBER and RANK problems. Time yourself at 15 minutes each. Review frame clauses (ROWS BETWEEN).
Tuesday: Window functions continued. 5 problems using LAG, LEAD, and running totals. Practice writing PARTITION BY clauses from scratch.
Wednesday: JOINs. 5 problems mixing INNER and LEFT JOIN. Focus on NULL behavior in LEFT JOINs. Then 3 self-join problems.
Thursday: Aggregation. GROUP BY with HAVING, conditional aggregation using CASE WHEN inside SUM/COUNT. 5 problems.
Friday: CTEs and subqueries. Chain 3+ CTEs in a single query. Practice correlated subqueries and EXISTS. 5 problems.
Saturday: Mixed SQL. 8 problems across all topics, timed. Simulate interview conditions: no looking up syntax.
Sunday: Rest day. Review your wrong answers from the week. Rewrite the hardest query from memory.
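The window-function topics from Monday and Tuesday can be drilled without installing a database server. The sketch below is one way to do it, assuming Python's built-in sqlite3 module is linked against SQLite 3.25 or newer (the first release with window functions). The `orders` table and its rows are hypothetical practice data, not part of any real problem set.

```python
import sqlite3

# Hypothetical practice data: (customer, order_day, amount).
ORDERS = [
    ("ana", "2024-01-01", 50),
    ("ana", "2024-01-03", 70),
    ("ana", "2024-01-07", 70),
    ("ben", "2024-01-02", 20),
    ("ben", "2024-01-05", 90),
]

QUERY = """
SELECT
    customer,
    order_day,
    amount,
    -- Position of each order within the customer's history.
    ROW_NUMBER() OVER (PARTITION BY customer ORDER BY order_day) AS order_seq,
    -- Ties share a rank, and the next rank is skipped.
    RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS amount_rank,
    -- Previous order's amount; NULL for the first order.
    LAG(amount) OVER (PARTITION BY customer ORDER BY order_day) AS prev_amount,
    -- Explicit frame clause for a running total.
    SUM(amount) OVER (
        PARTITION BY customer
        ORDER BY order_day
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders
ORDER BY customer, order_day
"""


def run_window_demo():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer TEXT, order_day TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_window_demo():
        print(row)
```

Before running it, try predicting each output column by hand. The two tied amounts for `ana` are a useful check on how RANK differs from ROW_NUMBER.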
- 02
Week 2: Modeling, Design, Behavioral
Monday: Data modeling. Normalization vs denormalization. Design 2 schemas: an e-commerce system and an event tracking system.
Tuesday: Star schemas and SCDs. Design a fact/dimension model for a subscription business. Practice explaining trade-offs out loud.
Wednesday: Pipeline design. Study idempotency, orchestration, and error handling. Sketch 2 pipeline architectures on paper.
Thursday: Behavioral prep. Write 6 STAR stories covering: a production incident, a cross-team project, pushing back on a requirement, debugging under pressure, a project you led, a failure you learned from.
Friday: Mock interview 1. Full timed SQL round (3 problems in 45 minutes) + 1 pipeline design (30 minutes).
Saturday: Mock interview 2. Full timed round. Address weak spots from Friday. Review and revise behavioral stories.
Sunday: Rest day with light review only. Read through the STAR stories once and prioritize sleep over additional drilling.
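Idempotency, from Wednesday's pipeline topics, is easier to explain in an interview with a concrete pattern in hand. The sketch below shows one common approach, delete-then-insert for a single date partition inside one transaction, so that a retried run leaves the table in the same final state. The `daily_sales` table, its columns, and the function names are assumptions made for illustration. Other approaches, such as upserts keyed on a natural key, are equally valid.

```python
import sqlite3


def make_conn():
    """Create an in-memory database with a hypothetical target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_sales (run_date TEXT, sku TEXT, units INTEGER)"
    )
    return conn


def load_partition(conn, run_date, rows):
    """Replace every row for run_date so that a rerun does not duplicate data.

    rows is an iterable of (sku, units) pairs.
    """
    # The connection context manager commits on success and rolls back
    # if an exception is raised, so the delete and insert apply together.
    with conn:
        conn.execute("DELETE FROM daily_sales WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_sales (run_date, sku, units) VALUES (?, ?, ?)",
            [(run_date, sku, units) for sku, units in rows],
        )


if __name__ == "__main__":
    conn = make_conn()
    batch = [("A1", 3), ("B2", 5)]
    load_partition(conn, "2024-01-01", batch)
    load_partition(conn, "2024-01-01", batch)  # simulated retry
    print(conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0])
```

A plain append-only insert would double the row count on the retry. That contrast is the point worth saying out loud when sketching the pipeline.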
8-Week Standard
For candidates with a technical background moving into data engineering. Targets 1.5 to 2 hours of daily practice on six days per week.
- 01
Weeks 1-2: SQL Foundations
Week 1 daily schedule: Mon/Wed/Fri = 5 new problems (SELECT, WHERE, ORDER BY, JOINs). Tue/Thu = review and redo problems you got wrong. Sat = 8 mixed problems timed.
Week 2 daily schedule: Mon/Wed/Fri = 5 JOIN problems (INNER, LEFT, FULL OUTER, CROSS, self-joins). Tue/Thu = aggregation practice (GROUP BY, HAVING, conditional logic). Sat = 8 mixed problems timed.
Target by end of week 2: You can write a 3-table JOIN with GROUP BY and HAVING from scratch in under 10 minutes.
Sunday: Rest day. No SQL practice.
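As a reference point for the week 2 target, the sketch below shows what a 3-table JOIN with GROUP BY and HAVING can look like. The schema (`customers`, `orders`, `order_items`) and the spend threshold of 100 are hypothetical, and the query runs through Python's built-in sqlite3 module only so that it is easy to execute locally.

```python
import sqlite3

# Hypothetical schema and rows for practice.
SETUP = """
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_items (order_id INTEGER, sku TEXT, price REAL, qty INTEGER);
INSERT INTO customers VALUES (1, 'ana'), (2, 'ben'), (3, 'cy');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO order_items VALUES
    (10, 'A1', 40.0, 2),
    (11, 'B2', 30.0, 1),
    (12, 'A1', 40.0, 1);
"""

QUERY = """
SELECT
    c.name,
    COUNT(DISTINCT o.order_id) AS order_count,
    SUM(i.price * i.qty) AS total_spend
FROM customers AS c
INNER JOIN orders AS o ON o.customer_id = c.customer_id
INNER JOIN order_items AS i ON i.order_id = o.order_id
GROUP BY c.customer_id, c.name
-- HAVING filters groups after aggregation; WHERE would filter rows before it.
HAVING SUM(i.price * i.qty) >= 100
ORDER BY total_spend DESC
"""


def top_spenders():
    """Return customers whose total spend meets the threshold."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(top_spenders())
```

A good timed variation is to rewrite the query so that customers with no orders still appear, which forces the switch from INNER JOIN to LEFT JOIN.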
- 02
Weeks 3-4: Intermediate SQL
Week 3 focus: Window functions only. Mon = ROW_NUMBER and RANK. Tue = DENSE_RANK and NTILE. Wed = LAG and LEAD. Thu = running totals and averages. Fri = frame clauses (ROWS BETWEEN). Sat = 8 window function problems timed.
Week 4 focus: CTEs, subqueries, NULLs. Mon/Tue = CTEs and recursive queries. Wed/Thu = scalar and correlated subqueries, EXISTS. Fri = COALESCE, NULLIF, three-valued logic. Sat = mixed intermediate problems.
Target by end of week 4: You can chain CTEs with window functions and handle NULLs correctly on the first try most of the time.
Sunday: Rest.
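The week 4 NULL topics (COALESCE, NULLIF, three-valued logic) are easiest to internalize by running tiny expressions and checking the results. The sketch below does that with Python's built-in sqlite3 module. The function name and the dictionary keys are illustrative choices, and the behavior shown is standard SQL NULL semantics as SQLite implements it.

```python
import sqlite3


def null_demo():
    """Evaluate a few NULL-related expressions and return the results."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Comparing NULL to NULL yields NULL (unknown), not true.
    null_equals_null = cur.execute("SELECT NULL = NULL").fetchone()[0]

    # NOT IN over a list containing NULL can never evaluate to true,
    # so the WHERE clause filters the row out.
    not_in_rows = cur.execute(
        "SELECT 1 WHERE 3 NOT IN (1, 2, NULL)"
    ).fetchall()

    # NULLIF(x, 0) turns a zero divisor into NULL, and COALESCE
    # supplies a fallback for the resulting NULL.
    safe_ratio = cur.execute(
        "SELECT COALESCE(10.0 / NULLIF(0, 0), 0.0)"
    ).fetchone()[0]

    conn.close()
    return {
        "null_equals_null": null_equals_null,
        "not_in_rows": not_in_rows,
        "safe_ratio": safe_ratio,
    }


if __name__ == "__main__":
    print(null_demo())
```

The NOT IN case is the one most likely to appear as a trap in an interview, since the query returns no rows even though 3 is plainly not 1 or 2.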
- 03
Weeks 5-6: Python for Data Engineering
Week 5 daily schedule: Mon = lists and list comprehensions (5 problems). Tue = dicts and dict operations (5 problems). Wed = sets and string manipulation. Thu = file I/O and JSON parsing. Fri = CSV processing and error handling. Sat = build a small ETL script from scratch.
Week 6 daily schedule: Mon/Tue = write functions that parse nested JSON and deduplicate records. Wed/Thu = write code that joins two datasets by key using only built-in Python. Fri = edge case practice (empty inputs, missing keys, type mismatches). Sat = timed Python problems.
Target by end of week 6: You can write a data transformation function in Python with error handling in under 20 minutes.
Sunday: Rest.
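The week 6 exercises (deduplicating records and joining two datasets by key with only built-in Python) can be sketched as follows. This is one possible shape for the answer, not a reference solution. The function names, the keep-the-last-record rule, and the choice to let left-side values win on column collisions are all assumptions that an interviewer might ask you to change.

```python
def dedupe_by_key(records, key):
    """Keep the last record seen for each key value.

    Records that lack the key are skipped rather than raising.
    """
    latest = {}
    for record in records:
        if key not in record:
            continue
        latest[record[key]] = record
    return list(latest.values())


def left_join(left, right, key):
    """Hash join: index the right side once, then probe per left row."""
    index = {}
    for row in right:
        if key in row:
            index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        matches = index.get(row[key], []) if key in row else []
        if not matches:
            # Unmatched left rows are kept, as in a SQL LEFT JOIN.
            joined.append(dict(row))
            continue
        for match in matches:
            # Left-side values win when both sides share a column name.
            joined.append({**match, **row})
    return joined


if __name__ == "__main__":
    users = [
        {"id": 1, "name": "ana"},
        {"id": 2, "name": "ben"},
        {"id": 1, "name": "ana v2"},
        {"name": "no id"},
    ]
    orders = [{"id": 1, "total": 30}, {"id": 1, "total": 45}]
    print(left_join(dedupe_by_key(users, "id"), orders, "id"))
```

Building the index first keeps the join at roughly one pass over each input. A nested loop over both lists would also work but scales with the product of their sizes.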
- 04
Weeks 7-8: Modeling, Design, Behavioral, Mocks
Week 7: Mon/Tue = data modeling (normalization, star schemas, SCDs). Design 4 schemas for different business domains. Wed/Thu = pipeline system design. Sketch 3 pipelines on paper with failure handling. Fri = behavioral prep. Write 8 STAR stories. Sat = rehearse stories, time each at under 3 minutes.
Week 8: Mon = mock SQL round (timed). Tue = mock pipeline design (timed). Wed = drill weak spots from mocks. Thu = second mock round. Fri = final review of all topics. Sat = full mock interview loop (SQL + design + behavioral).
Target by end of week 8: You can complete a full mock interview loop and feel confident in every round.
Sunday: Rest.
16-Week Thorough
For candidates starting from scratch. Targets 1 to 1.5 hours of daily practice on five weekdays with weekends off.
- 01
Weeks 1-3: SQL Basics
Week 1: SELECT, FROM, WHERE, ORDER BY, LIMIT. Understand what a relational database is. 3 problems per day, Mon-Fri.
Week 2: JOINs from first principles. Draw Venn diagrams. Solve 4 JOIN problems per day. Focus on understanding when LEFT JOIN produces NULLs.
Week 3: GROUP BY, HAVING, aggregate functions. Conditional aggregation with CASE WHEN inside SUM and COUNT. 4 problems per day.
Weekends: Rest. Optional review: rewrite one query from the week from memory and stop there.
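Two ideas from weeks 2 and 3, LEFT JOIN producing NULLs and conditional aggregation with CASE WHEN, interact in a way that is worth seeing once. The sketch below uses hypothetical `users` and `orders` tables through Python's built-in sqlite3 module. It contrasts COUNT(*) with COUNT(column) for a user who has no orders.

```python
import sqlite3

# Hypothetical practice tables; 'ben' has no orders.
SETUP = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
INSERT INTO users VALUES (1, 'ana'), (2, 'ben');
INSERT INTO orders VALUES (100, 1, 'paid'), (101, 1, 'refunded');
"""

QUERY = """
SELECT
    u.name,
    -- Counts joined rows, including the NULL-padded row for 'ben'.
    COUNT(*) AS joined_rows,
    -- Counts only rows where order_id is not NULL.
    COUNT(o.order_id) AS total_orders,
    -- Conditional aggregation: count only refunded orders.
    SUM(CASE WHEN o.status = 'refunded' THEN 1 ELSE 0 END) AS refunded_orders
FROM users AS u
LEFT JOIN orders AS o ON o.user_id = u.user_id
GROUP BY u.user_id, u.name
ORDER BY u.user_id
"""


def left_join_counts():
    """Run the query against the sample data and return all rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in left_join_counts():
        print(row)
```

The row for `ben` shows why COUNT(*) overstates order counts after a LEFT JOIN: the unmatched user still contributes one joined row whose order columns are all NULL.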
- 02
Weeks 4-6: Intermediate SQL
Week 4: Subqueries (scalar, correlated, EXISTS). 3 problems per day. Practice rewriting subqueries as JOINs and back.
Week 5: Window functions. Spend the entire week here. Mon = ROW_NUMBER/RANK. Tue = LAG/LEAD. Wed = running totals. Thu = NTILE and percentiles. Fri = mixed window problems.
Week 6: CTEs, recursive queries, NULL handling, date functions. Round out your SQL toolkit. 4 problems per day covering all of these.
Weekends: Rest.
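The week 4 exercise of rewriting subqueries as JOINs and back has one recurring pitfall: a JOIN can multiply rows where EXISTS cannot. The sketch below illustrates this with hypothetical `customers` and `orders` tables, run through Python's built-in sqlite3 module. It compares a correlated EXISTS query, an equivalent JOIN with DISTINCT, and a naive JOIN that returns duplicates.

```python
import sqlite3

# Hypothetical practice tables; 'ana' has two orders, 'ben' has none.
SETUP = """
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'ana'), (2, 'ben'), (3, 'cy');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
"""

# Correlated subquery: one yes/no check per customer.
EXISTS_QUERY = """
SELECT c.customer_id, c.name
FROM customers AS c
WHERE EXISTS (
    SELECT 1 FROM orders AS o WHERE o.customer_id = c.customer_id
)
ORDER BY c.customer_id
"""

# Equivalent JOIN: DISTINCT removes the per-order duplicates.
JOIN_QUERY = """
SELECT DISTINCT c.customer_id, c.name
FROM customers AS c
INNER JOIN orders AS o ON o.customer_id = c.customer_id
ORDER BY c.customer_id
"""

# Naive JOIN: returns one row per matching order.
NAIVE_JOIN_QUERY = """
SELECT c.customer_id, c.name
FROM customers AS c
INNER JOIN orders AS o ON o.customer_id = c.customer_id
ORDER BY c.customer_id
"""


def compare_rewrites():
    """Return the results of the three queries for comparison."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    results = tuple(
        conn.execute(query).fetchall()
        for query in (EXISTS_QUERY, JOIN_QUERY, NAIVE_JOIN_QUERY)
    )
    conn.close()
    return results


if __name__ == "__main__":
    for rows in compare_rewrites():
        print(rows)
```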
- 03
Weeks 7-9: Python Fundamentals
Week 7: Variables, types, control flow, functions. Write one small script per day (e.g., a function that validates email formats, a function that counts word frequencies).
Week 8: Lists, dicts, sets, comprehensions, string manipulation. 3 problems per day.
Week 9: File I/O, JSON, CSV, error handling with try/except. Build a small ETL script that reads a JSON file, transforms the data, and writes CSV output.
Weekends: Rest.
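The week 9 project (read a JSON file, transform the data, write CSV output) might look like the sketch below. The input field names (`user_id`, `email`, `amount_cents`), the output columns, and the decision to skip bad records and count them are assumptions chosen for illustration. A real exercise would define its own schema and error policy.

```python
import csv
import json
from pathlib import Path

# Hypothetical output columns.
FIELDNAMES = ["user_id", "email", "amount_usd"]


def transform(record):
    """Return a cleaned row, or None if the record cannot be used."""
    try:
        return {
            "user_id": int(record["user_id"]),
            "email": record["email"].strip().lower(),
            "amount_usd": round(float(record["amount_cents"]) / 100, 2),
        }
    except (KeyError, TypeError, ValueError, AttributeError):
        # Missing keys, wrong types, or unparseable numbers.
        return None


def run_etl(json_path, csv_path):
    """Read a JSON array of records, clean them, and write a CSV file.

    Returns (rows_written, rows_skipped).
    """
    records = json.loads(Path(json_path).read_text(encoding="utf-8"))
    rows = [row for row in map(transform, records) if row is not None]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows), len(records) - len(rows)
```

Calling `run_etl("input.json", "output.csv")` with file names of your own returns the written and skipped counts. Those counts are also a natural first thing to log when the script is revisited later in the plan.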
- 04
Weeks 10-11: Python for Data Engineering
Week 10: Working with APIs, HTTP requests, pagination, rate limit handling. Build a script that fetches paginated data from a public API.
Week 11: Testing, logging, and writing maintainable pipeline code. Add tests and logging to the scripts you wrote in weeks 9-10.
Weekends: Rest.
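The week 10 script combines two concerns, cursor-based pagination and rate limit handling, that are easier to test when the HTTP call is passed in as a function. The sketch below assumes a hypothetical `fetch_page(cursor)` callable that returns a dict with `items` and `next_cursor`, and a hypothetical `RateLimited` exception standing in for an HTTP 429 response. Neither corresponds to any particular API or library.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by fetch_page on an HTTP 429 response."""

    def __init__(self, retry_after=1.0):
        super().__init__("rate limited")
        self.retry_after = retry_after


def fetch_all(fetch_page, max_retries=3, sleep=time.sleep):
    """Collect every item from a cursor-paginated source.

    fetch_page(cursor) is assumed to return
    {"items": [...], "next_cursor": <cursor or None>}.
    """
    items = []
    cursor = None
    while True:
        for attempt in range(max_retries + 1):
            try:
                page = fetch_page(cursor)
                break
            except RateLimited as err:
                if attempt == max_retries:
                    raise  # give up after the final retry
                # Exponential backoff scaled by the server's hint.
                sleep(err.retry_after * (2 ** attempt))
        items.extend(page["items"])
        cursor = page.get("next_cursor")
        if cursor is None:
            return items
```

Injecting `sleep` means a test can record the waits instead of pausing, which fits the week 11 goal of adding tests to these scripts.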
- 05
Weeks 12-13: Data Modeling + Pipeline Design
Week 12: Mon/Tue = 1NF through 3NF with examples. Wed/Thu = star schemas and fact/dimension tables. Fri = SCDs (Type 1, 2, 3). Design 2 schemas from scratch.
Week 13: Mon/Tue = pipeline architecture and orchestration basics. Wed = backfill strategies. Thu = schema evolution. Fri = design a complete pipeline on paper.
Weekends: Rest.
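Of the SCD types covered on Friday of week 12, Type 2 is the one most worth tracing by hand, because it keeps history by closing the old row and opening a new one. The sketch below models a Type 2 dimension as a list of dicts. The column names (`valid_from`, `valid_to`, `is_current`) follow a common convention but are assumptions here, as is tracking a single `city` attribute.

```python
from datetime import date


def apply_scd2(dimension, incoming, today):
    """Apply a snapshot of {customer_id: city} to a Type 2 dimension.

    dimension is a list of row dicts and is modified in place.
    """
    current = {
        row["customer_id"]: row for row in dimension if row["is_current"]
    }
    for customer_id, city in incoming.items():
        existing = current.get(customer_id)
        if existing is not None and existing["city"] == city:
            continue  # attribute unchanged, nothing to record
        if existing is not None:
            # Close the old version instead of overwriting it.
            existing["valid_to"] = today
            existing["is_current"] = False
        dimension.append(
            {
                "customer_id": customer_id,
                "city": city,
                "valid_from": today,
                "valid_to": None,
                "is_current": True,
            }
        )
    return dimension


if __name__ == "__main__":
    dim = []
    apply_scd2(dim, {1: "Lisbon"}, date(2024, 1, 1))
    apply_scd2(dim, {1: "Porto"}, date(2024, 3, 1))
    for row in dim:
        print(row)
```

For contrast, a Type 1 change would overwrite `city` on the existing row and lose the history, while Type 3 would keep the prior value in an extra column on the same row.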
- 06
Weeks 14-16: System Design + Behavioral + Full Mock Loops
Week 14: System design deep practice. Design 5 pipelines end-to-end on paper (CDC ingestion, event streaming, daily batch ETL, reverse ETL, feature serving). Practice explaining each in 20 minutes.
Week 15: Behavioral prep. Write and rehearse 10 STAR stories. Record yourself delivering each one and review the recordings. Time each at under 3 minutes.
Week 16: Full mock interview loops. Mon = SQL mock. Tue = pipeline design mock. Wed = fix weak spots. Thu = behavioral mock. Fri = full end-to-end mock loop. Do at least 3 complete rounds this week.
Week 16 Saturday: Light review of your notes. Get sleep before the real thing.
How to Know When You Are Ready
SQL: 12 minutes
You can solve a medium-difficulty window function problem in under 12 minutes without looking anything up.
Pipeline design: 25 minutes
You can sketch a pipeline architecture on a whiteboard in 25 minutes that includes ingestion, transformation, loading, error handling, and monitoring.
Behavioral: 3 minutes
You can tell 5 different stories from memory, each under 3 minutes, covering collaboration, failure, debugging, trade-offs, and initiative.
Mock interviews: 2+
You have completed at least two timed mock interview rounds with feedback, and the mock interviewer gave your most recent attempt a passing verdict.
Common Study Plan Mistakes
Reading instead of writing. Tutorials and solution videos produce a feeling of progress that does not transfer to interview performance. A practical ratio is two hours of writing code for every hour of reading. A problem solved with the answer in view has not yet been learned.
Deferring system design until the final week. System design requires a different kind of preparation than SQL or Python. The skill is structuring a discussion, drawing the architecture, and reasoning about trade-offs while talking. It develops over weeks of practice, not days, so it should appear no later than the halfway point of any plan.
Skipping behavioral preparation. Behavioral rounds get treated as conversation rather than something to prepare for. The result tends to be rambling, vague examples, or recall failures under pressure. Stories should be written, rehearsed, and timed before the loop.
Spreading too thin across topics. Deep knowledge of window functions beats passing familiarity with twenty topics. Most interview rounds reward depth on the topics they test rather than breadth. The plan lists a focused topic set for a reason; adding more before mastering those tends to dilute the result.
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
System design is graded on the calls you defend out loud
Ingestion, batch vs streaming, the bronze/silver/gold layers, idempotency, backfill and replay. Sketching the pipeline and naming the failure modes is the signal, not the boxes