You're probably looking at this page because someone told you data engineering has a hundred tools and you don't know where to start. Here's the reassuring truth: you don't need most of them. This page lays out four stages, sequenced in the order interviewers test them, each ending with a milestone. Work through them in order and you'll know exactly how far you've come every Sunday night.
Looking for the detailed 18-week study plan with specific weekly topics? See the full Data Engineering Roadmap.
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
Learn the skills that get you hired
Most data engineer job postings list SQL and Python as the top two requirements. Not Spark. Not Kubernetes. Not Terraform. SQL and Python. This stage focuses entirely on building fluency in these two skills because they are what interviewers actually test.
SQL
SELECT, JOINs, GROUP BY, window functions, CTEs, subqueries, NULL handling. SQL is the single most-tested skill in DE interviews. Expect it in every loop.
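To make that list concrete, here is a minimal sketch of a multi-step problem that combines a CTE, NULL handling, and a window function. It runs through Python's built-in sqlite3 module so you can try it without installing a database. The orders table, its columns, and the sample rows are hypothetical, and the window function assumes your SQLite build is version 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data: this orders table is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ana", "2024-01-01", 30.0),
        ("ana", "2024-01-03", 50.0),
        ("ben", "2024-01-02", 20.0),
        ("ben", "2024-01-05", None),  # NULL amount to exercise NULL handling
    ],
)

QUERY = """
WITH cleaned AS (              -- step 1 (CTE): treat NULL amounts as 0
    SELECT customer, order_day, COALESCE(amount, 0) AS amount
    FROM orders
),
ranked AS (                    -- step 2 (window function): rank per customer
    SELECT customer, order_day, amount,
           ROW_NUMBER() OVER (
               PARTITION BY customer
               ORDER BY amount DESC, order_day
           ) AS rn
    FROM cleaned
)
SELECT customer, order_day, amount   -- step 3: keep each customer's top order
FROM ranked
WHERE rn = 1
ORDER BY customer
"""

largest_orders = conn.execute(QUERY).fetchall()
print(largest_orders)
# [('ana', '2024-01-03', 50.0), ('ben', '2024-01-02', 20.0)]
```

Whether a NULL amount should count as 0 or be excluded is a judgment call worth saying out loud in an interview; the sketch picks 0 only to keep the example short.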
Python
Data structures, string processing, file I/O, error handling, functions, and basic OOP. Python rounds test practical coding ability, not LeetCode algorithms.
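A typical practical task looks something like the sketch below: read delimited records, aggregate them, and survive bad rows. The column names (user, bytes) and the sample data are assumptions made up for the example, and an in-memory StringIO stands in for a real file.

```python
import csv
import io
from collections import defaultdict


def total_by_user(lines):
    """Sum the 'bytes' column per user and count rows that could not be parsed."""
    totals = defaultdict(int)
    skipped = 0
    for row in csv.DictReader(lines):
        try:
            user = row["user"]
            value = int(row["bytes"])  # raises on missing or non-numeric values
        except (KeyError, TypeError, ValueError):
            skipped += 1
            continue
        totals[user] += value
    return dict(totals), skipped


# Hypothetical input; with a real file you would pass open(path, newline="").
sample = io.StringIO("user,bytes\nana,100\nben,abc\nana,50\nben,25\n")
totals, skipped = total_by_user(sample)
print(totals, skipped)
# {'ana': 150, 'ben': 25} 1
```

Counting skipped rows instead of silently dropping them is a small design choice that interviewers tend to notice.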
Command Line Basics
Navigating filesystems, piping output, grep, awk, cron. You will use these daily. They also show up in system design discussions.
Milestone
You can solve a 3-step SQL problem and a Python file-processing task in under 15 minutes each, without referencing documentation.
Design schemas that work at scale
About a third of DE interview loops include a data modeling round. Candidates who skip this stage fail that round. Data modeling is not optional. It also makes you significantly better at writing SQL because you understand why tables are structured the way they are.
Normalization
1NF through 3NF, when to denormalize, and the trade-offs of each approach. Interviewers want you to reason about these trade-offs, not recite definitions.
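One way to see the trade-off is to split a flat record set by hand. The sketch below uses plain Python dictionaries rather than a database, and the field names and rows are hypothetical. The customer email depends on the customer, not on the order, so it moves into its own table; reading the flat shape back then costs a join.

```python
# Hypothetical denormalized rows: the customer email repeats on every order.
flat_orders = [
    {"order_id": 1, "customer_id": "c1", "customer_email": "ana@example.com", "amount": 30},
    {"order_id": 2, "customer_id": "c1", "customer_email": "ana@example.com", "amount": 50},
    {"order_id": 3, "customer_id": "c2", "customer_email": "ben@example.com", "amount": 20},
]

# Normalize: store each customer's email once, keyed by customer_id.
customers = {}
orders = []
for row in flat_orders:
    customers[row["customer_id"]] = {"email": row["customer_email"]}
    orders.append(
        {
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],
            "amount": row["amount"],
        }
    )

# Denormalize: rebuilding the flat shape requires a lookup (a join) per order.
rejoined = [
    {**order, "customer_email": customers[order["customer_id"]]["email"]}
    for order in orders
]

print(len(customers), len(orders))
# 2 3
```

Updating an email now touches one row instead of every order, which is the usual argument for normalizing; the extra join on every read is the usual argument for denormalizing.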
Dimensional Modeling
Star and snowflake schemas, fact tables vs dimension tables, slowly changing dimensions (SCD Type 1, 2, 3). This is the foundation of analytics data warehousing.
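Slowly changing dimensions are easier to reason about once you have written one. Here is a minimal sketch of Type 2 behavior, where a change closes the current row and appends a new one so history is preserved. The function name apply_scd2, the row layout (key, attrs, valid_from, valid_to), and the sample customer are all hypothetical. For contrast, Type 1 would simply overwrite attrs in place and keep no history.

```python
from datetime import date


def apply_scd2(history, key, new_attrs, as_of):
    """Apply an SCD Type 2 change to `history` in place and return it.

    If the current row for `key` has different attributes, it is closed
    (valid_to set to as_of) and a new current row is appended.
    """
    for row in history:
        if row["key"] == key and row["valid_to"] is None:
            if row["attrs"] == new_attrs:
                return history  # nothing changed, keep the current row open
            row["valid_to"] = as_of  # close the old version
    history.append(
        {"key": key, "attrs": new_attrs, "valid_from": as_of, "valid_to": None}
    )
    return history


dim_customer = []
apply_scd2(dim_customer, "c1", {"city": "Lima"}, date(2024, 1, 1))
apply_scd2(dim_customer, "c1", {"city": "Quito"}, date(2024, 6, 1))
apply_scd2(dim_customer, "c1", {"city": "Quito"}, date(2024, 7, 1))  # no change

for row in dim_customer:
    print(row["attrs"]["city"], row["valid_from"], row["valid_to"])
# Lima 2024-01-01 2024-06-01
# Quito 2024-06-01 None
```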
Schema Design Practice
Given a business scenario, design a normalized schema, explain your choices, and discuss trade-offs. Practice whiteboarding this in 20-30 minutes.
Milestone
Given a business scenario (e.g., 'design the schema for a ride-sharing app'), you can produce a clean ER diagram and defend your choices in a 30-minute interview round.
Build and reason about production systems
System design questions carry outsized weight when they appear, even though they make up a small fraction of total interview rounds. This stage covers the concepts interviewers test: batch vs streaming, ETL vs ELT, idempotency, schema evolution, and orchestration.
Batch vs Streaming
When to use each, cost trade-offs, tooling (Airflow, Spark, Flink, Kafka). The most common interview mistake is proposing streaming when batch is sufficient.
Orchestration
DAGs, dependencies, retries, idempotency. Airflow is the most common tool, but interviewers care more about the concepts than the specific tool.
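Retries and idempotency only make sense together, so here is a tool-agnostic sketch in plain Python. It is not Airflow code. The names run_with_retries, load_partition, and warehouse are hypothetical, and the failure is simulated. The task writes its data and then fails twice; because the load overwrites a partition instead of appending, the retries do not create duplicates.

```python
import time


def run_with_retries(task, retries=3, delay_seconds=0.0):
    """Call task() until it succeeds or the retry budget is exhausted."""
    for attempt in range(1, retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            time.sleep(delay_seconds)


warehouse = {}  # hypothetical target: partition key -> rows


def load_partition(run_date, rows):
    """Idempotent load: overwrite the partition instead of appending to it."""
    warehouse[run_date] = list(rows)


attempts = {"count": 0}


def flaky_load():
    """Simulated task that writes successfully, then fails on its first two runs."""
    attempts["count"] += 1
    load_partition("2024-01-01", [1, 2, 3])
    if attempts["count"] < 3:
        raise RuntimeError("simulated failure after the write")


run_with_retries(flaky_load)
print(attempts["count"], warehouse)
# 3 {'2024-01-01': [1, 2, 3]}
```

If load_partition appended instead of overwriting, the same three attempts would leave nine rows in the partition. That is the failure mode the idempotency question is probing for.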
Data Quality
Validation, monitoring, alerting, SLAs. Production pipelines need automated checks. Interviewers test whether you think about data quality proactively.
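Automated checks do not have to be elaborate to be useful. The sketch below shows three common ones (row count, duplicate keys, unexpected nulls) as a plain function. The function name check_batch, the column names, and the sample batch are assumptions for illustration; in a real pipeline the returned failures would feed whatever alerting you already have.

```python
def check_batch(rows, required=("id", "amount"), min_rows=1):
    """Return a list of human-readable failures; an empty list means the batch passed."""
    failures = []
    if len(rows) < min_rows:
        failures.append(f"expected at least {min_rows} rows, got {len(rows)}")
    ids = [row.get("id") for row in rows]
    if len(ids) != len(set(ids)):
        failures.append("duplicate ids")
    for column in required:
        nulls = sum(1 for row in rows if row.get(column) is None)
        if nulls:
            failures.append(f"{nulls} null value(s) in {column}")
    return failures


# Hypothetical batch with one duplicate id and one missing amount.
batch = [{"id": 1, "amount": 10}, {"id": 1, "amount": None}]
problems = check_batch(batch)
print(problems)
# ['duplicate ids', '1 null value(s) in amount']
```

Running the checks before the load, and failing the run when the list is not empty, is what "proactive" usually means in practice.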
Milestone
You can whiteboard a data pipeline for a given business requirement, name specific tools, and explain why you made each design choice. You default to the simplest solution that meets the latency requirement.
Convert knowledge into offers
Knowing the material and performing in interviews are different skills. This stage bridges the gap with timed practice, mock interviews, and behavioral preparation. Most candidates underinvest here and lose offers they should have won.
Timed SQL Practice
5 questions in 60 minutes. Build speed and accuracy under pressure. Review mistakes after every session.
Mock Interviews
Full loop simulations: SQL round, Python round, system design round, behavioral round. Practice with another person if possible.
Behavioral Questions
Conflict resolution, project ownership, failure stories. Use the STAR format. Prepare 5-6 stories that cover different scenarios.
Milestone
You can complete a full mock interview loop and pass all four rounds. Your SQL solutions are correct on the first try most of the time.
We've watched people lose three months to the wrong resource, the wrong project, or the wrong tool. You're going to make one of these mistakes at some point. The goal is to catch yourself quickly and course-correct. Here are the ones that trip up almost everyone.
This page covers the high-level career stages and milestones. For a week-by-week plan with specific topics, skills to practice, and progress checks, see the full 18-week Data Engineering Roadmap.
View the 18-Week Roadmap →
Stage 1 is SQL. Pick one problem tonight and you've started. Everyone you see at L5 did exactly this.