Data Engineering Interview Practice

A data engineering interview loop is usually 5 rounds across 2 days: SQL screen, then SQL + Python + modeling + design + behavioral as an onsite block, then a hiring manager call. Most prep covers 1 round well and the others poorly. The full-loop simulator runs every round, with the rubric calibrated by company. Pick a target company, pick a seniority level, walk the full loop.

Prepare for the interview

Know the patterns before the interviewer asks them.

A SQL query, the same shape a screen would give you.
The diff against expected. Where ties broke. What you missed.

Sample (Microsoft interview question):

    SELECT user_id,
           COUNT(*) AS sessions
    FROM events
    WHERE ts >= NOW() - INTERVAL '7 day'
    GROUP BY user_id;

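
To try the query shape above outside the sandbox, here is a minimal sketch using Python's built-in sqlite3 module. The `events` table layout, the sample rows, and the fixed `as_of` timestamp are assumptions for illustration. SQLite has no `NOW() - INTERVAL`, so the cutoff is computed in Python and passed as a parameter. The last lines mimic a diff against expected rows; they are not the site's actual grader.

```python
import sqlite3
from datetime import datetime, timedelta


def sessions_last_7_days(conn, as_of):
    """Count events per user in the 7 days up to `as_of`."""
    # ISO-formatted text timestamps compare correctly as strings.
    cutoff = (as_of - timedelta(days=7)).isoformat(sep=" ")
    return conn.execute(
        """
        SELECT user_id, COUNT(*) AS sessions
        FROM events
        WHERE ts >= ?
        GROUP BY user_id
        ORDER BY user_id
        """,
        (cutoff,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, ts TEXT)")  # assumed schema
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2024-03-09 10:00:00"),
        (1, "2024-03-08 09:30:00"),
        (2, "2024-03-05 12:00:00"),
        (2, "2024-02-20 08:00:00"),  # older than 7 days, must be excluded
    ],
)

as_of = datetime(2024, 3, 10)  # fixed reference time keeps the run deterministic
actual = sessions_last_7_days(conn, as_of)
expected = [(1, 2), (2, 1)]

# A simple order-insensitive diff of result rows.
missing = sorted(set(expected) - set(actual))
unexpected = sorted(set(actual) - set(expected))
print("missing:", missing, "unexpected:", unexpected)
```

Passing the cutoff as a parameter instead of calling a clock function inside the query makes the result reproducible, which is what a diff against expected rows needs.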
5 rounds in a standard DE loop
13 company-specific tracks
1,317 problems across surfaces
Unlimited free mock loops

Anatomy of a standard senior DE loop

Frequency is share of senior loops that include the round; surface is the page on this site that practices it.

Standard senior DE loop, with frequency and surface for each round
  RECRUITER SCREEN                       30 min
    ────────────────────────────────────
    behavioral, why this company           ─► /behavioral-interview-questions
    salary expectations
    next step + timeline

       │ pass
       ▼

  HIRING MANAGER SCREEN                  45 min
    ────────────────────────────────────
    domain experience, project depth       95% of loops
    fit with team's tech stack

       │ pass
       ▼

  ┌─── ONSITE BLOCK (2 days or single day) ─────────────────────────────────────┐
  │                                                                             │
  │  SQL TECHNICAL          45 min   95%  ─► /sql-practice-online               │
  │  PYTHON TECHNICAL       45 min   78%  ─► /python-coding-practice            │
  │  DATA MODELING          45 min   65%  ─► /data-modeling-interview-practice  │
  │  SYSTEM DESIGN          60 min   52%  ─► /system-design-interview-practice  │
  │  BEHAVIORAL             45 min  100%  ─► /behavioral-interview-questions    │
  │                                                                             │
  └─────────────────────────────────────────────────────────────────────────────┘

       │ debrief, leveling
       ▼

  OFFER OR REJECT

Company-specific tracks

What each named company tests, drawn from verified interview write-ups. Full company guides at /companies have more detail.

Meta (10-week prep)
  Focus: SQL (window-heavy), product analytics, behavioral (metric ownership stories)
  Loop: SQL screen, SQL onsite, Python, modeling, behavioral, sometimes pipeline design

Amazon (10-week prep)
  Focus: SQL, Python (ETL flow control), pipeline design at L5+, Leadership Principles in behavioral
  Loop: SQL screen, SQL+Python onsite, design, behavioral with LP probes, bar raiser

Google (11-week prep)
  Focus: SQL, Python, distributed systems thinking at staff+, Googleyness in behavioral
  Loop: SQL screen, coding, system design, modeling-lite, Googleyness round

Netflix (11-week prep)
  Focus: PySpark deep, lakehouse modeling, senior bar across rounds, culture deck in behavioral
  Loop: PySpark screen, Spark onsite, design, modeling, behavioral

Stripe (8-week prep)
  Focus: SQL, idempotent Python, payments-flavored design, depth in behavioral
  Loop: SQL screen, Python coding, design (payments-shaped), modeling, behavioral

Databricks (11-week prep)
  Focus: PySpark deep, Delta Lake modeling, performance debugging, lakehouse design
  Loop: PySpark screen, Spark onsite (incl. performance), modeling, design, behavioral

Snowflake (9-week prep)
  Focus: SQL incl. Snowflake-specific, modeling, system design with warehouse constraints
  Loop: SQL screen, SQL onsite, modeling, design, behavioral
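
Several tracks above name idempotent loading as a focus (Stripe lists "idempotent Python"). A minimal sketch of the idea, assuming a hypothetical `payments` table keyed on `payment_id`: replaying the same batch must leave the table unchanged. The table, columns, and sample rows are invented for illustration and do not come from any company's interview.

```python
import sqlite3


def load_batch(conn, batch):
    """Write rows keyed on payment_id so a replayed batch does not duplicate data."""
    with conn:  # one transaction per batch: every row lands or none do
        conn.executemany(
            "INSERT OR REPLACE INTO payments (payment_id, amount_cents, status) "
            "VALUES (?, ?, ?)",
            batch,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    "payment_id TEXT PRIMARY KEY, amount_cents INTEGER, status TEXT)"
)

batch = [("p_1", 1200, "succeeded"), ("p_2", 500, "pending")]
load_batch(conn, batch)
load_batch(conn, batch)  # simulated retry of the same batch

row_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
total_cents = conn.execute("SELECT SUM(amount_cents) FROM payments").fetchone()[0]
print(row_count, total_cents)
```

The primary key is what makes the retry safe: without it, the second call would double the row count and the total. In an interview, expect a follow-up on which column is the natural key and what happens when the source has none.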

6-week prep allocation by surface

Problem volume per surface per week. The stack composition shifts as you move from foundations to mocks.

Week 1-2: SQL foundations, 8-10 problems/wk
Week 3: SQL windows + Python patterns, 10-12 problems/wk
Week 4: PySpark (if relevant) + modeling, 11-13 problems/wk
Week 5: Design canvas + behavioral stories, 12-14 problems/wk
Week 6: Mocks + weak spots, 10-12 problems/wk

Surfaces: SQL, Python, PySpark, Modeling, Design, Behavioral, Mocks
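
For a rough sense of the total workload, the weekly ranges above can be summed. This sketch assumes each figure is the total number of problems for that week (not a per-surface count) and that the "Week 1-2" rate applies to each of the two weeks; both are readings of the plan, not something the page states.

```python
# (label, number of weeks, low problems/wk, high problems/wk), copied from the plan
PLAN = [
    ("Week 1-2: SQL foundations", 2, 8, 10),
    ("Week 3: SQL windows + Python patterns", 1, 10, 12),
    ("Week 4: PySpark (if relevant) + modeling", 1, 11, 13),
    ("Week 5: Design canvas + behavioral stories", 1, 12, 14),
    ("Week 6: Mocks + weak spots", 1, 10, 12),
]


def total_volume(plan):
    """Sum the low and high ends of the weekly ranges across the whole plan."""
    low = sum(weeks * lo for _, weeks, lo, _ in plan)
    high = sum(weeks * hi for _, weeks, _, hi in plan)
    return low, high


low, high = total_volume(PLAN)
print(f"{low}-{high} problems over 6 weeks")
```

Under those assumptions the plan comes to roughly 59-71 problems in total.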

Behavioral stories every DE candidate needs

6 themes that map to most DE behavioral prompts. Each theme needs 1-2 STAR-format stories with concrete numbers.

Theme, then what the interviewer is listening for:

Scoped a project under ambiguity
  Specific business question, the data you had vs needed, the call you made, the outcome.

Disagreed with a senior engineer or PM
  What the disagreement was, how you escalated, the resolution, what you'd do differently.

Recovered from a production incident
  What broke, your role in detection vs fix, the root cause, the post-mortem item.

Pushed back on scope
  What was asked, why it didn't fit, what you proposed instead, how it landed.

Built or led a hiring process
  What you screened for, a tradeoff you made, the result, your evolution as a hiring lead.

Owned a metric that moved
  The metric, the baseline, what you changed, the measured impact, the second-order effect you noticed.

DE interview practice FAQ

What's the difference between this and solo problem practice?
Solo builds unit skills (write a query, write a function). The loop simulator builds integration: working through a vague prompt, getting interrupted, defending tradeoffs, recovering from a wrong turn. Real interviews test integration; the loop simulator practices it.
Do you cover the data modeling round?
Yes. Most other DE prep sites are weak on modeling because no interactive practice surface exists. The modeling canvas lets you design a star schema, define grains, pick SCD types, and submit. The rubric scores your design against dimensional modeling best practices and pushes on the choices interviewers push on (late-arriving facts, attribute change handling, denormalization tradeoffs).
Are there company-specific tracks?
13 companies have detailed tracks (Meta, Amazon, Google, Netflix, Apple, Microsoft, Stripe, Uber, Spotify, Airbnb, Databricks, Snowflake, LinkedIn). Each names the rounds, the difficulty distribution, the recurring patterns, the prep timeline. Calibrate the AI mock interviewer to a specific company in the last 2-3 weeks.
What if I'm transitioning from analyst or software engineer?
The standard 6-week plan still applies. Add 2-3 weeks on the round that was weakest in your prior role. Analysts need more on Python pipeline patterns and system design. SWEs need more on data modeling and warehouse-specific SQL. Both transitions usually need less LeetCode prep than the candidate expects.
How is the catalog calibrated to real interviews?
Every problem traces back to at least 1 verified interview write-up. The difficulty distribution and topic mix match the interview frequency distribution. The numbers are visible at /metrics for verification.
What about take-home assignments?
The take-home mode covers them. It uses a longer timer (2-6 hours) and a single integrated prompt requiring schema design, SQL, a Python loader, and a README defending the choices. Take-homes are common at Stripe, Snowflake, Databricks, and some Meta DE roles. The verdict scores all 4 pieces.
How much prep do I actually need?
Depends on starting point. Currently in a mid-tier DE role, targeting FAANG senior: 8-12 weeks at 10-15 hr/wk. Transitioning from analyst to mid-level DE: 10-16 weeks. Senior at FAANG-tier targeting staff at another FAANG: 6-10 weeks focused on the round shapes the new company emphasizes. Most candidates underestimate by 30-50%.
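
The modeling answer above mentions picking SCD types and handling attribute changes. As one concrete instance, here is a minimal in-memory sketch of a Type 2 slowly-changing dimension. The row layout (`key`, `attrs`, `valid_from`, `valid_to`), the half-open validity interval, and the far-future sentinel date are common conventions assumed here, not the site's rubric.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed sentinel meaning "still current"


def apply_scd2(dim_rows, key, new_attrs, effective):
    """Close the current row for `key` if its attributes changed, then add a new version."""
    current = next(
        (r for r in dim_rows if r["key"] == key and r["valid_to"] == OPEN_END),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return dim_rows  # nothing changed, so a replayed update adds no row
        current["valid_to"] = effective  # interval is [valid_from, valid_to)
    dim_rows.append(
        {"key": key, "attrs": dict(new_attrs),
         "valid_from": effective, "valid_to": OPEN_END}
    )
    return dim_rows


def as_of(dim_rows, key, day):
    """Return the attributes that were valid for `key` on `day`, or None."""
    for r in dim_rows:
        if r["key"] == key and r["valid_from"] <= day < r["valid_to"]:
            return r["attrs"]
    return None


dim = []
apply_scd2(dim, "cust_1", {"plan": "free"}, date(2024, 1, 1))
apply_scd2(dim, "cust_1", {"plan": "pro"}, date(2024, 3, 1))
apply_scd2(dim, "cust_1", {"plan": "pro"}, date(2024, 3, 1))  # replay, no new row

feb_attrs = as_of(dim, "cust_1", date(2024, 2, 15))
mar_attrs = as_of(dim, "cust_1", date(2024, 3, 1))
print(len(dim), feb_attrs, mar_attrs)
```

The `as_of` lookup is where late-arriving facts come in: a fact dated in February that lands in March should join to the version valid on its event date, not to the current row.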
Why practice

Start with the SQL screen mock

  1. Active recall beats re-reading by 50%

     Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom.

  2. 76% of hiring managers reject on the coding task, not the resume

     From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice.

  3. Five problem shapes cover 80% of data engineer loops

     Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition.
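
Three of the shapes named above (dedup, top-N-per-group, sessionization) can be written by hand in a few lines each. The sketch below uses plain Python over a list of dicts. The field names, the sample events, and the 30-minute session gap are assumptions for illustration; in a SQL round the same shapes are usually expressed with window functions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def dedup_latest(rows, key, order_by):
    """Keep one row per key: the one with the greatest order_by value."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    return list(latest.values())


def top_n_per_group(rows, group, score, n):
    """Return the n highest-scoring rows in each group (ties keep input order)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group]].append(row)
    return {
        g: sorted(members, key=lambda r: r[score], reverse=True)[:n]
        for g, members in groups.items()
    }


def sessionize(timestamps, gap=timedelta(minutes=30)):
    """Split one user's event times into sessions wherever the pause exceeds `gap`."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


events = [
    {"user": "a", "ts": datetime(2024, 1, 1, 9, 0), "score": 5},
    {"user": "a", "ts": datetime(2024, 1, 1, 9, 10), "score": 9},
    {"user": "a", "ts": datetime(2024, 1, 1, 11, 0), "score": 7},
    {"user": "b", "ts": datetime(2024, 1, 1, 9, 5), "score": 3},
]

latest = dedup_latest(events, "user", "ts")
top = top_n_per_group(events, "user", "score", 2)
a_sessions = sessionize([e["ts"] for e in events if e["user"] == "a"])
print(len(latest), [r["score"] for r in top["a"]], len(a_sessions))
```

Tie handling is the part worth rehearsing out loud: `dedup_latest` keeps the first row seen when two rows share the same `order_by` value, and an interviewer will often ask what should happen there.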
