Data Engineering Interview Prep

Data engineering interview prep means practicing the five domains the loop tests: SQL, Python, data modeling, system design, and behavioral. The loop itself runs 5 to 7 rounds; focused prep typically takes 4 to 8 weeks.

A data engineering loop runs five domains in some order: SQL, Python, data modeling, pipeline system design, behavioral. The domains are stable across companies and levels. What changes is how much judgment the room expects you to show on top of the working answer, and that changes a lot between L4 and L6. This guide is the 2026 read on the loop, built from 2,817 verified interview reports across 920 companies. If you're two weeks out, skip to the SQL round and the modeling round; those carry the most weight.

The five rounds, in the order they actually run

SQL round. 45 to 60 minutes, live coding, one or two prompts that look like analytics questions but test window functions, anti-joins, and conditional aggregation. Rejections here are about speed and clarity, not correctness; most candidates eventually get to a right answer, but the ones who get hired are done in fifteen minutes with no dead ends. The trap is window-function ordering: when the ORDER BY inside an OVER clause allows ties, ROW_NUMBER picks among the tied rows arbitrarily, and the answer can change in ways that are hard to spot under time pressure. Full walkthrough in the SQL round prep guide.
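The ordering trap is easier to see in a runnable form. The sketch below is illustrative only: it runs a windowed dedup through Python's built-in sqlite3 module (assuming a SQLite build with window-function support, 3.25 or newer), and the events table, its columns, and the ingest_seq tiebreaker are hypothetical names, not taken from any real prompt.

```python
import sqlite3

# ORDER BY updated_at alone would leave the two 11:00 rows tied, so ROW_NUMBER
# could pick either one. Adding ingest_seq makes the ordering total.
DEDUP_QUERY = """
WITH ranked AS (
    SELECT event_id, payload,
           ROW_NUMBER() OVER (
               PARTITION BY event_id
               ORDER BY updated_at DESC, ingest_seq DESC
           ) AS rn
    FROM events
)
SELECT event_id, payload FROM ranked WHERE rn = 1 ORDER BY event_id;
"""

def dedup_events(rows):
    """Load rows into an in-memory table and keep one row per event_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events "
        "(event_id TEXT, updated_at TEXT, ingest_seq INTEGER, payload TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(DEDUP_QUERY).fetchall()
    conn.close()
    return result

EVENTS = [
    ("e1", "2024-01-01T10:00:00", 1, "old"),
    ("e1", "2024-01-01T11:00:00", 2, "tie-a"),
    ("e1", "2024-01-01T11:00:00", 3, "tie-b"),
    ("e2", "2024-01-01T09:00:00", 4, "only"),
]

print(dedup_events(EVENTS))
```

With the composite ORDER BY, the result no longer depends on the order in which rows were inserted, which is the property worth saying out loud in the round.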

Python round. Closer to a notebook than to LeetCode. Pandas groupby/merge/pivot, parsing semi-structured input (gzipped JSON, malformed CSV, fields-of-fields), small class design, and increasingly often a PySpark variant. The bar is not whether you can write Python; it is whether you can write the kind of Python a data engineer writes on the job. Companies with heavy Spark stacks bias toward PySpark on this round, covered in the Python round breakdown.

Data modeling round. The round that decides most loops, and the one candidates underprepare for. The interviewer gives you a product (ride-share, streaming app, e-commerce) and asks for the warehouse schema. Passing answers state the grain before drawing tables, name the slowly changing dimension type (Type 2 is the common one, Type 6 is the discriminator), defend a tradeoff between star and data vault, and call out at least one denormalization decision with a reason. The two losing patterns: skipping the grain statement and over-normalizing. Detail in the data modeling round guide.

System design round. Pipelines, not services. Prompts cluster around three families: near-real-time fraud detection, daily reporting aggregations, user event sessionization. Passing answers choose batch or streaming explicitly, name the orchestrator (Airflow if you don't have a stronger opinion), address late-arriving data, define a backfill strategy, and surface at least two failure modes (partial writes, schema drift, duplicate delivery, broken exactly-once assumptions). The senior-vs-mid signal here is whether you bring up the failure modes before the interviewer prompts you. Walked through in the system design round guide.

Behavioral round. STAR format is table stakes. What separates strong answers is whether you distinguish what you did from what the team did, and whether you can name the tradeoff you made rather than the outcome you got. Senior loops add prompts about scope under uncertainty and outcomes you owned across teams; staff loops add prompts about decisions that played out over quarters. The most common failure is rambling; the second most common is burying the result. Stories worth practicing live in the behavioral round guide.

You're ready for the loop when

One check per round. If any of these is false two weeks out, that round is where your prep hours go.

SQL. You can write a windowed dedup with a composite tiebreaker in under 12 minutes, out loud, without looking up syntax.
Python. You can parse a malformed CSV, dedup by composite key, and explain why your generator version holds memory constant.
Modeling. You state the grain before drawing any table, and you can defend Type 2 versus Type 1 for a specific column without hedging.
System design. You name late-arriving data, backfill strategy, and two failure modes before the interviewer prompts you.
Behavioral. You have six STAR stories where what you did is distinct from what the team did, each under three minutes.
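For the Python check above, a minimal sketch of the generator shape follows. The skip-on-wrong-field-count policy, the sample data, and the function names are assumptions made for illustration. One caveat to state in the room: the parse step holds one row at a time, while the dedup step still holds one key per distinct record, so memory grows with the number of keys rather than with file size.

```python
import csv
import io
from typing import Iterable, Iterator, List, Tuple

def parse_rows(lines: Iterable[str], expected_fields: int) -> Iterator[List[str]]:
    """Yield well-formed rows one at a time; skip rows with the wrong field count."""
    for row in csv.reader(lines):
        if len(row) != expected_fields:
            continue  # assumed policy: skip malformed rows instead of crashing
        yield row

def dedup_by_key(rows: Iterable[List[str]], key_indexes: Tuple[int, ...]) -> Iterator[List[str]]:
    """Yield the first row seen for each composite key."""
    seen = set()
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        if key in seen:
            continue
        seen.add(key)
        yield row

SAMPLE = (
    "u1,2024-01-01,click\n"
    "u1,2024-01-01,click\n"       # exact duplicate on (user, date)
    "broken-line\n"               # too few fields
    "u2,2024-01-01,view,extra\n"  # too many fields
    "u2,2024-01-02,view\n"
)

result = list(dedup_by_key(parse_rows(io.StringIO(SAMPLE), 3), (0, 1)))
print(result)
```

Because both functions are generators, they compose without materializing the input; swapping io.StringIO for an open file handle is the only change needed for a real file.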

What changes by level

Same five rounds. Different bar for what counts as a passing answer. The shift from working answer to reasoned answer is sharp between L4 and L5. Comp bands are the typical company 25th to 75th percentile from the verified offer samples behind this site's company pages.

Level	What they're testing	Round emphasis	Common failure	US total comp (2026)
Junior / L3	Working answer to a scoped problem	SQL heavy, modeling light, no design round	Schema vocabulary gaps	$114K to $192K
Mid / L4	Fluency across all five domains	Even weights, modeling and design at conceptual depth	Skipping the grain statement in modeling	$150K to $238K
Senior / L5	Judgment and tradeoff articulation	Modeling and design carry the loop	Not leading the design conversation	$201K to $347K
Staff / L6	Scope of impact and decision documentation	Two design rounds, one cross-org behavioral	Treating design like an L5 deep dive instead of a roadmap	$317K to $469K+

Role-specific bars sit on top of these. Analytics engineering loops add dbt and modeling depth at the expense of pipeline design. ML data engineering loops push feature-store design and online/offline parity. Cloud specializations (AWS, GCP, Azure) replace one of the design rounds with a cloud-native pipeline build using that provider's primitives.

Analysts Are Slowing the Store Down

> We run an e-commerce marketplace where the analytics team queries the production database directly, and that load is degrading the live application. Move analytics onto its own warehouse by reading the database's change log instead of querying the live system, while a merchant-facing dashboard still shows each seller their new orders within fifteen minutes on a path of its own. A small fraction of orders arrive with broken merchant references or totals that do not add up, so those have to be held back and caught before they reach the reporting tables.

Companies whose loops are actually different

Most loops follow the standard five-round template. Five companies bend it enough to be worth knowing in advance. Netflix and Airbnb skew streaming and large-scale event processing; expect a Spark or Flink question where most other companies would ask SQL. Stripe pushes system design depth, with a second design round in place of a Python round for senior candidates. Databricks tests modeling on Delta Lake and PySpark specifics that no other loop covers. Uber runs the heaviest data modeling round in the industry, with two-hour onsites that include a live schema critique. If you're interviewing at any of these five, weight your prep accordingly.

For everyone else, the standard five-round loop is the right mental model. For a public compensation baseline by metro, the U.S. Bureau of Labor Statistics publishes wage data for adjacent occupations such as database administrators and architects; it does not define company levels such as L4 or L5.

Company-by-company: where each loop leans

From the interview reports behind this site's per-company guides. 'Standard' means the five-round template with no unusual weighting.

Company	Loop shape	What it leans on
Stripe	Second design round replaces Python at senior	Idempotent pipelines, system design depth
Uber	Heaviest modeling round in the industry	Two-hour onsite with live schema critique
Airbnb	Standard + streaming emphasis	Spark, large-scale event processing
Databricks	Spark-native loop	Delta Lake modeling, PySpark internals
Snowflake	Standard, dialect-deep SQL	QUALIFY, FLATTEN, time travel, warehouse design
Netflix	Standard + streaming emphasis	Spark or Flink question replaces one SQL round
Lyft	Standard	Marketplace metrics SQL, geo event modeling
DoorDash	Standard	Logistics modeling, real-time dispatch design
Instacart	Standard	Catalog and inventory modeling, window functions
Robinhood	Standard + correctness emphasis	Exactly-once semantics, financial reconciliation
Pinterest	Standard	Engagement event sessionization
Meta	SQL-heaviest FAANG loop	Window functions, gap-and-island, Presto dialect

Interview guides by company

Round-by-round breakdowns with the questions reported by candidates, the stack they actually use, and where the loop deviates from the standard template.

Stripe data engineer interview guide→

Stripe Data Engineer process, comp, financial-precision SQL, and the collaboration round.

Uber data engineer interview guide→

Uber Data Engineer process, marketplace and surge data modeling, geospatial pipelines.

Airbnb data engineer interview guide→

Airbnb Data Engineer process, experimentation platform questions, two-sided marketplace modeling.

Databricks data engineer interview guide→

Databricks Data Engineer process, Spark internals, lakehouse architecture, Delta Lake questions.

Snowflake data engineer interview guide→

Snowflake Data Engineer process, micro-partitions, query optimization, warehouse architecture.

Netflix data engineer interview guide→

Netflix Data Engineer process, streaming pipelines, A/B test infra, and the keeper test.

Lyft data engineer interview guide→

Lyft Data Engineer process, marketplace pricing pipelines, real-time matching data.

DoorDash data engineer interview guide→

DoorDash Data Engineer process, three-sided marketplace data, dasher-merchant-consumer modeling.

Instacart data engineer interview guide→

Instacart Data Engineer process, retailer catalog modeling, batch and real-time inventory.

Robinhood data engineer interview guide→

Robinhood Data Engineer process, trading data, regulatory pipelines, audit-trail modeling.

Pinterest data engineer interview guide→

Pinterest Data Engineer process, recommendation pipelines, ad attribution data, graph modeling.

Twitter data engineer interview guide→

Twitter (X) Data Engineer process, real-time timeline data, social graph modeling at scale.

Which tools actually show up

The job description and the interview rarely match. dbt is on every JD; dbt rarely shows up in the loop unless the company is dbt-native (Wayfair, HubSpot, dbt Labs). Airflow is the same: conceptual questions about DAG design come up, code questions almost never do. Spark is the inverse: rarely on the JD as a requirement, frequently in the design round as a tradeoff conversation. See the dbt vs Airflow comparison for which one to spend time on.

On warehouses, the loop is dialect-agnostic at most companies: ANSI SQL with Postgres syntax for the edge cases. The exceptions are the warehouse vendors themselves and the companies built on a specific stack. Snowflake-native shops ask Snowflake-specific syntax (QUALIFY, FLATTEN, time travel); BigQuery shops do the same for ARRAY functions and partition decorators. Snowflake vs Databricks covers which dialect to invest in if you're choosing.

For streaming, Kafka comes up in design rounds at any company handling real-time data, but only Stripe, Netflix, Databricks, and the ad-tech mid-market ask Kafka questions deep enough to require API-level familiarity. Flink is asked at maybe five companies in the industry. If you're not interviewing at one of them, learning Flink will not pay back the prep time it costs. See Kafka vs Kinesis for the AWS-context version of the same call.

How to spend your prep time

Two weeks out. SQL drills and one mock per day. Skip the take-home if it's not required. Don't start a new tool; you won't get to working fluency. Hit the top 100 questions for breadth, then do four to six full mock interviews out loud, alone or with a peer.

Eight weeks out. Weeks one and two: SQL and Python to fluency. Weeks three through five: data modeling ten problems deep, with the schemas drawn out and defended. Weeks six and seven: system design, fifteen prompts minimum, out loud. Week eight: mocks and behavioral rehearsal. Each round has its own detailed walkthrough in the per-round guides.

Switching from software engineering. Your algorithm prep is over-leveled for this loop. The deltas to close are dimensional modeling and pipeline design. See data engineer vs backend engineer for which of your existing skills carry over. SQL vs Python covers which to deepen first.

Switching from analytics or analytics engineering. Your SQL is already strong. The unfamiliar territory is pipeline orchestration, late-arriving data, and the failure modes that come from running schedules instead of writing queries. See data engineer vs analytics engineer for the specific gap to close.

A walked-through modeling question

Prompt. "Design the warehouse schema for a ride-share app. The product team wants daily reporting on completed rides, driver utilization, and surge pricing effectiveness."

The losing answer. A candidate draws three tables (rides, drivers, surge_events), normalizes each to 3NF, and answers the three reporting questions with joins. They never stated the grain. They built for the questions instead of the business. When the interviewer adds a question ("now report on rider lifetime value"), the schema cracks because it has no rider dimension.

The hire-signal answer. "I'm modeling a fact table at the completed-ride grain with one row per ride, and three Type 2 dimensions: dim_driver, dim_rider, dim_geo. Surge is an attribute on the ride fact, not its own table, because surge state at ride start is what reporting cares about. Driver utilization is a derived metric from the ride fact plus a snapshot of driver shifts. I'm choosing star over snowflake here because the reporting tool is BI and joins are cheap; if the consumer were operational, I'd reconsider." The candidate stated the grain, defended the SCD type, named the shape, and surfaced one tradeoff with a reason. That's the rubric.
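One way to write the hire-signal answer down is as DDL. The sketch below runs through Python's sqlite3 module purely so it can be executed; the table names come from the answer above, while every column name (vehicle_class, signup_channel, surge_multiplier, and so on) is a hypothetical placeholder.

```python
import sqlite3

# Star schema sketch: one fact at the completed-ride grain, three Type 2 dimensions.
DDL = """
CREATE TABLE dim_driver (
    driver_key     INTEGER PRIMARY KEY,  -- surrogate key, one per driver version
    driver_id      TEXT NOT NULL,        -- natural key from the source system
    vehicle_class  TEXT,
    effective_from TEXT NOT NULL,
    effective_to   TEXT NOT NULL
);
CREATE TABLE dim_rider (
    rider_key      INTEGER PRIMARY KEY,
    rider_id       TEXT NOT NULL,
    signup_channel TEXT,
    effective_from TEXT NOT NULL,
    effective_to   TEXT NOT NULL
);
CREATE TABLE dim_geo (
    geo_key        INTEGER PRIMARY KEY,
    zone_id        TEXT NOT NULL,
    city           TEXT,
    effective_from TEXT NOT NULL,
    effective_to   TEXT NOT NULL
);
CREATE TABLE fact_ride (
    ride_id          TEXT PRIMARY KEY,   -- grain: one row per completed ride
    driver_key       INTEGER NOT NULL REFERENCES dim_driver (driver_key),
    rider_key        INTEGER NOT NULL REFERENCES dim_rider (rider_key),
    pickup_geo_key   INTEGER NOT NULL REFERENCES dim_geo (geo_key),
    completed_at     TEXT NOT NULL,
    fare_amount      REAL NOT NULL,
    surge_multiplier REAL NOT NULL       -- surge state at ride start, kept on the fact
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
table_names = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(table_names)
```

The rider dimension is what lets the follow-up question about rider lifetime value land without a schema change.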

The full walkthrough, with the follow-up questions the interviewer asks next and the variants that show up at Uber and DoorDash, is in the data modeling round guide. For the design-round version of this same exercise, see the system design walkthrough.

Ten questions from real loops, answered the way interviewers grade them

Two per round, condensed to the passing shape. The full worked versions, with follow-ups and the common wrong answers, live in the per-round guides and the practice catalog.

SQL

Return each customer's second-highest order total. A customer's top two orders can tie; handle it.

DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_total DESC) in a CTE, filter rank = 2 outside. The grading point is the tie handling: ROW_NUMBER forces an arbitrary winner, RANK skips 2 entirely when two rows tie at 1, DENSE_RANK keeps a true second. If several orders tie at rank 2, collapse them with DISTINCT so each customer returns one row. Say which you chose and why before the interviewer asks.
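A runnable version of that answer, as a sketch: the query is executed through Python's sqlite3 module (assuming SQLite 3.25 or newer for window functions), and the orders table and sample rows are made up for illustration. DISTINCT is added so that ties at rank 2 collapse to one row per customer.

```python
import sqlite3

SECOND_HIGHEST_QUERY = """
WITH ranked AS (
    SELECT customer_id,
           order_total,
           DENSE_RANK() OVER (
               PARTITION BY customer_id
               ORDER BY order_total DESC
           ) AS rnk
    FROM orders
)
SELECT DISTINCT customer_id, order_total
FROM ranked
WHERE rnk = 2
ORDER BY customer_id;
"""

def second_highest(rows):
    """Return (customer_id, second-highest total) pairs for the given orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id TEXT, order_total INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(SECOND_HIGHEST_QUERY).fetchall()
    conn.close()
    return result

ORDERS = [
    ("a", 100), ("a", 100),  # two orders tie at the top
    ("a", 80), ("a", 80),    # two orders tie at second place
    ("b", 50), ("b", 40),
    ("c", 70),               # only one order, so no second-highest
]

print(second_highest(ORDERS))
```

Customer c drops out of the result because there is no rank 2 row; whether to return NULL instead is a clarifying question worth asking.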

SQL

Find users who were active seven or more consecutive days last month.

Gap-and-island: activity_date minus ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY activity_date) * INTERVAL '1 day' is constant across a consecutive run. GROUP BY the user and that derived key, HAVING COUNT(*) >= 7. Dedup multiple events per day with DISTINCT first or the row numbers break; volunteering that dedup step is the senior signal.
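The same gap-and-island logic can be checked in plain Python. This is a sketch of the technique, not the SQL itself: the function name and sample data are hypothetical, and the "last month" date filter from the prompt is left out to keep the island key in focus.

```python
from collections import Counter
from datetime import date, timedelta

def users_with_streak(events, min_days=7):
    """events: iterable of (user_id, activity_date) pairs.

    Returns the sorted user ids that have a run of at least min_days consecutive days.
    """
    # DISTINCT first: collapse multiple events on the same day.
    days_by_user = {}
    for user_id, day in set(events):
        days_by_user.setdefault(user_id, []).append(day)

    qualifying = set()
    for user_id, days in days_by_user.items():
        days.sort()
        # Island key: date minus row number (in days) is constant within a run.
        islands = Counter(
            day - timedelta(days=row_number)
            for row_number, day in enumerate(days, start=1)
        )
        if max(islands.values()) >= min_days:
            qualifying.add(user_id)
    return sorted(qualifying)

EVENTS = (
    [("u1", date(2024, 3, d)) for d in range(1, 8)]   # seven consecutive days
    + [("u1", date(2024, 3, 3))]                      # duplicate event on one day
    + [("u2", date(2024, 3, d)) for d in (1, 2, 3, 4, 5, 6, 8, 9, 10)]  # gap on day 7
)

print(users_with_streak(EVENTS))
```

Removing the set() call makes u1's duplicate day shift every later row number, which splits the seven-day run in two; that is the break the dedup step prevents.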

Python

Count events per user from a 50GB gzipped JSONL file on a machine with 8GB of memory.

Stream it: gzip.open in text mode, iterate line by line, json.loads each line, accumulate counts in a defaultdict(int). Memory holds the counter, never the file. Mention the malformed-line policy (count and skip to a dead-letter log, never crash) and you have covered what the hidden tests actually check.
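A minimal sketch of that streaming shape, using only the standard library. The user_id field name and the count-and-skip policy for bad lines are assumptions; a real answer would also write the skipped lines to a dead-letter log. The small temporary file at the bottom exists only to make the example runnable.

```python
import gzip
import json
import os
import tempfile
from collections import defaultdict

def count_events_per_user(path):
    """Stream a gzipped JSONL file; return (counts per user, malformed line count)."""
    counts = defaultdict(int)
    malformed = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:  # one line in memory at a time
            try:
                record = json.loads(line)
                counts[record["user_id"]] += 1
            except (json.JSONDecodeError, KeyError, TypeError):
                malformed += 1  # assumed policy: count and skip, never crash
    return dict(counts), malformed

# Tiny sample file standing in for the 50GB input.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "events.jsonl.gz")
    with gzip.open(sample_path, "wt", encoding="utf-8") as out:
        out.write('{"user_id": "u1"}\n')
        out.write('{"user_id": "u2"}\n')
        out.write("not json\n")
        out.write('{"user_id": "u1"}\n')
        out.write('{"event": "missing user"}\n')
    sample_counts, sample_malformed = count_events_per_user(sample_path)

print(sample_counts, sample_malformed)
```

Memory scales with the number of distinct users, not the file size; if the user count itself will not fit, that is the cue to discuss partitioning the work or an external sort.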

Python

Deduplicate an event stream by event_id, keeping the record with the latest updated_at.

Dict keyed on event_id; replace when the incoming updated_at is strictly newer, with a deterministic tiebreaker (say, larger sequence number) for equal timestamps. This is ROW_NUMBER dedup translated to Python, and the interviewer is checking the same two things: the ordering key and the tiebreaker that resolves equal timestamps.
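As a sketch, the dict-based dedup might look like the following. The seq field is a hypothetical sequence number used as the tiebreaker, and updated_at is assumed to be an ISO-8601 string in a single consistent format so that string comparison matches time order.

```python
def dedup_latest(events):
    """Keep one record per event_id: latest updated_at, larger seq on ties."""
    latest = {}
    for event in events:
        current = latest.get(event["event_id"])
        incoming_key = (event["updated_at"], event["seq"])
        if current is None or incoming_key > (current["updated_at"], current["seq"]):
            latest[event["event_id"]] = event
    return latest

STREAM = [
    {"event_id": "e1", "updated_at": "2024-01-01T10:00:00", "seq": 1, "status": "created"},
    {"event_id": "e1", "updated_at": "2024-01-01T11:00:00", "seq": 2, "status": "paid"},
    {"event_id": "e1", "updated_at": "2024-01-01T11:00:00", "seq": 3, "status": "refunded"},
    {"event_id": "e2", "updated_at": "2024-01-01T09:00:00", "seq": 4, "status": "created"},
]

deduped = dedup_latest(STREAM)
print({event_id: record["status"] for event_id, record in deduped.items()})
```

Comparing the (updated_at, seq) tuple makes the outcome independent of arrival order, which is the property to point out when the interviewer asks about out-of-order delivery.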

Data modeling

Design the warehouse schema for a food-delivery app that needs daily reporting on orders, courier utilization, and promo effectiveness.

State the grain first: one row per delivered order in fact_orders. Type 2 dimensions for courier, customer, and restaurant; promo as an attribute on the order fact because reporting cares about promo state at order time. Courier utilization derives from the fact plus a shift snapshot table. Then name one tradeoff out loud: star over snowflake because the consumer is BI.

Data modeling

When do you actually need a Type 2 slowly changing dimension, and what does it cost?

Type 2 when history must be reported as it was (a customer's segment at order time), Type 1 when only current state matters (a corrected email). The cost is row explosion, half-open effective_from/effective_to join logic on every fact join, and a merge pipeline that must be idempotent. Saying 'Type 2 everywhere to be safe' is a failing answer; it inflates storage and complicates every join for history nobody asked for.
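The half-open join condition is the part candidates most often get wrong, so here is a small sketch of it in Python. The dimension rows, the segment attribute, and the 9999-12-31 open-ended sentinel are illustrative assumptions; the point is the effective_from <= as_of < effective_to comparison.

```python
from datetime import date

# Two versions of the same customer; effective_to is exclusive (half-open interval).
DIM_CUSTOMER = [
    {"customer_id": 1, "segment": "free",
     "effective_from": date(2024, 1, 1), "effective_to": date(2024, 3, 1)},
    {"customer_id": 1, "segment": "pro",
     "effective_from": date(2024, 3, 1), "effective_to": date(9999, 12, 31)},
]

def segment_as_of(dim_rows, customer_id, as_of):
    """Return the segment that was in effect for the customer on as_of, or None."""
    for row in dim_rows:
        if (row["customer_id"] == customer_id
                and row["effective_from"] <= as_of < row["effective_to"]):
            return row["segment"]
    return None

print(segment_as_of(DIM_CUSTOMER, 1, date(2024, 2, 29)))  # day before the change
print(segment_as_of(DIM_CUSTOMER, 1, date(2024, 3, 1)))   # day of the change
```

With a closed interval on both ends, an order placed on the change date would match both versions and duplicate the fact row; the half-open form guarantees exactly one match.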

System design

Design the pipeline for near-real-time fraud alerts on payment events.

Choose streaming explicitly and say why batch loses (minutes matter). Kafka or Kinesis into a stateful consumer with sliding-window aggregates per account, alerts to a low-latency store, raw events landed in parallel to the warehouse for backfill and model training. Name the failure modes unprompted: late events versus watermark, duplicate delivery handled by idempotent alert keys, and what degrades when the consumer lags.
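To make the stateful consumer concrete, here is a toy sketch of a per-account sliding-window count with duplicate suppression. It is deliberately simplified: event times are plain seconds, events are assumed to arrive in time order per account (so late-event and watermark handling is not modeled), and the seen-id set grows without bound, where a real consumer would expire it. The class name and parameters are hypothetical.

```python
from collections import defaultdict, deque

class SlidingWindowCounter:
    """Flag an account when it exceeds `threshold` events within `window_seconds`."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = defaultdict(deque)  # account_id -> event times in the window
        self.seen_ids = set()             # idempotency keys for duplicate delivery

    def observe(self, event_id, account_id, event_time):
        """Return True if this event pushes the account over the threshold."""
        if event_id in self.seen_ids:
            return False  # duplicate delivery: do not count or alert twice
        self.seen_ids.add(event_id)

        window = self.events[account_id]
        window.append(event_time)
        # Evict events that have fallen out of the window.
        while window and window[0] <= event_time - self.window_seconds:
            window.popleft()
        return len(window) > self.threshold

detector = SlidingWindowCounter(window_seconds=60, threshold=2)
flags = [
    detector.observe("p1", "acct", 0),
    detector.observe("p2", "acct", 10),
    detector.observe("p3", "acct", 20),   # third event inside 60s: alert
    detector.observe("p3", "acct", 20),   # redelivered event: ignored
    detector.observe("p4", "acct", 100),  # earlier events have expired
]
print(flags)
```

The limitations listed in the lead-in are exactly the failure modes to raise unprompted: what happens to an event that arrives after its window closed, and how long the dedup state has to live.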

System design

Your daily batch job double-writes when it is retried. Make re-runs safe.

Make the job idempotent: MERGE on the natural key instead of INSERT, or write to a staging table and atomically swap the partition. The principle to say out loud: re-running yesterday's job today must produce the same result as the original run. Add a run_id for audit and alert on row-count drift rather than on job success alone.
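A small sketch of the re-run-safe pattern, using sqlite3 as a stand-in for the warehouse. It swaps in delete-and-insert of the whole partition inside one transaction, which is a simpler relative of the MERGE and partition-swap options above; the daily_orders table and its columns are hypothetical.

```python
import sqlite3

def load_partition(conn, run_date, rows):
    """Replace the whole run_date partition in one transaction so retries converge."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM daily_orders WHERE order_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_orders (order_date, order_id, amount) VALUES (?, ?, ?)",
            [(run_date, order_id, amount) for order_id, amount in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_orders (order_date TEXT, order_id TEXT, amount INTEGER)")

batch = [("o1", 10), ("o2", 25)]
load_partition(conn, "2024-05-01", batch)
load_partition(conn, "2024-05-01", batch)  # simulated retry of the same run

row_count = conn.execute("SELECT COUNT(*) FROM daily_orders").fetchone()[0]
total_amount = conn.execute("SELECT SUM(amount) FROM daily_orders").fetchone()[0]
print(row_count, total_amount)
```

A plain INSERT in place of the delete-and-insert would leave four rows after the retry, which is the double-write from the prompt.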

Behavioral

Tell me about a time a pipeline you owned failed in production.

Pick a failure with a real blast radius, and structure it: what broke, what you did in the first hour, what the permanent fix was, what monitoring exists now that did not before. The grading row is ownership: 'I missed the alert gap and here is the check I added' passes; 'the upstream team sent bad data' fails, even when true.

Behavioral

Walk me through a technical decision you got wrong.

Senior loops ask this to test calibration, not humility theater. Name the decision, the information you had, why the call was reasonable then, when the evidence turned, and how fast you reversed. The failing patterns are picking a fake weakness or a decision that was someone else's. Reversal speed is the metric they write down.

Common questions about the loop

How long should I prep before a data engineering loop?

Four to eight weeks is the range that fits most working engineers. If you've shipped pipelines in the last year and your SQL is fluent, four weeks gets you back to interview pace. If you've been in one tech stack for three years, plan eight. The unavoidable time sink is system design, which doesn't compress: you need fifteen or twenty design problems out loud before you stop sounding rehearsed.

Do I need Spark if I'm not interviewing at FAANG?

Less than the job descriptions suggest. Most mid-market loops touch Spark at the conceptual level (partitioning, broadcast joins, skew) but rarely require you to write PySpark on a whiteboard. The exceptions are companies that genuinely run on Spark at scale: Databricks, Netflix, Airbnb, and most large adtech and ride-share shops. For everyone else, read the JD literally; if Spark is one bullet among many, conceptual is enough.

Which round actually decides most loops?

Data modeling and system design. SQL and Python rounds have unambiguous outcomes; you either wrote the query or you didn't. The design rounds are where senior candidates separate from mid-level ones, and the rejections at L5 and above almost always cite 'did not lead the design conversation' or 'missed a tradeoff.' That's pattern recognition, not knowledge, and it only comes from practicing fifteen to twenty designs out loud.

Should I do the take-home if it's optional?

Yes, if the company is one you want. Most take-homes are assessed as much on the README as on the code: assumptions stated, tradeoffs named, edge cases you chose to skip explicitly called out. A clean take-home moves you from 'maybe' to 'yes' in calibration meetings. The exception is if the prompt is poorly scoped and a senior recruiter can't tell you the expected time investment; that signals an unserious process.

What changes between L4 and L5 expectations?

L4 is assessed on whether you can do the work. L5 is assessed on whether you can decide what work is worth doing. The same SQL question at L4 expects a correct query; at L5 it expects a correct query plus three reasons the requirement might be wrong. The same pipeline prompt at L4 expects a working pipeline; at L5 it expects a working pipeline plus the migration story and the failure mode you'd alert on.

How is this different from a data science interview?

Data science loops lean on statistics, A/B testing, and statistical modeling. Data engineering loops lean on production systems. Both share SQL and Python, but the data engineering SQL bar is higher (window functions, query optimization, the kind of joins that come up because someone partitioned wrong), and the system design round takes the place of the data science modeling case study. If a job description mentions both, ask the recruiter which loop you'll get; the difference is real.

How many rounds are in a data engineering interview?

A recruiter screen, one or two technical screens, then a three-to-five round onsite: SQL, Python or coding, data modeling, system design at L4 and above, and behavioral. Five to seven total touches over two to five weeks is the norm at mid-size and large companies. Startups compress the same domains into two or three longer sessions, often with a take-home replacing one screen.

Are data engineering interviews harder than software engineering interviews?

Different, not harder. The algorithm bar is far lower: almost no dynamic programming or graph puzzles. The systems bar is different in kind: dimensional modeling and pipeline design have no LeetCode equivalent, so software engineers switching over routinely fail loops they expected to cruise, not on code but on grain statements and idempotency reasoning. If you are coming from SWE, the modeling round is the one to respect.

Is SQL enough to pass a data engineering interview? Do I need LeetCode?

SQL alone passes nothing beyond an analyst-titled screen; every real DE loop also tests Python, modeling, and at L4+ system design. But you do not need LeetCode-style algorithm prep either: the Python rounds are pipeline-shaped (parsing, dedup, sessionization, retries), and about 4 percent of reported rounds resembled algorithm puzzles. Drill SQL to reflex, Python to pipeline fluency, and put the saved LeetCode hours into modeling.

Is this platform actually free, and what's the catch?

Free, no card, no trial. The problem bank, the grader, the mock interview, the schema canvas, all of it. The platform is funded as part of a broader data product so the practice catalog has never been the revenue source. The 'catch' is that we use anonymized solve patterns to improve the adaptive engine and to weight which problems we add next.

Why practice

01. Active recall beats re-reading by 50%. Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom.
02. 76% of hiring managers reject on the coding task, not the resume. From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice.
03. System design is graded on the calls you defend out loud. Ingestion, batch vs streaming, the bronze/silver/gold layers, idempotency, backfill and replay. Sketching the pipeline and naming the failure modes is the signal, not the boxes.

Where to go next

SQL round walkthrough→

Window functions, anti-joins, the patterns interviewers test.

Data modeling round→

Grain, SCDs, and the tradeoffs that separate senior from mid.

System design round→

Pipelines not services. Batch vs streaming, late data, failure modes.