50 Data Engineer Interview Questions
20 SQL Questions
SQL shows up in 95% of Data Engineer loops. Drill these until medium problems take 12 minutes and hard problems take 20.
Find duplicate rows in a table
Find the second highest salary
Deduplicate keeping most recent per user
Calculate month-over-month revenue growth %
Users active for 3+ consecutive days
Top N per group with ties
7-day rolling average of daily revenue
Self-join: pairs of employees with the same manager
Pivot rows to columns with conditional aggregation
EXISTS vs IN performance
Recursive CTE for org chart
Sessionization with 30-min gap
Median with PERCENTILE_CONT
Find users who did A then B within 7 days
Fill forward NULL values per user
Detect change-points in a time series
EXPLAIN plan reading and predicate pushdown
Skew handling in JOINs
ROWS vs RANGE in window frames
Materialized view vs result cache vs incremental table
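To make two of the patterns above concrete, here is a minimal sketch of "deduplicate keeping most recent per user" (ROW_NUMBER) and "users active for 3+ consecutive days" (gap-and-island). It runs the SQL through Python's built-in sqlite3 module so it is self-contained. The table names, column names, and rows are hypothetical, and it assumes a SQLite build with window-function support (3.25 or newer). Treat it as one reasonable answer, not the only one.

```python
import sqlite3

# Hypothetical schemas and rows, invented for illustration only.
SETUP_SQL = """
CREATE TABLE user_profiles (user_id INTEGER, email TEXT, updated_at TEXT);
INSERT INTO user_profiles VALUES
    (1, 'a@old.example',  '2024-01-01'),
    (1, 'a@new.example',  '2024-03-01'),
    (2, 'b@only.example', '2024-02-15'),
    (3, 'c@old.example',  '2024-01-10'),
    (3, 'c@new.example',  '2024-01-20');

CREATE TABLE logins (user_id INTEGER, login_date TEXT);
INSERT INTO logins VALUES
    (1, '2024-01-01'), (1, '2024-01-02'), (1, '2024-01-02'), (1, '2024-01-03'),
    (2, '2024-01-01'), (2, '2024-01-03'), (2, '2024-01-04');
"""

# Dedup: rank each user's rows newest-first, then keep only rank 1.
DEDUP_SQL = """
WITH ranked AS (
    SELECT user_id, email, updated_at,
           ROW_NUMBER() OVER (
               PARTITION BY user_id
               ORDER BY updated_at DESC
           ) AS rn
    FROM user_profiles
)
SELECT user_id, email, updated_at
FROM ranked
WHERE rn = 1
ORDER BY user_id;
"""

# Gap-and-island: within a run of consecutive days, (day number - row number)
# stays constant, so grouping on that difference isolates each streak.
STREAK_SQL = """
WITH days AS (
    SELECT DISTINCT user_id, login_date FROM logins
),
islands AS (
    SELECT user_id,
           julianday(login_date)
             - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS island
    FROM days
)
SELECT DISTINCT user_id
FROM islands
GROUP BY user_id, island
HAVING COUNT(*) >= 3
ORDER BY user_id;
"""


def run_examples():
    """Load the sample data and return (latest rows, users with a 3+ day streak)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP_SQL)
    latest = conn.execute(DEDUP_SQL).fetchall()
    streak_users = [row[0] for row in conn.execute(STREAK_SQL)]
    conn.close()
    return latest, streak_users


latest, streak_users = run_examples()
print(latest)
print(streak_users)
```

Two points worth narrating in an interview: ROW_NUMBER picks arbitrarily when two rows share the same updated_at, so add a tie-breaker column to the ORDER BY if the data allows ties; and the DISTINCT in the days CTE matters, because duplicate logins on one day would otherwise break the streak arithmetic.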
12 Python Questions
Vanilla Python preferred. Pandas only when allowed. Drill these without autocomplete to build muscle memory.
Group records by a key
Read CSV with csv.DictReader
Flatten nested JSON
Dedup by composite key, keep latest
Generator for chunked CSV reading
Inner join two lists of dicts on key
Sessionize events with 30-min gap
LRU cache from scratch
Parse log line with regex, handle malformed
Stream-merge sorted iterators
Concurrent fetch with rate limit
Pandas: SCD Type 2 merge logic
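As an illustration of the vanilla-Python style these questions expect, here is a minimal sketch of "sessionize events with 30-min gap" using only the standard library. The event shape (user_id, timestamp) and the function name sessionize are assumptions made for the example. It also assumes a new session starts only when the gap is strictly greater than 30 minutes; confirm the boundary rule with your interviewer.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)


def sessionize(events, gap=SESSION_GAP):
    """Return (user_id, session_number, timestamp) tuples.

    `events` is an iterable of (user_id, timestamp) pairs in any order.
    A user's session number increases whenever the time since that user's
    previous event is strictly greater than `gap`.
    """
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    sessions = []
    for user_id in sorted(by_user):
        session_number = 0
        previous = None
        for ts in sorted(by_user[user_id]):
            if previous is None or ts - previous > gap:
                session_number += 1  # first event, or gap exceeded
            sessions.append((user_id, session_number, ts))
            previous = ts
    return sessions


# Hypothetical sample events, deliberately out of order.
events = [
    ("u1", datetime(2024, 1, 1, 11, 0)),
    ("u2", datetime(2024, 1, 1, 9, 0)),
    ("u1", datetime(2024, 1, 1, 10, 0)),
    ("u1", datetime(2024, 1, 1, 10, 20)),
]
result = sessionize(events)
for row in result:
    print(row)
```

This version sorts each user's events in memory, which is fine for an interview-sized input. If the events already arrive sorted by time, you can drop the sort and keep only the last timestamp and session number per user.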
10 Data Modeling Questions
Schema design and trade-off defense. Practice drawing on a whiteboard, narrating the grain first.
Star schema for e-commerce
Define grain of a fact table
Surrogate vs natural keys
SCD Type 2 implementation
Conformed dimensions across data marts
Slowly changing fact (corrections)
Bridge table for many-to-many
Late-arriving dimensions
Medallion architecture trade-offs
Data Vault 2.0 vs Kimball
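For the SCD Type 2 question, it can help to have the row-level logic in your head before drawing the schema. The following is a minimal sketch in plain Python, assuming a hypothetical customer dimension with a surrogate key (customer_sk), a natural key (customer_id), one tracked attribute (city), and valid_from / valid_to / is_current columns. Real implementations usually do this as a MERGE in the warehouse; this only shows the expire-and-insert idea.

```python
from datetime import date

TRACKED_COLUMNS = ("city",)  # hypothetical attribute whose history we keep


def apply_scd2(dimension, incoming, effective_date):
    """Apply one incoming record to a list of dimension rows, SCD Type 2 style.

    Returns a new list; the input rows are copied, not mutated.
    """
    rows = [dict(row) for row in dimension]
    current = next(
        (r for r in rows
         if r["customer_id"] == incoming["customer_id"] and r["is_current"]),
        None,
    )
    if current is not None:
        if all(current[col] == incoming[col] for col in TRACKED_COLUMNS):
            return rows  # nothing changed, keep the current version open
        # Expire the old version instead of overwriting it.
        current["valid_to"] = effective_date
        current["is_current"] = False

    new_row = {
        # Next surrogate key; a real warehouse would use a sequence or identity.
        "customer_sk": max((r["customer_sk"] for r in rows), default=0) + 1,
        "customer_id": incoming["customer_id"],
        "valid_from": effective_date,
        "valid_to": None,
        "is_current": True,
    }
    for col in TRACKED_COLUMNS:
        new_row[col] = incoming[col]
    rows.append(new_row)
    return rows


# Hypothetical starting dimension with one current row.
dim = [{
    "customer_sk": 1, "customer_id": "C1", "city": "Austin",
    "valid_from": date(2023, 1, 1), "valid_to": None, "is_current": True,
}]
dim = apply_scd2(dim, {"customer_id": "C1", "city": "Denver"}, date(2024, 6, 1))
for row in dim:
    print(row)
```

The sketch is also a concrete case for the surrogate-vs-natural-key question: customer_id repeats across versions, so fact rows need customer_sk to point at the version that was current when the fact occurred.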
5 System Design Questions
60-minute design rounds. Use the 4-step framework: clarify, draw, narrate, fail.
Daily ETL from Postgres to Snowflake
Real-time clickstream pipeline at 200K events/sec
Online + offline ML feature store
Daily reconciliation pipeline for payments
Multi-region active-active data warehouse
3 Behavioral Questions
STAR-D format. Specific numbers required. End with a decision postmortem.
Tell me about a project with measurable impact
Tell me about a disagreement with a stakeholder
Tell me about a real failure
How to Use the 50 Questions
Drill all 50 over 4 weeks. Speak the answers out loud. Time yourself: SQL medium under 12 min, hard under 20. Python medium under 15 min, hard under 25. Modeling under 10 min per schema. Design under 60 min per architecture.
Pair the questions with the round-specific deep guides: window functions and SQL patterns interviewers test, vanilla Python patterns interviewers test, star schema and SCD round prep, system design framework for data engineers, behavioral interview prep for Data Engineer. The deep guides explain the framework; this list gives you the practice volume.
Targeting a specific company? After drilling these 50, open the matching company guide: Stripe data engineering interview prep, Airbnb data engineering interview prep, Netflix data engineering interview prep, etc.
Data engineer interview prep FAQ
Are these the only 50 questions I should prep?
Why only 5 system design questions?
How do I know if my answer to a question is good enough?
Should I memorize the SQL syntax or write from scratch each time?
What if I see a question on this list in my interview?
How does this list compare to LeetCode or DataLemur?
Run the 50 Questions in the Browser
Reading the answers is step one. Run SQL and Python against real schemas in our sandbox to build the muscle memory that gets you the offer.