Netflix data engineer interview questions, tagged by reported interview shape. The loop is Spark-heavy across the SQL and PySpark rounds, with Iceberg (the table format Netflix open-sourced) as the default and Mantis for low-latency streaming. Late-arriving data and idempotent reconciliation are recurring design themes, and the high bar set by the Netflix Culture document comes through in every round.

Netflix's data engineer interview loop is 5-6 rounds, with one of the heaviest Spark emphases in the industry. Netflix runs Spark at extreme scale: hundreds of thousands of jobs per day across thousands of nodes. The interview reflects that, with a dedicated 45-60 minute PySpark or Scala-Spark coding round, Spark SQL questions in the SQL round, Structured Streaming in the design round, and Spark UI screenshot reading as a senior-signal question.

Netflix engineering teams built Atlas (the metrics platform), Mantis (the low-latency streaming platform), Genie (the federated job execution service), Iceberg (the table format Netflix open-sourced), and Metaflow (the ML workflow framework), and have contributed heavily to Spark itself. Candidates for senior+ data engineer roles are expected to know about these open-source contributions and have opinions on them.

The Netflix data engineer SQL round tilts toward streaming and late-arriving patterns. A common prompt: design a query that computes daily view counts but accommodates events that arrive up to 7 days late. The expected answer keeps processing-time windows separate from event-time windows, uses MERGE INTO patterns on Iceberg tables, and makes reconciliation idempotent, so that reprocessing the same events does not change the result. Window functions are tested, but more often as components of larger queries than as the focus of the question itself.
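The late-arrival prompt above can be sketched in plain Python. This is a minimal model, not a Spark or Iceberg implementation: the event tuple layout, the `recompute_daily_views` name, and the in-memory dict standing in for the aggregate table are all assumptions made for illustration. Each run recomputes every event-date partition that the 7-day lateness window can still touch and overwrites it, which is what makes a rerun safe.

```python
from collections import Counter
from datetime import date, timedelta

LOOKBACK_DAYS = 7  # assumption: events may arrive up to 7 days late


def recompute_daily_views(events, run_date, daily_counts):
    """Recompute view counts for every event date the lookback can still touch.

    events: iterable of (event_id, event_date, arrival_date), a hypothetical schema.
    run_date: processing-time date of this run.
    daily_counts: dict of event_date -> count, standing in for the aggregate table.
    """
    window_start = run_date - timedelta(days=LOOKBACK_DAYS)
    seen = set()
    fresh = Counter()
    for event_id, event_date, arrival_date in events:
        if arrival_date > run_date:
            continue  # not yet visible to this run (processing time)
        if not (window_start <= event_date <= run_date):
            continue  # outside the event-time partitions this run owns
        if event_id in seen:
            continue  # duplicate delivery
        seen.add(event_id)
        fresh[event_date] += 1
    # Overwrite (not increment) each partition in the window, so a rerun
    # of the same run_date produces the same table.
    day = window_start
    while day <= run_date:
        daily_counts[day] = fresh.get(day, 0)
        day += timedelta(days=1)
    return daily_counts
```

The design choice worth defending in an interview is the overwrite: because each partition is rebuilt from raw events rather than incremented, running the job twice for the same date cannot double-count.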

The Netflix data engineer design round centers on streaming or large-scale batch with the Netflix-specific twist of late-arriving data. Real Netflix examples: viewership analytics (events from clients can arrive hours or days late if devices are offline, so the design must reprocess yesterday's and earlier aggregates), content recommendation feature pipelines (Mantis streaming jobs feeding ML feature stores), and the experimentation platform (A/B test aggregations with watermark handling). The rubric weights the Iceberg table layout and merge strategy, the streaming-versus-micro-batch decision, effectively-exactly-once results built from at-least-once delivery plus deduplication, watermark management for late events, and multi-region replication for the global content delivery story.
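Two of the rubric items above, watermark management and at-least-once delivery plus deduplication, can be illustrated with a toy event-time counter. This is a simplified sketch, not how Structured Streaming or Mantis is implemented: the `WatermarkedCounter` class is hypothetical, the watermark advances per event rather than per micro-batch, and the set of seen ids grows without bound, whereas a real engine would expire that state as the watermark moves.

```python
from datetime import datetime, timedelta


class WatermarkedCounter:
    """Toy hourly event-time counter with a watermark and id-based dedup."""

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None
        self.seen_ids = set()
        self.counts = {}  # start of event-time hour -> count
        self.dropped = 0

    @property
    def watermark(self):
        # Watermark = latest event time seen so far minus the allowed lateness.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def process(self, event_id, event_time):
        """Return True if the event was counted, False if dropped or duplicate."""
        wm = self.watermark
        if wm is not None and event_time < wm:
            self.dropped += 1  # too late: older than the watermark
            return False
        if event_id in self.seen_ids:
            return False  # redelivery under at-least-once semantics
        self.seen_ids.add(event_id)
        bucket = event_time.replace(minute=0, second=0, microsecond=0)
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        return True
```

Events dropped by the watermark are exactly the ones a batch reconciliation job would need to pick up later, which is why the streaming and batch halves of the design are usually discussed together.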

Netflix has a flat structure without traditional engineering levels. Roles are described as 'Senior Software Engineer', 'Senior Data Engineer', or 'Staff Data Engineer' without numerical levels. The 'senior' bar is roughly equivalent to L5 at other large tech companies; 'staff' is roughly L6-L7. The 'keeper test' (would the manager fight to keep this person if they tried to leave) applies at every level. The Netflix Culture document explicitly states 'we want stunning colleagues' and the interview rubric reflects that: a high bar on every dimension, with no consolation for being good-but-not-great in one area. Behavioral rounds probe for ownership, judgment, and the ability to disagree with senior people and push back with data. The format is less STAR-formula than Amazon's and more conversational.

The typical Netflix data engineer PySpark question comes in three stages. First, join an 800M-row events table with a 2M-row users table: broadcast users and defend the threshold choice. Second, the same problem at 800M rows by 800M rows: sort-merge join and partition strategy. Third, aggregate by user_id where 5 percent of users have 95 percent of events: identify the skew, then salt and rebalance. The question is often paired with a Spark UI screenshot showing one task at 8x the median time. Solve the code, explain what the UI shows, and propose the fix.
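The salting step in the skewed aggregation can be shown as a two-stage count in plain Python. This is a sketch of the idea only: `salted_count` and `NUM_SALTS` are hypothetical names, and the in-memory counters stand in for the shuffle partitions a Spark job would use.

```python
import random
from collections import Counter

NUM_SALTS = 8  # assumption: tune to the observed skew


def salted_count(user_ids, num_salts=NUM_SALTS, seed=0):
    """Count events per user in two stages, spreading hot keys across salts."""
    rng = random.Random(seed)
    # Stage 1: partial counts keyed on (user_id, salt), so a hot user's rows
    # land in up to num_salts groups instead of a single one.
    partial = Counter()
    for user_id in user_ids:
        partial[(user_id, rng.randrange(num_salts))] += 1
    # Stage 2: drop the salt and combine the partial counts per user.
    final = Counter()
    for (user_id, _salt), n in partial.items():
        final[user_id] += n
    return partial, final
```

In PySpark the same shape is typically a salt column added before a first groupBy on (user_id, salt), followed by a second groupBy on user_id alone. The trade-off to mention is the extra shuffle stage, which is only worth paying when the skew is severe.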

Netflix Data Engineer Interview Questions

Netflix-tagged data engineer interview questions with live grading.

Common questions

How heavily does Netflix test Spark in data engineer interviews?
Heavily. Netflix runs Spark at extreme scale (hundreds of thousands of jobs per day) and the interview reflects that: a dedicated 45-60 minute PySpark or Scala-Spark coding round, Spark SQL questions in the SQL round, Structured Streaming in the design round, and Spark UI screenshot reading as a senior-signal question. Candidates without Spark fluency rarely pass Netflix data engineer loops.
What is Iceberg and why does Netflix care?
Apache Iceberg is the open-source table format Netflix created (now widely adopted). It provides ACID transactions, schema evolution, time travel, and hidden partitioning for tables stored in object storage. Netflix data engineer interviews frequently include Iceberg-specific questions: MERGE INTO semantics, partition evolution (changing the partition scheme without rewriting existing data), and snapshot isolation for concurrent writers. Mention Iceberg in design rounds where you would otherwise mention Delta Lake; at Netflix, Iceberg is the default.
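Time travel and optimistic commits for concurrent writers can be pictured with a toy snapshot log. This is a deliberately simplified model and not Iceberg's actual metadata layout or API: `ToySnapshotTable` and `CommitConflict` are hypothetical names, and a real Iceberg writer would typically retry a failed commit against the refreshed metadata rather than simply raise.

```python
class CommitConflict(Exception):
    """Raised when a writer's base snapshot is no longer the current one."""


class ToySnapshotTable:
    """Append-only list of immutable snapshots, each a set of data file names."""

    def __init__(self):
        self._snapshots = [frozenset()]  # snapshot 0: empty table

    @property
    def current_id(self):
        return len(self._snapshots) - 1

    def files(self, snapshot_id=None):
        # Time travel: read any earlier snapshot by id; default is the latest.
        sid = self.current_id if snapshot_id is None else snapshot_id
        return self._snapshots[sid]

    def commit(self, base_id, added_files):
        # Optimistic concurrency: the commit only succeeds if nobody else
        # committed since this writer read base_id.
        if base_id != self.current_id:
            raise CommitConflict(
                "base %d is stale; current is %d" % (base_id, self.current_id)
            )
        self._snapshots.append(self._snapshots[-1] | frozenset(added_files))
        return self.current_id
```

Readers pinned to an older snapshot id keep seeing a consistent set of files while writers add new snapshots, which is the property the snapshot isolation questions are usually probing.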
What is the late-arriving-data story in Netflix data engineer interviews?
It is a recurring theme. Netflix clients (phones, TVs, browsers) can be offline for hours or days; events arrive late and need to update yesterday's or last week's aggregates without overwriting what is already there. The expected design pattern is MERGE INTO with additive semantics (incrementing the existing aggregate rather than replacing it), processing-time windows separated from event-time windows, a watermark configured to allow N days of lateness in Structured Streaming, and idempotent reprocessing keyed on (event_id, source) so retries do not double-count.
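The combination of additive merging and idempotent reprocessing can be sketched in a few lines of plain Python. The `merge_late_events` function, the dict standing in for the aggregate table, and the set standing in for a dedup ledger are assumptions for illustration; on Iceberg this logic would be expressed as a MERGE INTO statement plus a deduplication step.

```python
def merge_late_events(daily_counts, applied_keys, batch):
    """Apply a batch of late events additively, skipping ones already applied.

    daily_counts: dict of event_date -> count (stands in for the aggregate table).
    applied_keys: set of (event_id, source) pairs already merged (dedup ledger).
    batch: iterable of dicts with event_id, source, event_date (hypothetical schema).
    Returns the number of newly applied events.
    """
    applied = 0
    for event in batch:
        key = (event["event_id"], event["source"])
        if key in applied_keys:
            continue  # retry or duplicate delivery: do not double-count
        applied_keys.add(key)
        day = event["event_date"]
        # Additive update: increment the existing aggregate, never replace it.
        daily_counts[day] = daily_counts.get(day, 0) + 1
        applied += 1
    return applied
```

Replaying the same batch returns 0 and leaves the counts unchanged. An additive merge on its own is not idempotent; it is the (event_id, source) ledger that makes retries safe.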
What is Mantis?
Mantis is Netflix's open-source low-latency stream processing platform, used internally for real-time alerting, operational monitoring, and some feature pipelines. It comes up in design rounds where a sub-second latency requirement appears. Most data engineer candidates will not be tested on Mantis internals; the bar is knowing it exists and that it is the answer for sub-second streaming at Netflix scale.
How does the Netflix Culture document affect interviews?
Netflix's culture document explicitly states 'we want stunning colleagues' and the interview rubric reflects that: high bar on every dimension, no consolation for being good-but-not-great in one area. The 'keeper test' (would the manager fight to keep this person if they tried to leave) shows up implicitly. Behavioral rounds probe for ownership, judgment, and the ability to disagree with senior people and push back with data.
What is the typical PySpark question at Netflix?
It usually comes in three stages. First, join an 800M-row events table with a 2M-row users table: broadcast users and defend the threshold choice. Second, the same problem at 800M rows by 800M rows: sort-merge join and partition strategy. Third, aggregate by user_id where 5 percent of users have 95 percent of events: identify the skew, then salt and rebalance. The question is often paired with a Spark UI screenshot showing one task at 8x the median time.
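Defending the broadcast threshold comes down to a size estimate. The sketch below is back-of-envelope arithmetic only: the 100-bytes-per-row figure and the function names are assumptions, and Spark itself decides from its own plan statistics rather than from a helper like this. The one grounded constant is the default of `spark.sql.autoBroadcastJoinThreshold`, which is 10 MB.

```python
# Default value of spark.sql.autoBroadcastJoinThreshold (10 MB).
DEFAULT_BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def estimate_bytes(rows, avg_row_bytes):
    """Rough table size; avg_row_bytes is a guess the candidate should state."""
    return rows * avg_row_bytes


def choose_join_strategy(left_bytes, right_bytes,
                         threshold_bytes=DEFAULT_BROADCAST_THRESHOLD_BYTES):
    """Pick broadcast when the smaller side fits under the threshold."""
    smaller = min(left_bytes, right_bytes)
    return "broadcast" if smaller <= threshold_bytes else "sort-merge"


# Assumption for illustration: roughly 100 bytes per row on both tables.
users_bytes = estimate_bytes(2_000_000, 100)      # about 200 MB
events_bytes = estimate_bytes(800_000_000, 100)   # about 80 GB
```

Under that assumption the users table is far above the 10 MB default, so broadcasting it means raising the threshold or using an explicit broadcast hint, and arguing that roughly 200 MB per executor is affordable. The 800M-by-800M case fails the check on both sides, which is why it falls back to sort-merge.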
Does Netflix do live coding or take-home for data engineer interviews?
Both, depending on team. Live coding is used for the SQL, Python, and PySpark rounds (typically in CoderPad or similar). A take-home is occasionally used for senior+ data infrastructure roles where the project shape requires more depth. The take-home format is a 4-8 hour project building a working pipeline on a provided dataset, with a follow-up discussion of trade-offs.
What levels does Netflix hire data engineers at?
Netflix has a flat structure without traditional engineering levels; roles are described as 'Senior Software Engineer', 'Senior Data Engineer', or 'Staff Data Engineer' without numerical levels. The 'senior' bar is roughly equivalent to L5 at other large tech companies; 'staff' is roughly L6-L7. The 'keeper test' applies at every level.