Meta data engineer interview questions, tagged by reported interview format. Covers window-function-heavy SQL on Presto and Trino, gap-and-island patterns for engagement streak detection, ads attribution data modeling, and feed-ranking signals pipeline design. Communication and trade-off articulation are weighted explicitly in the Meta rubric.

Meta's data engineer interview loop in 2026 is 5 rounds: SQL, Python, data modeling, system design, and behavioral. The SQL round is heavy on window functions, gap-and-island for engagement streaks, and time-series aggregation across the ads and feed-ranking warehouses. The Python round is pipeline-shaped (parsing, dedup, sessionization), with vanilla Python preferred because Meta's internal stack pushes most transformation into Presto, Hive, and Spark. Data modeling is frequently the ads attribution model: impressions, clicks, conversions, last-touch versus multi-touch attribution, and the SCD2 advertiser dimension question. System design is typically the feed-ranking signals pipeline or the ads delivery analytics pipeline at 10B+ events per day, with multi-region replay and 28-day late-arriving conversion windows. The behavioral round reflects that Meta weights communication on every round; the rubric explicitly scores "thinks out loud" and "asks clarifying questions" as separate dimensions from technical correctness.
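To make the last-touch versus multi-touch distinction concrete, here is a minimal sketch in plain Python. The tuple layouts, the function name attribute, and the choice of an even "linear" split as the multi-touch model are assumptions for illustration; only the 28-day window comes from the text above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(days=28)  # conversion window taken from the text


def attribute(clicks, conversions, model="last_touch"):
    """Return {ad_id: credit}.

    clicks: iterable of (user_id, ad_id, click_ts)   -- hypothetical layout
    conversions: iterable of (user_id, conversion_ts) -- hypothetical layout
    """
    if model not in ("last_touch", "linear"):
        raise ValueError(f"unknown attribution model: {model!r}")

    clicks_by_user = defaultdict(list)
    for user_id, ad_id, ts in clicks:
        clicks_by_user[user_id].append((ts, ad_id))

    credit = defaultdict(float)
    for user_id, conv_ts in conversions:
        # Only clicks at or before the conversion and within the window count.
        eligible = sorted(
            (ts, ad_id)
            for ts, ad_id in clicks_by_user.get(user_id, [])
            if ts <= conv_ts <= ts + WINDOW
        )
        if not eligible:
            continue  # unattributed conversion
        if model == "last_touch":
            credit[eligible[-1][1]] += 1.0  # most recent click takes all credit
        else:
            share = 1.0 / len(eligible)  # linear: split credit evenly
            for _, ad_id in eligible:
                credit[ad_id] += share
    return dict(credit)


clicks = [
    ("u1", "adA", datetime(2026, 1, 1)),
    ("u1", "adB", datetime(2026, 1, 10)),
    ("u1", "adC", datetime(2025, 11, 1)),  # outside the 28-day window
]
conversions = [("u1", datetime(2026, 1, 15))]
last_touch = attribute(clicks, conversions, "last_touch")
linear = attribute(clicks, conversions, "linear")
```

In a modeling round the same split usually shows up as a design question: whether credit is stored per conversion (one row, one ad) or per touch (many rows with fractional credit).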

On the Meta data engineer SQL bar specifically: window functions appear in 80 percent of reported Meta SQL questions, frequently composed (a window over a window via two CTEs). Gap-and-island for engagement streaks appears in roughly 30 percent of reported rounds: consecutive-days-active, longest-watch-session, streak-length distribution. The Presto and Trino dialect is what you write against at Meta: UNNEST for array column expansion, MAP_AGG for key-value rollups, and APPROX_DISTINCT for cardinality at scale. Practice in Postgres here is portable for the patterns; the Meta-specific syntactic differences are tagged on each problem.
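The gap-and-island idea can be sketched end to end with Python's built-in sqlite3 module. This is SQLite syntax, not Presto or Trino (the date arithmetic in particular would be written differently there), and the activity table, its columns, and the longest_streaks helper are hypothetical. It assumes a SQLite build with window-function support (3.25 or newer).

```python
import sqlite3

# Subtracting ROW_NUMBER() from the day number gives the same value for
# every row of a consecutive run; that value is the island key.
STREAKS_SQL = """
WITH days AS (
    SELECT DISTINCT user_id, activity_date FROM activity
),
numbered AS (
    SELECT user_id,
           activity_date,
           CAST(julianday(activity_date) AS INTEGER)
             - ROW_NUMBER() OVER (
                 PARTITION BY user_id ORDER BY activity_date
               ) AS island_key
    FROM days
),
islands AS (
    SELECT user_id, island_key, COUNT(*) AS streak_len
    FROM numbered
    GROUP BY user_id, island_key
)
SELECT user_id, MAX(streak_len) AS longest_streak
FROM islands
GROUP BY user_id
ORDER BY user_id
"""


def longest_streaks(rows):
    """rows: iterable of (user_id, 'YYYY-MM-DD'). Returns [(user_id, longest_streak)]."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE activity (user_id TEXT, activity_date TEXT)")
        conn.executemany("INSERT INTO activity VALUES (?, ?)", rows)
        return conn.execute(STREAKS_SQL).fetchall()
    finally:
        conn.close()


rows = [
    ("u1", "2026-01-01"), ("u1", "2026-01-02"), ("u1", "2026-01-02"),  # duplicate day
    ("u1", "2026-01-03"), ("u1", "2026-01-05"), ("u1", "2026-01-06"),
    ("u2", "2026-01-01"), ("u2", "2026-01-03"),
]
result = longest_streaks(rows)
```

The DISTINCT step matters: a user with two events on the same day would otherwise get two row numbers for one date and split the island.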

The Meta data engineer design round is calibrated to senior signal. A working high-level architecture is not enough. The E5 rubric weights four things: defending the choice against two alternatives, naming 3 failure modes per component, addressing the late-arriving data and replay story, and articulating cost reasoning (slot consumption, Spark workers, S3 storage). Conversion windows for ads attribution extend 28 days post-click, which means the design must handle reprocessing yesterday's totals when today brings new attributed conversions. The MERGE-ADD-not-REPLACE pattern is the standard answer. Multi-region replication for ads accounting is the E6 follow-up.
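One way to read the MERGE-ADD-not-REPLACE pattern is sketched below: late conversions arrive as deltas that are added to an existing daily total rather than overwriting it, and each batch is applied at most once so a replay does not double-count. This interpretation, the DailyConversionTotals class, and the batch-id bookkeeping are assumptions for illustration, not a description of Meta's internal implementation.

```python
from collections import defaultdict


class DailyConversionTotals:
    """In-memory stand-in for a daily aggregate table (hypothetical)."""

    def __init__(self):
        self.totals = defaultdict(int)  # (click_date, ad_id) -> conversions
        self.applied_batches = set()    # batch ids already merged

    def merge_add(self, batch_id, deltas):
        """Add late-arriving deltas to existing totals.

        deltas: iterable of (click_date, ad_id, count).
        Returns False when the batch was already applied (replay).
        """
        if batch_id in self.applied_batches:
            return False  # idempotent: a replayed batch changes nothing
        deltas = list(deltas)
        for click_date, ad_id, count in deltas:
            if count < 0:
                # Fail loudly before touching any totals.
                raise ValueError(f"negative delta for {(click_date, ad_id)!r}")
        for click_date, ad_id, count in deltas:
            self.totals[(click_date, ad_id)] += count  # ADD, never overwrite
        self.applied_batches.add(batch_id)
        return True


table = DailyConversionTotals()
first = table.merge_add("2026-03-01", [("2026-02-28", "adA", 10)])
# The next day brings conversions attributed back to the 02-28 clicks.
second = table.merge_add(
    "2026-03-02", [("2026-02-28", "adA", 2), ("2026-03-01", "adA", 5)]
)
replayed = table.merge_add("2026-03-02", [("2026-02-28", "adA", 2)])
```

A REPLACE here would have set the 2026-02-28 total to 2 and lost the first 10 conversions, which is the failure the additive merge avoids.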

The Meta data engineer Python bar is pipeline-shaped and more vanilla than at most companies. Meta's internal stack favors Presto and Hive for batch SQL, Spark for ML feature pipelines and some heavy ETL. Python is for orchestration, validation, and custom transforms. Common Meta Python interview prompts: implement an SCD Type 2 merge in pandas, sessionize events with itertools.groupby, dedup with composite tiebreaker. The bar is correctness plus error handling. Silent failures in pipeline code are the failure mode the Meta rubric calls out.
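As an illustration of the sessionization prompt, the sketch below groups events per user with itertools.groupby and splits sessions on an inactivity gap. The 30-minute gap, the (user_id, ts_seconds) event layout, and the output tuple shape are assumptions, not part of any reported question.

```python
from itertools import groupby
from operator import itemgetter

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """events: iterable of (user_id, ts_seconds).

    Returns a list of (user_id, session_start, session_end, event_count).
    """
    validated = []
    for event in events:
        user_id, ts = event
        if user_id is None or not isinstance(ts, (int, float)):
            # Raise instead of silently dropping malformed rows.
            raise ValueError(f"bad event: {event!r}")
        validated.append((user_id, ts))

    # groupby only groups adjacent items, so sort by user, then time.
    validated.sort(key=itemgetter(0, 1))

    sessions = []
    for user_id, user_events in groupby(validated, key=itemgetter(0)):
        start = prev = None
        count = 0
        for _, ts in user_events:
            if prev is not None and ts - prev > gap:
                sessions.append((user_id, start, prev, count))  # close session
                start, count = ts, 0
            if start is None:
                start = ts
            prev = ts
            count += 1
        sessions.append((user_id, start, prev, count))  # close the last session
    return sessions


events = [("u1", 600), ("u2", 100), ("u1", 0), ("u1", 4000)]
sessions = sessionize(events)
```

The explicit sort is the detail interviewers tend to probe: groupby on unsorted input silently produces multiple groups for the same user.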

Meta calls its levels E4 (mid), E5 (senior), E6 (staff), E7 (senior staff), E8 (principal). E5 is the typical senior data engineer floor. E6+ rubrics emphasize trade-off articulation, failure-mode naming, and the ability to adapt cleanly when the interviewer changes a requirement mid-round. The behavioral round at Meta probes "thinks out loud" and "asks clarifying questions" as separate dimensions from technical correctness; the data engineer who narrates through every step scores above the one who solves silently.

Meta Data Engineer Interview Questions

Meta-tagged data engineer interview questions with live grading.

Common questions

What SQL dialect does Meta use in data engineer interviews?
Presto and Trino predominantly, with some Hive on the data-platform team. The Postgres catalog here is portable for the patterns (window functions, CTEs, aggregation, JOIN). The dialect-specific syntax (UNNEST, MAP_AGG, and APPROX_DISTINCT in Presto; LATERAL VIEW EXPLODE in Hive) is tagged on the problems where it diverges. Practice in Postgres, and mention the Presto syntax during the interview.
How does Meta weight communication versus technical correctness in interviews?
Higher than most companies. Meta's data engineer rubric explicitly scores 'thinks out loud', 'asks clarifying questions', and 'articulates trade-offs' as separate dimensions from technical correctness. A data engineer candidate who lands the right answer silently scores lower than one who narrates the wrong approach, corrects mid-round when probed, and articulates why. Practice narrating through every problem.
What is the typical system design scenario at Meta?
Most often the feed-ranking signals pipeline (collect user interaction events at 10B+ per day, compute signals like dwell time and engagement velocity, serve to the ML ranking model with single-digit-millisecond latency) or the ads delivery analytics pipeline (impression-click-conversion attribution with 28-day windows, multi-region replay, idempotent reconciliation). Both run 45-60 minutes, and both expect 3+ failure modes per component.
How important is gap-and-island at Meta data engineer interviews?
Roughly 30 percent of reported Meta SQL questions involve gap-and-island for engagement streak detection (consecutive days active, longest watch session, days since last login). Master the date-minus-ROW_NUMBER pattern: subtract ROW_NUMBER() days (ROW_NUMBER times INTERVAL 1 day) from each activity date, and rows in the same streak share the resulting value, which becomes the grouping key. There are multiple variants: HAVING COUNT(*) >= N for a minimum streak length, MAX of island length per user for the longest streak, and COUNT of distinct streaks per user.
Does Meta test PySpark in data engineer interviews?
Less than the Spark-first companies. Meta's internal stack favors Presto and Hive for batch SQL, Spark for ML feature pipelines and some heavy ETL. PySpark questions appear in data engineer loops for data-infra-adjacent teams (the team that builds the Spark platform) and ML data engineering teams. For general data engineer loops at Meta, SQL fluency matters more than Spark.
What is the bar for the Python round at Meta?
Pipeline-shaped Python, more vanilla than at most companies because Meta's stack pushes most transformation into Presto, Hive, and Spark. Python is for orchestration, validation, custom transforms. Common prompts: implement an SCD Type 2 merge in pandas, sessionize events with itertools.groupby, dedup with composite tiebreaker. The bar is correctness plus error handling. Silent failures are the failure mode the Meta rubric calls out.
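A dedup with a composite tiebreaker might look like the sketch below: keep one record per key, preferring the highest updated_at and breaking ties with the highest ingest_seq. The field names event_id, updated_at, and ingest_seq are hypothetical, and a missing field raises rather than being skipped, in line with the emphasis on avoiding silent failures.

```python
def dedup_latest(records, key_fields=("event_id",)):
    """Keep one record per key; winner has the largest (updated_at, ingest_seq).

    records: iterable of dicts (field names are hypothetical).
    Output order follows the first appearance of each key.
    """
    best = {}
    for rec in records:
        try:
            key = tuple(rec[f] for f in key_fields)
            rank = (rec["updated_at"], rec["ingest_seq"])  # composite tiebreaker
        except KeyError as exc:
            # Surface malformed rows instead of dropping them silently.
            raise ValueError(f"record missing field {exc}: {rec!r}") from exc
        if key not in best or rank > best[key][0]:
            best[key] = (rank, rec)
    return [rec for _, rec in best.values()]


records = [
    {"event_id": "e1", "updated_at": 5, "ingest_seq": 1, "v": "old"},
    {"event_id": "e2", "updated_at": 1, "ingest_seq": 1, "v": "only"},
    {"event_id": "e1", "updated_at": 5, "ingest_seq": 2, "v": "new"},
    {"event_id": "e1", "updated_at": 4, "ingest_seq": 9, "v": "stale"},
]
deduped = dedup_latest(records)
```

Tuple comparison does the work here: updated_at is compared first, and ingest_seq only decides when the timestamps are equal.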
How long is a Meta data engineer onsite?
4-5 rounds over 4-5 hours: SQL, Python, data modeling, system design, behavioral. Senior+ loops add a second design round (frequently a 'design the data platform' meta-question). Each round is 45-60 minutes. The lunch interview is usually informal but observed by the recruiter.
What level is the equivalent of L4, L5, L6 at Meta?
Meta calls them E4 (mid), E5 (senior), E6 (staff), E7 (senior staff), E8 (principal). E5 is the typical senior data engineer floor. E6+ rubrics emphasize trade-off articulation, failure-mode naming, and the ability to adapt cleanly when the interviewer changes a requirement mid-round.