Meta Data Engineer Interview Guide

Meta DE interview process: round by round

Six stages from first contact to offer. The dedicated data modeling round is Meta-specific; most other FAANG companies embed it in design or SQL.

01
Recruiter screen
Non-technical call covering background, motivation for joining Meta, and role fit. The recruiter checks whether experience aligns with team and level. They will ask about scale: how much data you have worked with, what tools you used, why Meta specifically.
- Quantify data scale: row counts, daily volumes, GB/TB processed
- Know that Meta built Presto (Trino is a later community fork), uses Spark heavily, and runs an exabyte-scale data warehouse
- Ask which team the role is for; Meta DE roles vary across Ads, Integrity, Instagram, and Reality Labs
02
Technical phone screen
Live SQL coding, usually 1 to 2 problems. Meta phone screens lean on aggregation, window functions, and multi-step queries set in Meta-like contexts: user engagement, ad impressions, content moderation. The interviewer watches your problem-solving process as much as your final answer.
- Think out loud: Meta evaluates approach, not just result
- Expect window functions (ROW_NUMBER, LAG) combined with CTEs
- Ask clarifying questions: NULL handling, duplicates, timestamp granularity
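The window-function-plus-CTE shape mentioned above can be rehearsed locally. The following is a minimal sketch, not a reported Meta question: the `user_sessions` table, its columns, and the sample rows are invented for illustration, and it runs on Python's built-in `sqlite3` (window functions need SQLite 3.25 or newer) rather than Presto.

```python
import sqlite3

# Hypothetical table and data, invented for practice purposes only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_sessions (user_id INTEGER, session_date TEXT);
INSERT INTO user_sessions VALUES
  (1, '2024-03-01'), (1, '2024-03-02'), (1, '2024-03-05'),
  (2, '2024-03-03');
""")

query = """
WITH ordered_sessions AS (
    -- Step 1: rank each user's sessions and pull the previous session date.
    SELECT
        user_id,
        session_date,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY session_date) AS session_rank,
        LAG(session_date) OVER (PARTITION BY user_id ORDER BY session_date) AS prev_session_date
    FROM user_sessions
)
-- Step 2: turn the pair of dates into a gap in days (NULL for a first session).
SELECT
    user_id,
    session_date,
    session_rank,
    CAST(julianday(session_date) - julianday(prev_session_date) AS INTEGER) AS days_since_prev
FROM ordered_sessions
ORDER BY user_id, session_date;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first session per user has no previous row, so `days_since_prev` is NULL there. That is exactly the kind of NULL-handling point worth raising as a clarifying question.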
03
Onsite: SQL deep dive
Harder than the phone screen. Two to three SQL problems with increasing complexity. First is a warm-up (basic aggregation). Second involves window functions or multi-step logic. Third may involve optimization: query works, now discuss how to make it efficient at scale.
- Practice writing SQL without autocomplete; Meta uses a shared document
- If you finish early, the interviewer adds constraints (this is a good sign)
- Optimization discussion tests awareness: indexing, partition pruning, avoiding unnecessary sorts
04
Onsite: data modeling
Design a data model for a Meta product: Facebook Events, Instagram Stories, Marketplace, or Messenger. Define fact and dimension tables, grain, slowly changing dimensions, and how the model supports specific analytical queries. This round tests whether you think about data as a system.
- Start with the business question the model answers, then work backward to the schema
- Define the grain explicitly: one row per user per day, one row per event, one row per impression
- Discuss SCD Type 2 for dimensions that change over time
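To make the SCD Type 2 point concrete, here is a small in-memory sketch of the usual "close the current row, append a new version" logic. The `DimUserRow` fields (`valid_from`, `valid_to`, `is_current`) and the `apply_scd2` helper are hypothetical names chosen for illustration. A real warehouse would do this with a merge over partitioned tables.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DimUserRow:
    # Hypothetical dimension row; grain is one row per user per attribute version.
    user_id: int
    country: str
    valid_from: str
    valid_to: Optional[str]  # None means the version is still open
    is_current: bool


def apply_scd2(dim_rows: List[DimUserRow], user_id: int,
               new_country: str, change_date: str) -> List[DimUserRow]:
    """Apply one attribute change using Type 2 semantics (history is kept)."""
    current = next(
        (r for r in dim_rows if r.user_id == user_id and r.is_current), None
    )
    if current is not None and current.country == new_country:
        return dim_rows  # nothing changed, so no new version is written
    if current is not None:
        # Close the old version instead of overwriting it.
        current.valid_to = change_date
        current.is_current = False
    dim_rows.append(DimUserRow(user_id, new_country, change_date, None, True))
    return dim_rows


dim_user: List[DimUserRow] = []
apply_scd2(dim_user, 1, "US", "2024-01-01")
apply_scd2(dim_user, 1, "CA", "2024-06-01")
apply_scd2(dim_user, 1, "CA", "2024-07-01")  # no-op: value is unchanged
for row in dim_user:
    print(row)
```

Facts joined on `user_id` plus a date between `valid_from` and `valid_to` then see the country that was in effect at event time, which is the property interviewers usually probe.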
05
Onsite: system design
Design a data pipeline at Meta scale. Examples: real-time ad metrics, content moderation event processing, cross-platform activity aggregation. The interviewer cares about reasoning at scale (billions of events per day), fault tolerance, data quality, and batch vs streaming tradeoffs.
- Start with requirements: latency SLA, data volume, consumers
- Mention partitioning, horizontal scaling, backpressure handling
- Draw the architecture, even in a shared doc. Visual communication matters.
06
Onsite: behavioral
Meta calls this the 'values' round. Questions focus on collaboration, conflict resolution, and impact. They want specific STAR-format examples. Meta values 'Move Fast' and 'Build Social Value,' so frame stories around speed of delivery and user impact.
- Prepare 4–5 stories that each demonstrate multiple values
- Avoid generic answers; 'I communicated with the team' is not specific
- Quantify impact: runtime reduction, cost savings, stakeholder satisfaction

Meta DE loop vs other FAANG

How Meta's interview structure differs from peer companies. The dedicated data modeling round is the biggest differentiator.

Dimension	Meta	Google	Amazon	Microsoft
SQL rounds	2 (phone + onsite deep dive)	1	1	1
Data modeling round	Dedicated 45-min round	Embedded	Embedded	Embedded
Algorithm depth	Light (rarely LC-style)	Medium (medium-hard LC)	Light	Medium
System design	1 dedicated round	1 dedicated round	1 dedicated round	1 dedicated round
Behavioral framework	Values (Move Fast, Build Social Value)	Googleyness	Leadership Principles	Behavioral
External hire IC level	IC4–IC5 typical	L4–L5 typical	L5–L6 typical	L62–L64 typical

Real Meta interview questions

Reported questions from this company's loops, tagged by domain, round, and level.

Python · phone screen (Python) · L4 · 2025

Write a function that inverts a dictionary, mapping each value to the list of keys that had that value

Write a function invert_dict(d) that takes a dictionary where all values are hashable (strings, ints, tuples, etc.) and returns a new dictionary mapping each original value to a sorted list of all keys that mapped to it.

Example:
d = {'a': 1, 'b': 2, 'c': 1, 'd': 3, 'e': 2}
Output: {1: ['a', 'c'], 2: ['b', 'e'], 3: ['d']}

d = {'x': 'hello', 'y': 'hello', 'z': 'world'}
Output: {'hello': ['x', 'y'], 'world': ['z']}

d = {}
Output: {}

Edge cases: empty dict, all keys mapping to the same value, values that are already unique (each list has one element).
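One possible solution, offered as a sketch rather than a reference answer: group keys under each value with a `defaultdict`, then sort each group. It assumes the keys within a group are mutually comparable, which the problem statement implies by asking for sorted lists.

```python
from collections import defaultdict


def invert_dict(d):
    """Map each value of d to the sorted list of keys that pointed at it."""
    inverted = defaultdict(list)
    for key, value in d.items():
        inverted[value].append(key)
    # Convert back to a plain dict and sort each key list once at the end.
    return {value: sorted(keys) for value, keys in inverted.items()}


print(invert_dict({'a': 1, 'b': 2, 'c': 1, 'd': 3, 'e': 2}))
print(invert_dict({'x': 'hello', 'y': 'hello', 'z': 'world'}))
print(invert_dict({}))
```

Grouping is a single O(n) pass, and the sorts add O(k log k) per group. It is worth stating that cost out loud in a phone screen.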

SQL · phone screen (SQL) · L5 · 2025

Find authors who have published at least 5 books

Given a star schema with sales transactions and a book dimensions table, identify all authors who have published at least 5 books.
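A hedged sketch of one way to answer this. The report does not give the schema, so the `dim_book` table and its `book_id`, `title`, and `author_id` columns are assumptions. The query runs on Python's built-in `sqlite3` for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical book dimension: one row per book.
conn.execute("CREATE TABLE dim_book (book_id INTEGER, title TEXT, author_id INTEGER)")
conn.executemany(
    "INSERT INTO dim_book VALUES (?, ?, ?)",
    [(i, f"book_{i}", 10) for i in range(1, 6)]       # author 10: five books
    + [(i, f"book_{i}", 20) for i in range(6, 8)],    # author 20: two books
)

query = """
SELECT author_id
FROM dim_book
GROUP BY author_id
HAVING COUNT(DISTINCT book_id) >= 5   -- DISTINCT guards against duplicated dimension rows
ORDER BY author_id;
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Note that the answer comes from the dimension table alone. Joining to the sales fact would count sold books rather than published ones, a distinction worth confirming with the interviewer.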

Mixed · phone screen (SQL) · level unknown · 2022

Meta | Data Engineer | Phone Interview

Here is my Meta Data Engineer phone screen interview experience (1 hour total: 30 minutes for SQL and 30 minutes for Python). Result: failed.

1) I was asked which section I wanted to code first, and I requested SQL. The tables were Books, Author, Sales, and Customer. The Book table contains book name, author ID, and genre.

a) Write a query to print the IDs of authors who wrote 5 or more genres. Test cases passed.

select author_id from book group by author_id having count(author_id) >= 5

b) The Sale table contains ID, transaction_date, …

Data Modeling · onsite (data modeling) · L6 · 2025

Design a database schema for a ride-sharing service, including tables, field types, and keys

Design a relational database schema for a ride-sharing app. Must specify tables (users, drivers, rides, payments, etc.), field types for each column, primary keys, and foreign key relationships. Discuss one-to-many (driver to rides) and many-to-many (riders to ride requests) relationships, normalization choices, and indexing strategy.

Python · onsite (Python) · L5 · 2024

Given a list of log entries with timestamps and event types, compute the count of each event type within each hour

Write a function hourly_event_counts(logs) where each log is a tuple of (timestamp_str, event_type). Timestamps are in 'YYYY-MM-DD HH:MM:SS' format. Return a dictionary where keys are hour strings ('YYYY-MM-DD HH') and values are dictionaries of {event_type: count}.

Example:
logs = [ ('2024-03-15 09:12:00', 'click'), ('2024-03-15 09:45:00', 'click'), ('2024-03-15 09:30:00', 'view'), ('2024-03-15 10:05:00', 'click'), ]
Output: { '2024-03-15 09': {'click': 2, 'view': 1}, '2024-03-15 10': {'click': 1} }

logs = []
Output: {}

Edge cases: empty log list, logs…
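A minimal sketch of one approach. Parsing with `datetime.strptime` (instead of slicing the string) is a design choice made here so that malformed timestamps raise `ValueError`. The original prompt's full edge-case list is truncated, so that behavior is an assumption.

```python
from datetime import datetime


def hourly_event_counts(logs):
    """Count event types per 'YYYY-MM-DD HH' bucket."""
    counts = {}
    for timestamp_str, event_type in logs:
        # Raises ValueError on malformed input (an assumed policy).
        parsed = datetime.strptime(timestamp_str, "%Y-%m-%d %H:%M:%S")
        hour_key = parsed.strftime("%Y-%m-%d %H")
        hour_bucket = counts.setdefault(hour_key, {})
        hour_bucket[event_type] = hour_bucket.get(event_type, 0) + 1
    return counts


logs = [
    ('2024-03-15 09:12:00', 'click'),
    ('2024-03-15 09:45:00', 'click'),
    ('2024-03-15 09:30:00', 'view'),
    ('2024-03-15 10:05:00', 'click'),
]
print(hourly_event_counts(logs))
print(hourly_event_counts([]))
```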

SQL · phone screen (SQL) · L5 · 2025

Calculate percentage of total sales completed on the same day the customer registered

Given a star schema with sales transactions and customer registration data, calculate the percentage of total sales where the sale was completed on the same day the customer registered.
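One way this could be written, shown as a sketch. The table names `fact_sales` and `dim_customer` and their timestamp columns are assumptions, since the report does not list the schema. The example uses Python's built-in `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample rows for practice.
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER, registered_at TEXT);
CREATE TABLE fact_sales (sale_id INTEGER, customer_id INTEGER, sold_at TEXT);
INSERT INTO dim_customer VALUES
  (1, '2024-03-01 08:00:00'),
  (2, '2024-02-10 12:00:00');
INSERT INTO fact_sales VALUES
  (1, 1, '2024-03-01 19:30:00'),  -- same day as registration
  (2, 1, '2024-03-04 10:00:00'),
  (3, 2, '2024-02-11 09:00:00'),
  (4, 2, '2024-03-01 09:00:00');
""")

query = """
SELECT
    100.0 * SUM(CASE WHEN DATE(s.sold_at) = DATE(c.registered_at) THEN 1 ELSE 0 END)
          / COUNT(*) AS same_day_pct
FROM fact_sales AS s
JOIN dim_customer AS c
  ON c.customer_id = s.customer_id;
"""
same_day_pct = conn.execute(query).fetchone()[0]
print(same_day_pct)
```

Multiplying by `100.0` forces floating-point division. With an empty sales table the result is NULL in SQLite, so it is worth asking how the interviewer wants that case reported.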

Mixed · phone screen (SQL) · level unknown · 2022

Company: Facebook | Meta
Position: Data Engineer
Location: Virginia, USA
Interview: Virtual Onsite
YoE: 9+ yrs

Phone Interview:

There was one 1-hour phone interview with two sections: Algo and SQL.

SQL: I chose to start with SQL. My suggestion: start with the section you feel more confident in, since it boosts your confidence, and if you finish the first section a little early you buy some time for the second.

It had 3 SQL questions on a given dataset, progressing from easier to slightly more advanced SQL. I nailed it here as I…

Data Modeling · onsite (data modeling) · L6 · 2022

Design a data warehouse to combine cellular tower connectivity data with Facebook app logs, including pipeline architecture and dimension/fact schema

Problem Solving round: interviewer sent problem statement 24 hours in advance. FB Connectivity product collects data from cellular towers and marries them to FB app logs to create a data product. Candidate must design: data access strategies, big data processing system components, and the Data Warehouse model. Candidate used Excalidraw to draw architecture. Evaluated on thoroughness of schema design, storage strategy, and system component choices. Virginia USA, 9+ YOE.

Python · onsite (Python) · L5 · 2024

Given a list of records with a category and a value, return the top N records per category sorted by value descending

Write a function top_n_per_category(records, n) where each record is a dictionary with 'category' and 'value' keys. Return a dictionary mapping each category to a list of its top n records sorted by value descending. If two records in the same category have equal values, preserve their original relative order.

Example:
records = [ {'category': 'A', 'value': 10}, {'category': 'A', 'value': 30}, {'category': 'B', 'value': 20}, {'category': 'A', 'value': 20}, {'category': 'B', 'value': 50}, {'category': 'B', 'value': 40}, ]
top_n_per_category(records, 2)
Output:…
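A possible implementation, as a sketch. It relies on Python's `sorted` being stable even with `reverse=True`, which is what preserves the original order of ties. Treating a non-positive `n` as "return empty lists" is an assumption, since the prompt does not specify it.

```python
from collections import defaultdict


def top_n_per_category(records, n):
    """Return the n highest-value records per category, ties in input order."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record['category']].append(record)
    limit = max(n, 0)  # assumed policy: non-positive n yields empty lists
    return {
        category: sorted(items, key=lambda r: r['value'], reverse=True)[:limit]
        for category, items in grouped.items()
    }


records = [
    {'category': 'A', 'value': 10},
    {'category': 'A', 'value': 30},
    {'category': 'B', 'value': 20},
    {'category': 'A', 'value': 20},
    {'category': 'B', 'value': 50},
    {'category': 'B', 'value': 40},
]
print(top_n_per_category(records, 2))
```

For very large groups, `heapq.nlargest` avoids a full sort. Its tie-ordering is less obvious than a stable sort's, so the trade-off is worth mentioning rather than silently switching.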

SQL · phone screen (SQL) · L5 · 2025

Find customers who purchased 3 or more books on both the first and last day of sales

Given sales transactions data, find customers who purchased 3 or more books on both the first day of sales recorded in the dataset AND the last day of sales recorded.
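A sketch of one multi-step CTE answer. It assumes a `sales` table with one row per book sold (`customer_id`, `sale_date`); the real schema was not reported, and a `quantity` column would change `COUNT(*)` to `SUM(quantity)`. It runs on Python's built-in `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table: one row per book sold.
conn.execute("CREATE TABLE sales (customer_id INTEGER, sale_date TEXT)")
sample = (
    [(1, '2024-01-01')] * 3 + [(1, '2024-01-31')] * 3    # qualifies on both days
    + [(2, '2024-01-01')] * 3 + [(2, '2024-01-31')]      # only 1 book on the last day
    + [(3, '2024-01-15')] * 3 + [(3, '2024-01-31')] * 3  # never bought on the first day
)
conn.executemany("INSERT INTO sales VALUES (?, ?)", sample)

query = """
WITH bounds AS (
    SELECT MIN(sale_date) AS first_day, MAX(sale_date) AS last_day FROM sales
),
daily AS (
    SELECT customer_id, sale_date, COUNT(*) AS books_bought
    FROM sales
    GROUP BY customer_id, sale_date
)
SELECT d.customer_id
FROM daily AS d
JOIN bounds AS b
  ON d.sale_date IN (b.first_day, b.last_day)
WHERE d.books_bought >= 3
GROUP BY d.customer_id
-- Require both boundary days, or just one if the dataset spans a single day.
HAVING COUNT(DISTINCT d.sale_date) =
       CASE WHEN MIN(b.first_day) = MIN(b.last_day) THEN 1 ELSE 2 END
ORDER BY d.customer_id;
"""
rows = conn.execute(query).fetchall()
print(rows)
```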

Meta's data stack: tools to reference by name

Naming the right internal tool signals you've done homework on Meta specifically. None of these require deep internal knowledge to mention.

Tool	What it does	Interview signal
Presto	Interactive SQL at petabyte scale. Meta built and open-sourced it; Trino is a later community fork.	Default SQL engine in interviews. Reference by name.
Spark	Batch processing for everything from training data to feature stores.	Reference for ETL and batch system design questions.
Scuba	Low-latency analytics over recent data (~few hours). In-memory columnar.	Reference when discussing operational metrics dashboards.
Dataswarm / Custom Airflow	Internal orchestration. Conceptually similar to Airflow.	Mention if asked about pipeline scheduling.
Hive Metastore	Schema catalog. Most tables in the warehouse are Hive-style external tables on object storage.	Schema design questions assume this metadata layer.
Velox	Vectorized execution engine shared across Presto/Spark.	Mention when discussing query performance; it shows depth.

Meta-specific preparation tips

What separates strong Meta candidates from passing ones. Each tip stands alone.

Scale awareness

Acknowledge Meta scale in every answer

When designing a pipeline, mention billions of events. When writing SQL, discuss performance on tables with hundreds of billions of rows. When proposing optimization, name the partition strategy. Scale awareness is the single biggest differentiator.

Homework signal

Know Meta's tech stack

Meta built Presto for interactive SQL (Trino is a later community fork of it). They use Spark for batch, Scuba for real-time analytics, Velox for vectorized execution, and custom orchestration. Referencing these by name shows homework without requiring deep internal knowledge.

Realism

Use Meta-like schemas in SQL practice

Practice with tables named user_sessions, ad_impressions, content_interactions, friend_requests. Think about what data each Meta feature generates — every like, comment, share, impression, and scroll is tracked. Your SQL fluency on Meta-shaped data will be visible.

Product context

Think metrics and experimentation

Meta is metrics-driven. DEs support A/B testing, metric computation, and experiment analysis. Mention how your pipeline supports experimentation: control vs treatment, metric slicing by variant, statistical power for small-effect detection.

Don't underprep

Behavioral round has real weight

Some candidates over-prepare for technical and under-prepare for behavioral. At Meta the behavioral round can be a tiebreaker. Prepare specific stories demonstrating cross-team collaboration and shipping under deadlines. Generic answers cost offers.

Communication

Optimize for clarity in SQL deep-dive

Meta's SQL deep-dive uses a shared document with no autocomplete. Type slowly, name CTEs descriptively (active_users not au), comment any non-obvious logic. The interviewer follows your thinking through your code structure.

Meta data engineer compensation

Median and range from verified salary reports, by level.

Level	Base	Total comp
Junior (L3)	$130K median	$166K median · $160K–$171K · 4 reports
Mid-level (L4)	$165K median	$242K median · $203K–$272K · 10 reports
Senior (L5)	$190K median	$341K median · $246K–$353K · 11 reports
Staff (L6)	$242K median	$471K median · $415K–$509K · 8 reports
Principal (L7)	$300K–$380K

The Meta data stack

What their data engineers work with day to day. Worth brushing up on the heavy hitters before the loop.

Languages

Python 12 · SQL 12 · Scala 8

Meta practice set

Problems on the platform tagged and predicted for Meta loops, from live listings and interview reports.

SQL · easy · ~5 min

Full Customer Order List

Return first_name, last_name, and country for every customer in customers. Sort alphabetically by first_name, then last_name.

Python · medium · ~10 min

Detect Cycle in Sequence

You are given a list of integers where each value at index i is the next index to visit (or -1 to terminate). Starting from index 0, follow the chain and return True if you revisit any index, False otherwise. Out-of-range indices (including -1) count as termination, not a cycle.
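A short sketch of one solution using a visited set. The function name `has_cycle` is chosen here for illustration, and an empty list is treated as immediate termination (index 0 is already out of range), which follows from the stated rule.

```python
def has_cycle(next_indices):
    """Follow the chain from index 0; True if any index is visited twice."""
    visited = set()
    i = 0
    # Any out-of-range index, including -1, ends the walk without a cycle.
    while 0 <= i < len(next_indices):
        if i in visited:
            return True
        visited.add(i)
        i = next_indices[i]
    return False


print(has_cycle([1, 2, 0]))
print(has_cycle([1, 2, -1]))
```

This uses O(n) extra space. Floyd's two-pointer method would bring that to O(1) at the cost of slightly trickier termination handling.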

Data Modeling · easy · ~15 min

The Balance Always Reconciles

We're a consumer lending company that offers personal loans, auto loans, and mortgages. Customers make monthly payments, but sometimes they pay early, miss payments, or refinance. The operations team needs outstanding balances and the risk team needs to flag delinquent accounts. Can you design the schema?

SQL · easy · ~5 min

High Volume Batch Jobs

Surface all batch jobs that processed more than 5000 rows, showing each job's name, priority, and rows processed, ranked from most to fewest.

SQL · medium · ~5 min

Active Duo

The growth team is building a cross-engagement segment of users who both make purchases and log browsing sessions on the platform. Return a deduplicated list of usernames for users with activity in both areas.

Python · easy · ~10 min

Quantile Calculator

Given a list of numbers and percentile (0-100), return the value at that percentile using linear interpolation. The index is percentile / 100 * (n - 1); if fractional, linearly interpolate between the floor and ceiling indices of the sorted values.
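The interpolation rule above translates fairly directly into code. This is a sketch: the name `quantile` and the choice to raise `ValueError` for an empty list or an out-of-range percentile are assumptions, since the problem statement does not say how to handle them.

```python
import math


def quantile(values, percentile):
    """Percentile (0-100) of values using linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")  # assumed policy
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")  # assumed policy
    ordered = sorted(values)
    index = percentile / 100 * (len(ordered) - 1)
    lower, upper = math.floor(index), math.ceil(index)
    if lower == upper:
        return ordered[lower]  # index landed exactly on an element
    fraction = index - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


print(quantile([1, 2, 3, 4], 50))
print(quantile([10, 20, 30], 50))
```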

Meta DE interview FAQ

How many rounds are in a Meta DE interview?

Typically 5 to 6: recruiter screen, technical phone screen (SQL), and 3 to 4 onsite rounds covering SQL deep dive, data modeling, system design, and behavioral. The exact structure depends on team and level.

What SQL topics does Meta test most?

Window functions, multi-step aggregation, time-series analysis (consecutive days, rolling averages, funnels). CTEs are expected for multi-step queries. The phone screen starts at intermediate difficulty.

Does Meta use LeetCode-style questions for DEs?

Generally no. Meta DE interviews focus on SQL, data modeling, and system design. Some teams include Python for ETL scripting, but algorithm problems are rare for DE roles. Compare with Google, which does include lighter algorithm problems.

What level are most Meta DE roles?

Most external hires come in at IC4 (mid) or IC5 (senior). IC3 focuses on SQL and basic modeling. IC5 adds system design and cross-functional impact stories. IC6+ requires org-level influence and is harder to break into externally.

How should I prepare for Meta's data modeling round?

Design star schemas for 5 Meta products (News Feed, Marketplace, Reels, Events, Groups). For each: identify fact tables, dimension tables, grain, and the top 3 analytical queries the model supports. Practice explaining the choices out loud — the round tests reasoning, not just final diagrams.

Does Meta still ask the question 'Find users with 3 consecutive login days'?

Variants are still common. The exact phrasing rotates, but the underlying gaps-and-islands pattern (consecutive events, longest streak, daily activity sequences) is core to Meta SQL rounds. Practicing this pattern is high-yield.
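The gaps-and-islands pattern can be practiced with a small local example. This sketch assumes a `logins` table with `user_id` and `login_date` columns (invented for illustration) and uses Python's built-in `sqlite3`, which needs SQLite 3.25 or newer for window functions. Date arithmetic differs in Presto, so treat the `julianday` call as SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical login log; user 1 has a duplicate row for the same day.
conn.executescript("""
CREATE TABLE logins (user_id INTEGER, login_date TEXT);
INSERT INTO logins VALUES
  (1, '2024-03-01'), (1, '2024-03-02'), (1, '2024-03-02'), (1, '2024-03-03'),
  (2, '2024-03-01'), (2, '2024-03-02'), (2, '2024-03-04');
""")

query = """
WITH distinct_days AS (
    -- Deduplicate first so repeated logins on one day do not break the streak math.
    SELECT DISTINCT user_id, login_date FROM logins
),
islands AS (
    -- Consecutive dates minus a consecutive row number give a constant per streak.
    SELECT
        user_id,
        login_date,
        julianday(login_date)
          - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS island_key
    FROM distinct_days
)
SELECT DISTINCT user_id
FROM islands
GROUP BY user_id, island_key
HAVING COUNT(*) >= 3
ORDER BY user_id;
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The same `island_key` trick extends to longest-streak questions by selecting `MAX(COUNT(*))` per user instead of filtering on a fixed threshold.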

Prepare at Meta interview difficulty

01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition
