Questions reported by candidates in 2024-2026 loops, paraphrased and de-identified. Each answer covers the approach, the gotcha, and the typical follow-up.
SQL · L4
Compute pin engagement funnel: impression to click to save to outbound click
Funnel SQL with conditional aggregation per stage: SUM(CASE WHEN event_type = 'impression' THEN 1 END) AS impressions, and similarly for click, save, and outbound. Group by date or pin_id. Volunteer the de-duplication consideration: a single user moving impression-then-click on the same pin should count once per stage, not once per raw event. The follow-up: how do you handle the user who saw the same pin 3 times before clicking? Answer: dedup by user_id and pin_id within the attribution window.
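A minimal sketch of the conditional-aggregation pattern, runnable against an in-memory SQLite table; the `engagement_events` schema and rows are invented for illustration, and `COUNT(DISTINCT ...)` handles the per-(user, pin) dedup:

```python
import sqlite3

# Toy engagement log; table name, schema, and rows are all invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE engagement_events (user_id TEXT, pin_id TEXT, event_type TEXT)")
conn.executemany("INSERT INTO engagement_events VALUES (?, ?, ?)", [
    ("u1", "p1", "impression"),
    ("u1", "p1", "impression"),  # repeat view: should count once
    ("u1", "p1", "click"),
    ("u1", "p1", "save"),
    ("u2", "p1", "impression"),
])

# COUNT(DISTINCT ...) dedups each stage by (user_id, pin_id);
# the CASE yields NULL for non-matching rows, which COUNT ignores.
impressions, clicks, saves = conn.execute("""
    SELECT
      COUNT(DISTINCT CASE WHEN event_type = 'impression' THEN user_id || ':' || pin_id END),
      COUNT(DISTINCT CASE WHEN event_type = 'click' THEN user_id || ':' || pin_id END),
      COUNT(DISTINCT CASE WHEN event_type = 'save' THEN user_id || ':' || pin_id END)
    FROM engagement_events
""").fetchone()
```

The repeated u1/p1 impression collapses to one, so the funnel reads 2 impressions, 1 click, 1 save.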
SQL · L4
Find boards with the highest engagement growth this week
Aggregate engagement events per board per week. LAG to previous week. Compute (current - prior) / NULLIF(prior, 0). Order by growth pct desc. Volunteer that new boards have artificially high growth (zero baseline); filter to boards with at least N engagements in trailing window.
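A sketch of the LAG-plus-NULLIF pattern against SQLite (window functions need SQLite 3.25+; the table name and numbers are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weekly_board_engagement (board_id TEXT, week INTEGER, events INTEGER)")
conn.executemany("INSERT INTO weekly_board_engagement VALUES (?, ?, ?)", [
    ("b1", 1, 100), ("b1", 2, 150),
    ("b2", 1, 10), ("b2", 2, 40),
    ("b3", 2, 500),  # new board: no prior week, so no baseline
])

rows = conn.execute("""
    WITH growth AS (
      SELECT board_id, week, events,
             LAG(events) OVER (PARTITION BY board_id ORDER BY week) AS prior
      FROM weekly_board_engagement
    )
    SELECT board_id, 1.0 * (events - prior) / NULLIF(prior, 0) AS growth_pct
    FROM growth
    WHERE week = 2 AND prior IS NOT NULL  -- drops zero-baseline new boards
    ORDER BY growth_pct DESC
""").fetchall()
```

b3 drops out because LAG returns NULL for its first week, which is exactly the new-board filter the answer calls for.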
SQL · L5
Compute pin similarity from board co-occurrence
Pins that appear together on boards are similar. Self-join pin_board on board_id, where pin_a.id < pin_b.id. Group by (pin_a, pin_b), count co-occurrences. Discuss why this is the cheap proxy for graph-based similarity, why it has noise from over-broad boards, and how Pinterest improves on it (board-quality weighting, content-based features).
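The self-join can be sketched in a few lines of SQLite (toy `pin_board` rows; the `a.pin_id < b.pin_id` predicate keeps each unordered pair exactly once):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pin_board (pin_id TEXT, board_id TEXT)")
conn.executemany("INSERT INTO pin_board VALUES (?, ?)", [
    ("p1", "b1"), ("p2", "b1"),
    ("p1", "b2"), ("p2", "b2"), ("p3", "b2"),
])

pairs = conn.execute("""
    SELECT a.pin_id, b.pin_id, COUNT(*) AS co_occurrences
    FROM pin_board a
    JOIN pin_board b
      ON a.board_id = b.board_id
     AND a.pin_id < b.pin_id   -- each unordered pair counted once
    GROUP BY a.pin_id, b.pin_id
    ORDER BY co_occurrences DESC, a.pin_id, b.pin_id
""").fetchall()
```

(p1, p2) shares two boards and ranks first; the single over-broad board b2 is also what generates the noisy (p1, p3) and (p2, p3) pairs the answer warns about.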
SQL · L5
Attribute conversions to ad impressions within 28-day window
Self-join impression_events to conversion_events on user_id. WHERE conversion.ts BETWEEN impression.ts AND impression.ts + INTERVAL '28 days'. For multi-touch attribution, compute per-impression weight (last-touch: closest impression gets credit; linear: split equally; time-decay: weight by recency). Discuss how the choice affects advertiser incentives.
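A sketch of the window join using epoch-second timestamps, since SQLite lacks INTERVAL arithmetic (tables and values are illustrative):

```python
import sqlite3

WINDOW_SECONDS = 28 * 86400  # 28-day attribution window

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE impression_events (user_id TEXT, ad_id TEXT, ts INTEGER)")
conn.execute("CREATE TABLE conversion_events (user_id TEXT, conv_id TEXT, ts INTEGER)")
conn.executemany("INSERT INTO impression_events VALUES (?, ?, ?)", [
    ("u1", "ad1", 0), ("u1", "ad2", 10 * 86400), ("u2", "ad1", 0),
])
conn.executemany("INSERT INTO conversion_events VALUES (?, ?, ?)", [
    ("u1", "c1", 20 * 86400),
    ("u2", "c2", 40 * 86400),  # outside the 28-day window: unattributed
])

attributed = conn.execute("""
    SELECT c.conv_id, i.ad_id
    FROM conversion_events c
    JOIN impression_events i
      ON i.user_id = c.user_id
     AND c.ts BETWEEN i.ts AND i.ts + ?
    ORDER BY c.conv_id, i.ad_id
""", (WINDOW_SECONDS,)).fetchall()
```

c1 picks up both of u1's in-window impressions (the multi-touch case); c2 matches nothing because u2's only impression is 40 days old.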
SQL · L5
Detect spam at scale: pinners with abnormal behavior
Rolling-window stats per pinner: pins per hour, repins per hour, follows per hour. Flag pinners with values > 3 std-dev above their cohort baseline. Discuss the bias: new pinners and viral content creators can produce false positives. Layer in: account age, content quality scores, IP-based clustering.
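A minimal sketch of the z-score flagging step, assuming per-pinner hourly rates have already been computed upstream (names, rates, and the threshold are illustrative):

```python
from statistics import mean, pstdev

def flag_outliers(rates: dict[str, float], threshold: float = 3.0) -> list[str]:
    """Flag pinners whose rate sits more than `threshold` std-devs above the cohort mean."""
    mu, sigma = mean(rates.values()), pstdev(rates.values())
    if sigma == 0:
        return []  # degenerate cohort: everyone identical
    return [pinner for pinner, rate in rates.items() if (rate - mu) / sigma > threshold]

# 20 ordinary pinners at ~5 pins/hour, one at 500.
rates = {f"u{i}": 5.0 for i in range(20)}
rates["spammer"] = 500.0
flagged = flag_outliers(rates)
```

Note the extreme value inflates the cohort std-dev itself; with a small cohort or several spammers, a robust baseline (median / MAD) would be the natural refinement.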
Python · L4
Find related pins via shared-board graph traversal
Build adjacency dicts: for each pin, the set of boards it appears on; for each board, the set of pins. To find pins related to pin P: get P's boards, for each board get its pins, count overlaps, return top N. Cost per query is O(B * P), where B is the number of boards containing the pin and P is the average pins per board. Discuss why this won't scale to billions of pins; mention a precomputed neighbor table or embedding-based similarity for production.
def related_pins(pin_id: str, top_k: int = 50) -> list[tuple[str, int]]:
    boards = pin_to_boards[pin_id]
    candidates: dict[str, int] = {}
    for board_id in boards:
        for other_pin in board_to_pins[board_id]:
            if other_pin == pin_id:
                continue
            candidates[other_pin] = candidates.get(other_pin, 0) + 1
    # Top-k by co-occurrence count
    return sorted(candidates.items(), key=lambda x: -x[1])[:top_k]
Python · L4
Sessionize pin-engagement events with 30-min idle gap
Sort events by (user_id, ts). Walk events. Increment session_id when gap > 30 min OR user changes. State assumption: events with same ts are same session. Edge case: outbound-click events that bounce the user out of the app may not have a clear “session end” signal; document the choice.
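A sketch of the walk, assuming events arrive as (user_id, ts) tuples and that ties in ts share a session:

```python
from datetime import datetime, timedelta

IDLE_GAP = timedelta(minutes=30)

def sessionize(events: list[tuple[str, datetime]]) -> list[tuple[str, datetime, int]]:
    """Assign incrementing session ids; a gap of exactly 30 min stays in-session."""
    out: list[tuple[str, datetime, int]] = []
    session_id, prev = 0, None
    for user_id, ts in sorted(events):  # sorts by (user_id, ts)
        if prev is None or user_id != prev[0] or ts - prev[1] > IDLE_GAP:
            session_id += 1
        out.append((user_id, ts, session_id))
        prev = (user_id, ts)
    return out

demo = sessionize([
    ("u1", datetime(2024, 1, 1, 12, 0)),
    ("u2", datetime(2024, 1, 1, 12, 5)),
    ("u1", datetime(2024, 1, 1, 12, 10)),
    ("u1", datetime(2024, 1, 1, 13, 0)),  # 50-min gap: new session
])
```

u1's first two events share session 1, the 50-minute gap opens session 2, and the user change to u2 opens session 3.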
Python · L5
Compute point-in-time features for training data
Given user click events as training labels and a feature log as the feature source, compute each feature as of label_ts. Use pandas merge_asof or the PySpark equivalent. Critical: every feature must have feature_ts <= label_ts to prevent leakage. Discuss why this matters: a model trained on leaked features looks great offline and breaks in production.
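For a single user's sorted feature log, the as-of lookup reduces to a binary search; this pure-Python sketch mirrors merge_asof(direction='backward') and is illustrative only:

```python
import bisect

def as_of_value(label_ts: int, feature_log: list[tuple[int, str]]):
    """feature_log: (feature_ts, value) pairs sorted by feature_ts for one user.
    Returns the latest value with feature_ts <= label_ts, or None.
    Never returns a future value, which is the leakage guarantee."""
    ts_index = [ts for ts, _ in feature_log]
    i = bisect.bisect_right(ts_index, label_ts)
    return feature_log[i - 1][1] if i else None

log = [(1, "a"), (5, "b"), (9, "c")]
```

A label at ts=6 sees "b", a label before any feature sees None, and a label exactly at a feature_ts sees that feature (<=, not <).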
Python · L5
Implement ad-attribution with multi-touch weighting
Given conversion event and N impressions in the 28-day window, attribute credit per impression. Implement last-touch (1.0 to most recent), linear (1/N each), and time-decay (exponential weight by recency). Walk through a concrete 4-impression example with each method. Discuss the business trade-off: time-decay is more “fair” but harder for advertisers to reason about.
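A sketch of the three weighting schemes; the 7-day half-life is an assumed parameter, and the 4-impression example places impressions at days 0, 7, 14, and 21 with a conversion at day 28:

```python
def attribute(impression_ts: list[int], conversion_ts: int,
              method: str = "linear", half_life_days: float = 7.0) -> list[float]:
    """Per-impression credit summing to 1.0; impression_ts sorted ascending (epoch seconds)."""
    n = len(impression_ts)
    if method == "last_touch":
        return [0.0] * (n - 1) + [1.0]
    if method == "linear":
        return [1.0 / n] * n
    if method == "time_decay":
        # Each half-life of extra age halves the raw weight, then normalize.
        raw = [2.0 ** (-(conversion_ts - ts) / (half_life_days * 86400))
               for ts in impression_ts]
        total = sum(raw)
        return [w / total for w in raw]
    raise ValueError(method)

# Four impressions at days 0, 7, 14, 21; conversion at day 28.
ts = [d * 86400 for d in (0, 7, 14, 21)]
conv = 28 * 86400
last, lin, decay = (attribute(ts, conv, m) for m in ("last_touch", "linear", "time_decay"))
```

With a 7-day half-life the raw weights are 1/16, 1/8, 1/4, 1/2, so the most recent impression ends up with 8/15 of the credit versus 1.0 under last-touch and 0.25 under linear.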
System Design · L5
Design the home feed recommendation feature pipeline
User events (clicks, saves, impressions) -> Kafka -> Flink (real-time features: last-N-clicked-categories, current-session signals, fresh interaction counts) -> Redis (online store, p99 < 10ms reads). Spark daily batch features (lifetime topic affinity, board diversity) -> S3 feature parquet -> registered in feature catalog (Galaxy). Online inference: ranker pulls features from Redis + lookup cache, calls ML model. Cover point-in-time correctness for training data, A/B test instrumentation, schema evolution as new features ship weekly.
User events -> Kafka (engagement_events topic, key=user_id)
-> Flink (real-time features, RocksDB state, EXACTLY_ONCE)
-> Redis (online store, 30-day TTL, p99 < 10ms)
-> S3 feature log (immutable, event-time partitioned)
Spark daily batch features:
S3 events -> Spark -> S3 feature parquet
-> Iceberg table for query
-> registered in Galaxy feature catalog
Training data:
Spark as_of_join between labels (next-day clicks) and feature
log, joined by (user_id, event_ts) where feature_ts <= label_ts.
Produces leak-free training data.
Online inference:
Ranker service reads features from Redis by user_id.
On Redis miss: fall back to default vectors.
Drift monitor: daily PSI / KS-test on feature distributions.
System Design · L5
Design the ad attribution pipeline with 28-day click window
Two-track architecture. Real-time path: impressions and conversions to Kafka, Flink keyed by user_id maintains 28-day state of impressions, on each conversion emits attributed-impression record. Batch path: daily Spark job joins impressions and conversions across the same window, produces source-of-truth fact_attribution. Daily delta report comparing real-time to batch, alerting on drift > 0.5%. Cover: state size estimate (28 days * impression rate), key skew (whale users with heavy impression history), schema evolution as new ad formats ship.
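The state-size estimate is back-of-envelope arithmetic; every number below is an assumed example, not a Pinterest figure:

```python
# Back-of-envelope for the Flink keyed state holding 28 days of impressions.
impressions_per_day = 2_000_000_000   # assumed platform-wide impression rate
window_days = 28
bytes_per_record = 64                 # key + ad_id + ts + metadata, assumed

state_tb = impressions_per_day * window_days * bytes_per_record / 1e12
```

Under these assumptions the keyed state lands in the low single-digit terabytes, which is why it lives in RocksDB spread across task managers rather than on-heap, and why whale-user key skew matters.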
System Design · L5
Design the trust-and-safety signal aggregation system
Reports of bad content (from users, automated detectors, external feeds) -> Kafka -> Flink (deduplicate, score, aggregate per content_id) -> serves to moderation console + automated takedown thresholds. Cover: signal weighting (user reports vs ML classifier vs external feed), audit log immutability, false-positive review workflow, appeals handling.
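The signal-weighting step can be sketched as a weighted sum per content_id; the source weights and takedown threshold below are invented placeholders, not real policy values:

```python
# Invented weights and threshold; production values would be tuned and audited.
SOURCE_WEIGHTS = {"user_report": 1.0, "ml_classifier": 2.5, "external_feed": 4.0}
TAKEDOWN_THRESHOLD = 10.0

def aggregate_score(signals: list[tuple[str, float]]) -> tuple[float, bool]:
    """signals: (source, confidence in [0, 1]) pairs for one content_id."""
    score = sum(SOURCE_WEIGHTS[source] * confidence for source, confidence in signals)
    return score, score >= TAKEDOWN_THRESHOLD

score, takedown = aggregate_score(
    [("user_report", 1.0)] * 3 + [("ml_classifier", 0.9), ("external_feed", 1.0)]
)
```

Here three user reports plus a confident classifier and an external-feed hit land just under the automated threshold, so the item routes to the moderation console instead of auto-takedown.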
Modeling · L5
Design the schema for pin, board, user, topic graph
Three core fact tables. fact_pin: one row per pin, with pin_id, creator_user_id, created_ts, image_url, content_hash. fact_pin_board: one row per (pin_id, board_id), with added_ts, added_by_user_id (for repins). fact_board: one row per board, with board_id, owner_user_id, created_ts, topic_id. fact_user_topic_affinity: derived, one row per (user_id, topic_id, day) with affinity_score. Discuss: graph queries (related pins, related boards) are served from precomputed aggregate tables, not from joining the raw graph at query time.
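The four tables can be sketched as SQLite DDL (types and keys are illustrative; production storage would differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_pin (
        pin_id TEXT PRIMARY KEY, creator_user_id TEXT,
        created_ts INTEGER, image_url TEXT, content_hash TEXT);
    CREATE TABLE fact_board (
        board_id TEXT PRIMARY KEY, owner_user_id TEXT,
        created_ts INTEGER, topic_id TEXT);
    CREATE TABLE fact_pin_board (           -- the graph edges; added_by covers repins
        pin_id TEXT, board_id TEXT, added_ts INTEGER, added_by_user_id TEXT,
        PRIMARY KEY (pin_id, board_id));
    CREATE TABLE fact_user_topic_affinity ( -- derived daily aggregate, not raw events
        user_id TEXT, topic_id TEXT, day TEXT, affinity_score REAL,
        PRIMARY KEY (user_id, topic_id, day));
""")
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
```

fact_pin_board is the interesting one: it is the many-to-many edge table that every graph-flavored query (related pins, related boards) would otherwise have to self-join, which is why those queries are served from precomputed aggregates instead.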
Modeling · L5
Migrate from Hive to Iceberg without breaking downstream
Pinterest's real 2023 to 2024 migration. Approach: dual-write to Hive and Iceberg for 90 days, validate row counts and column-level checksums daily, switch readers to Iceberg one consumer at a time, deprecate Hive after all consumers cut over. Cover: schema differences (Iceberg hidden partitioning vs Hive explicit), file format choice (Parquet for both), file compaction strategy (Iceberg rewrite_data_files), and the cost spike during dual-write period.
Behavioral · L5
Tell me about a time you chose a simpler model over an elegant one
Pinterest culture rewards pragmatic decisions in ambiguous product contexts. Story should cover: the choice you faced, why the elegant option was tempting, why the simpler option was right (often: faster to ship, easier to debug, easier for the team to maintain), what the outcome was, and what you would tell someone facing the same choice. Decision postmortem essential.