Company Interview Guide

Pinterest Data Engineer Interview

Pinterest serves a discovery feed to 500M+ monthly active users on top of a graph of pins, boards, users, and topics. The data engineer interview reflects this: heavy emphasis on recommendation feature pipelines, ad attribution at scale, and graph-shaped data modeling. Pinterest moved aggressively to PySpark + Iceberg + Druid in 2023 to 2024, and the interview now tests modern lakehouse patterns alongside the older Hive workload they still operate. Loops run 4 to 5 weeks. Pair with the full data engineer interview playbook.

The Short Answer
Expect a 5 to 6 round loop: recruiter screen, technical phone screen (SQL or Python), then a 4 to 5 round virtual onsite covering system design (often a recommendation feature pipeline or ad attribution system), SQL, Python, an ML platform round (for ML-adjacent teams), and behavioral. Pinterest's distinctive emphasis: graph data modeling for the pin-board-user-topic structure, ad attribution windows that span 28 days post-click, and the move from Hive to Iceberg as the storage default. The behavioral round leans on stories about pragmatic decisions in ambiguous product contexts.
Updated April 2026 · By The DataDriven Team

Pinterest Data Engineer Interview Process

5 to 6 rounds, 4 to 5 weeks. Mostly virtual.

1. Recruiter Screen (30 min)

Conversational call. Pinterest hires across Home Feed, Search, Ads, Trust and Safety, Creator Tools, ML Platform, Analytics Engineering. Each team has its own data character: Home Feed leans recommendation features, Ads leans attribution and reporting, Trust and Safety leans graph and behavioral signal pipelines. Mention experience with recommendation systems, ad tech, or graph data if you have it.
2. Technical Phone Screen (60 min)

Live SQL or Python in CoderPad. SQL leans on funnel analytics (impression to click to save to outbound click) and rolling-window aggregations. Python leans on graph traversal (find related pins via shared boards) and feature engineering with PySpark.
3. System Design Round (60 min)

Common: design the home feed recommendation feature pipeline, design the ad attribution pipeline with 28-day click window, design the trust and safety signal aggregation system. Use the 4-step framework. Cover real-time + batch dual-track for features, point-in-time correctness for training data, schema evolution as model features churn weekly.
4. Live Coding Onsite (60 min)

Second live coding round, opposite language from phone screen. Often includes a follow-up that adds a graph traversal or feature engineering component.
5. ML Platform Round (60 min, ML-adjacent teams only)

Feature stores, training data pipelines, online vs offline features, point-in-time correctness, A/B test instrumentation. Asked of candidates targeting Home Feed, Search, Ads, or ML Platform teams. Skipped for Analytics Engineering or Trust and Safety roles.
6. Behavioral Round (60 min)

STAR-D format. Pinterest values pragmatic decisions in product-ambiguous contexts. Stories about cutting scope to ship, choosing the simpler model over the elegant one, and influencing PMs on metric definitions all score well. Decision postmortem heavily weighted.

Pinterest Data Engineer Compensation (2026)

Total comp from levels.fyi and verified offers. US-based.

Level | Title | Range | Notes
IC2 | Data Engineer | $170K - $250K | 2-4 years exp. Owns individual pipelines, on-call rotation.
IC3 | Senior Data Engineer | $240K - $370K | Most common hiring level. Cross-team systems, architecture decisions.
IC4 | Staff Data Engineer | $330K - $500K | Sets technical direction for a domain. Cross-org influence.
IC5 | Senior Staff Data Engineer | $430K - $620K | Multi-org technical leadership. Internal promo typical.

Pinterest Data Engineering Tech Stack

Languages

Python (heavy), Scala, Java, SQL

Processing

Apache Spark (PySpark), Apache Flink for streaming

Storage

S3, Apache Iceberg (primary lakehouse), legacy Hive on HDFS, Snowflake for analytics

Streaming

Apache Kafka (heavy), MemQ (Pinterest's in-house pub-sub for high throughput)

Query Engines

Apache Druid for real-time analytics, Presto/Trino for ad-hoc, Snowflake for finance

Orchestration

Apache Airflow (heavy use, hundreds of DAGs)

ML Platform

Custom feature store (Galaxy), TensorFlow, PyTorch, in-house training and serving infra

Graph

Custom in-house graph storage and traversal engine for pin-board-user-topic relationships

15 Real Pinterest Data Engineer Interview Questions With Worked Answers

Questions reported by candidates in 2024-2026 loops, paraphrased and de-identified. Each answer covers the approach, the gotcha, and the typical follow-up.

SQL · L4

Compute pin engagement funnel: impression to click to save to outbound click

Funnel SQL with conditional aggregation per stage: SUM(CASE WHEN event_type = 'impression' THEN 1 END) AS impressions, and likewise for click, save, and outbound. Group by date or pin_id. Volunteer the de-duplication consideration: a single user who is impressed and then clicks should count once per stage, not twice. The follow-up: how do you handle the user who saw the same pin 3 times before clicking? Answer: dedup by user_id and pin_id within the attribution window.
SQL · L4

Find boards with the highest engagement growth this week

Aggregate engagement events per board per week. LAG to previous week. Compute (current - prior) / NULLIF(prior, 0). Order by growth pct desc. Volunteer that new boards have artificially high growth (zero baseline); filter to boards with at least N engagements in trailing window.
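A minimal sketch of the LAG pattern, runnable against in-memory SQLite (3.25+ for window functions). The table, week labels, and the trailing-baseline threshold of 100 are all toy assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE board_weekly (board_id TEXT, week TEXT, engagements INTEGER);
INSERT INTO board_weekly VALUES
  ('b1', '2026-W01', 100), ('b1', '2026-W02', 150),
  ('b2', '2026-W01', 400), ('b2', '2026-W02', 420),
  ('b3', '2026-W02', 50);   -- new board: no baseline
""")

rows = conn.execute("""
    WITH wk AS (
      SELECT board_id, week, engagements,
             LAG(engagements) OVER (
               PARTITION BY board_id ORDER BY week) AS prior
      FROM board_weekly
    )
    SELECT board_id,
           1.0 * (engagements - prior) / NULLIF(prior, 0) AS growth
    FROM wk
    WHERE week = '2026-W02' AND prior >= 100   -- baseline filter
    ORDER BY growth DESC
""").fetchall()
print(rows)  # [('b1', 0.5), ('b2', 0.05)]
```

Note how b3 drops out: its NULL prior fails the baseline filter, which is the fix for zero-baseline inflation.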
SQL · L5

Compute pin similarity from board co-occurrence

Pins that appear together on boards are similar. Self-join pin_board on board_id, where pin_a.id < pin_b.id. Group by (pin_a, pin_b), count co-occurrences. Discuss why this is the cheap proxy for graph-based similarity, why it has noise from over-broad boards, and how Pinterest improves on it (board-quality weighting, content-based features).
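The self-join can be sketched against a toy pin_board table in SQLite; the `a.pin_id < b.pin_id` predicate is what prevents both self-pairs and double-counted (A, B)/(B, A) rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pin_board (pin_id TEXT, board_id TEXT);
INSERT INTO pin_board VALUES
  ('p1','b1'), ('p2','b1'), ('p3','b1'),
  ('p1','b2'), ('p2','b2');
""")

rows = conn.execute("""
    SELECT a.pin_id, b.pin_id, COUNT(*) AS co_occurrences
    FROM pin_board a
    JOIN pin_board b
      ON a.board_id = b.board_id AND a.pin_id < b.pin_id
    GROUP BY a.pin_id, b.pin_id
    ORDER BY co_occurrences DESC, a.pin_id, b.pin_id
""").fetchall()
print(rows)  # [('p1', 'p2', 2), ('p1', 'p3', 1), ('p2', 'p3', 1)]
```

p1 and p2 share two boards, so they rank above the single-board pairs.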
SQL · L5

Attribute conversions to ad impressions within 28-day window

Join impression_events to conversion_events on user_id. WHERE conversion.ts BETWEEN impression.ts AND impression.ts + INTERVAL '28 days'. For multi-touch attribution, compute a per-impression weight (last-touch: the closest impression gets full credit; linear: split equally; time-decay: weight by recency). Discuss how the choice affects advertiser incentives.
SQL · L5

Detect spam at scale: pinners with abnormal behavior

Rolling-window stats per pinner: pins per hour, repins per hour, follows per hour. Flag pinners with values > 3 std-dev above their cohort baseline. Discuss the bias: new pinners and viral content creators can produce false positives. Layer in: account age, content quality scores, IP-based clustering.
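The flagging rule can be sketched in a few lines of pure Python; the cohort numbers are invented, and a real system would use robust statistics per cohort rather than one global mean:

```python
from statistics import mean, pstdev

def flag_outliers(counts: dict[str, float], z_cutoff: float = 3.0) -> set[str]:
    """Flag pinners whose rate is more than z_cutoff std-devs above the cohort mean."""
    mu = mean(counts.values())
    sigma = pstdev(counts.values())
    if sigma == 0:
        return set()
    return {u for u, c in counts.items() if (c - mu) / sigma > z_cutoff}

# Toy cohort: 20 normal pinners plus one bot posting 40 pins/hour.
counts = {f"u{i:02d}": c for i, c in enumerate([2, 3, 2, 4, 3] * 4)}
counts["spam_bot"] = 40
print(flag_outliers(counts))  # {'spam_bot'}
```

Worth volunteering: a single extreme outlier inflates sigma, so small cohorts can hide bots; that motivates the layered signals (account age, content quality, IP clustering).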
Python · L4

Find related pins via shared-board graph traversal

Build adjacency dicts: for each pin, the set of boards it appears on; for each board, the set of pins. To find pins related to pin P: get P's boards, for each board get its pins, count overlaps, return the top N. Cost per query is O(boards per pin * avg pins per board), which blows up for pins sitting on huge or very popular boards. Discuss why this won't scale to billions of pins; mention a precomputed neighbor table or embedding-based similarity for production.
def related_pins(pin_id: str, top_k: int = 50) -> list[tuple[str, int]]:
    # pin_to_boards / board_to_pins are prebuilt adjacency dicts:
    # pin_id -> set of board_ids, board_id -> set of pin_ids.
    boards = pin_to_boards[pin_id]
    candidates: dict[str, int] = {}
    for board_id in boards:
        for other_pin in board_to_pins[board_id]:
            if other_pin == pin_id:
                continue
            candidates[other_pin] = candidates.get(other_pin, 0) + 1
    # Top-k neighbors by board co-occurrence count
    return sorted(candidates.items(), key=lambda x: -x[1])[:top_k]
Python · L4

Sessionize pin-engagement events with 30-min idle gap

Sort events by (user_id, ts). Walk the events, incrementing session_id when the gap exceeds 30 minutes or the user changes. State the assumption: events with the same ts belong to the same session. Edge case: outbound-click events that bounce the user out of the app may not have a clear "session end" signal; document the choice.
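A minimal in-memory sessionizer along those lines (event shape and the returned list of session ids are illustrative choices):

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=30)

def sessionize(events: list[tuple[str, datetime]]) -> list[int]:
    """Assign a session id to each (user_id, ts) event.

    A new session starts when the user changes or the idle gap
    exceeds 30 minutes; events sharing a timestamp stay together.
    """
    events = sorted(events)          # order by (user_id, ts)
    session_ids, sid = [], 0
    prev_user, prev_ts = None, None
    for user, ts in events:
        if prev_user is not None and (user != prev_user or ts - prev_ts > GAP):
            sid += 1
        session_ids.append(sid)
        prev_user, prev_ts = user, ts
    return session_ids
```

Three u1 events at 9:00, 9:10, and 10:00 yield sessions [0, 0, 1]: the 50-minute gap splits the session.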
Python · L5

Compute point-in-time features for training data

Given user click events as training labels and feature log as feature source, compute features as_of label_ts. Use pandas merge_asof or PySpark equivalent. Critical: every feature must have feature_ts <= label_ts to prevent leakage. Discuss why this matters: a model trained on leaked features looks great offline and breaks in production.
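The as-of semantics are the part interviewers probe, so here is a dependency-free sketch using `bisect` (in practice you would reach for pandas `merge_asof` or a PySpark window; the data shapes below are invented):

```python
from bisect import bisect_right

def as_of_join(labels, features):
    """For each (user_id, label_ts) label, pick the latest feature row
    with feature_ts <= label_ts -- never a later one, so no leakage.

    features: {user_id: list of (feature_ts, value), sorted by ts}
    """
    out = []
    for user_id, label_ts in labels:
        rows = features.get(user_id, [])
        ts_list = [ts for ts, _ in rows]
        i = bisect_right(ts_list, label_ts)       # rows strictly after label_ts excluded
        out.append((user_id, label_ts, rows[i - 1][1] if i else None))
    return out
```

A label at ts=6 against features logged at ts 1, 5, 9 picks the ts=5 value; the ts=9 row would be leakage.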
Python · L5

Implement ad-attribution with multi-touch weighting

Given conversion event and N impressions in the 28-day window, attribute credit per impression. Implement last-touch (1.0 to most recent), linear (1/N each), and time-decay (exponential weight by recency). Walk through a concrete 4-impression example with each method. Discuss the business trade-off: time-decay is more “fair” but harder for advertisers to reason about.
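A sketch of all three weighting schemes, each returning per-impression weights that sum to 1.0 (the 7-day half-life for time-decay is an assumed parameter, not a Pinterest constant):

```python
import math

def attribute(impression_ts: list[float], conversion_ts: float,
              method: str = "last_touch", half_life_days: float = 7.0) -> list[float]:
    """Return one credit weight per impression; weights sum to 1.0."""
    n = len(impression_ts)
    if method == "last_touch":
        latest = max(range(n), key=lambda i: impression_ts[i])
        return [1.0 if i == latest else 0.0 for i in range(n)]
    if method == "linear":
        return [1.0 / n] * n
    if method == "time_decay":
        # Exponential decay: an impression half_life_days older gets half the raw weight.
        raw = [math.exp(-(conversion_ts - ts) / half_life_days * math.log(2))
               for ts in impression_ts]
        total = sum(raw)
        return [w / total for w in raw]
    raise ValueError(method)
```

For impressions on days 0, 10, 20, 27 and a day-28 conversion: last-touch gives [0, 0, 0, 1], linear gives [0.25] * 4, and time-decay gives strictly increasing weights toward the most recent impression.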
System Design · L5

Design the home feed recommendation feature pipeline

User events (clicks, saves, impressions) -> Kafka -> Flink (real-time features: last-N-clicked-categories, current-session signals, fresh interaction counts) -> Redis (online store, p99 < 10ms reads). Spark daily batch features (lifetime topic affinity, board diversity) -> S3 feature parquet -> registered in feature catalog (Galaxy). Online inference: ranker pulls features from Redis + lookup cache, calls ML model. Cover point-in-time correctness for training data, A/B test instrumentation, schema evolution as new features ship weekly.
User events -> Kafka (engagement_events topic, key=user_id)
   -> Flink (real-time features, RocksDB state, EXACTLY_ONCE)
        -> Redis (online store, 30-day TTL, p99 < 10ms)
        -> S3 feature log (immutable, event-time partitioned)
   Spark daily batch features:
        S3 events -> Spark -> S3 feature parquet
        -> Iceberg table for query
        -> registered in Galaxy feature catalog

Training data:
   Spark as_of_join between labels (next-day clicks) and feature
   log, joined by (user_id, event_ts) where feature_ts <= label_ts.
   Produces leak-free training data.

Online inference:
   Ranker service reads features from Redis by user_id.
   On Redis miss: fall back to default vectors.
   Drift monitor: daily PSI / KS-test on feature distributions.
System Design · L5

Design the ad attribution pipeline with 28-day click window

Two-track architecture. Real-time path: impressions and conversions to Kafka, Flink keyed by user_id maintains 28-day state of impressions, on each conversion emits attributed-impression record. Batch path: daily Spark job joins impressions and conversions across the same window, produces source-of-truth fact_attribution. Daily delta report comparing real-time to batch, alerting on drift > 0.5%. Cover: state size estimate (28 days * impression rate), key skew (whale users with heavy impression history), schema evolution as new ad formats ship.
System Design · L5

Design the trust-and-safety signal aggregation system

Reports of bad content (from users, automated detectors, external feeds) -> Kafka -> Flink (deduplicate, score, aggregate per content_id) -> serves to moderation console + automated takedown thresholds. Cover: signal weighting (user reports vs ML classifier vs external feed), audit log immutability, false-positive review workflow, appeals handling.
Modeling · L5

Design the schema for pin, board, user, topic graph

Three core fact tables. fact_pin: one row per pin, with pin_id, creator_user_id, created_ts, image_url, content_hash. fact_pin_board: one row per (pin_id, board_id), with added_ts, added_by_user_id (for repins). fact_board: one row per board, with board_id, owner_user_id, created_ts, topic_id. fact_user_topic_affinity: derived, one row per (user_id, topic_id, day) with affinity_score. Discuss: graph queries (related pins, related boards) are served from precomputed aggregate tables, not from joining the raw graph at query time.
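The schema above can be written out as DDL; this is a sketch in SQLite syntax with the columns named in the answer, not Pinterest's production definitions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_pin (
  pin_id          TEXT PRIMARY KEY,
  creator_user_id TEXT NOT NULL,
  created_ts      TEXT NOT NULL,
  image_url       TEXT,
  content_hash    TEXT              -- dedup identical images across pins
);
CREATE TABLE fact_board (
  board_id      TEXT PRIMARY KEY,
  owner_user_id TEXT NOT NULL,
  created_ts    TEXT NOT NULL,
  topic_id      TEXT
);
CREATE TABLE fact_pin_board (        -- the graph edges: pin <-> board
  pin_id           TEXT NOT NULL REFERENCES fact_pin(pin_id),
  board_id         TEXT NOT NULL REFERENCES fact_board(board_id),
  added_ts         TEXT NOT NULL,
  added_by_user_id TEXT NOT NULL,    -- repin attribution
  PRIMARY KEY (pin_id, board_id)
);
CREATE TABLE fact_user_topic_affinity (  -- derived, rebuilt daily
  user_id        TEXT NOT NULL,
  topic_id       TEXT NOT NULL,
  day            TEXT NOT NULL,
  affinity_score REAL NOT NULL,
  PRIMARY KEY (user_id, topic_id, day)
);
""")
```

fact_pin_board is the edge table; related-pin queries never fan out over it at serving time, they read the precomputed aggregates.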
Modeling · L5

Migrate from Hive to Iceberg without breaking downstream

Pinterest's real 2023 to 2024 migration. Approach: dual-write to Hive and Iceberg for 90 days, validate row counts and column-level checksums daily, switch readers to Iceberg one consumer at a time, deprecate Hive after all consumers cut over. Cover: schema differences (Iceberg hidden partitioning vs Hive explicit), file format choice (Parquet for both), file compaction strategy (Iceberg rewrite_data_files), and the cost spike during dual-write period.
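The daily validation step can be sketched as an order-insensitive table fingerprint: hash each row, XOR the digests, and compare Hive against Iceberg during the dual-write window. This is an illustrative approach, not Pinterest's actual tooling:

```python
import hashlib

def table_fingerprint(rows: list[tuple]) -> tuple[int, str]:
    """Order-insensitive fingerprint: (row_count, hex digest)."""
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode()).digest()
        acc ^= int.from_bytes(digest, "big")   # XOR: row order doesn't matter
    return len(rows), f"{acc:064x}"

def validate(hive_rows: list[tuple], iceberg_rows: list[tuple]) -> bool:
    """True when both sides agree on row count and content."""
    return table_fingerprint(hive_rows) == table_fingerprint(iceberg_rows)
```

In production this runs per partition inside Spark, so a mismatch pinpoints the day and table to investigate before any reader is cut over.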
Behavioral · L5

Tell me about a time you chose a simpler model over an elegant one

Pinterest culture rewards pragmatic decisions in ambiguous product contexts. Story should cover: the choice you faced, why the elegant option was tempting, why the simpler option was right (often: faster to ship, easier to debug, easier for the team to maintain), what the outcome was, and what you would tell someone facing the same choice. Decision postmortem essential.

What Makes Pinterest Data Engineer Interviews Different

Graph data shapes every system

Pinterest is a graph: pins on boards, boards owned by users, users following users and topics. Every system design and modeling answer should acknowledge the graph shape, even when the storage is relational. Frame queries as graph traversals, then explain the precomputed-aggregate optimization.

Ad attribution at scale is the platform tax

Pinterest's revenue is ads. Attribution is the system that proves ads worked. Every ad-related interview question implicitly tests whether you understand 28-day click windows, last-touch vs multi-touch, view-through attribution, and the privacy-driven shift away from third-party cookies.

Modern lakehouse + legacy Hive coexistence

Pinterest's migration to Iceberg is real but incomplete. Some teams run pure Iceberg; others are still on Hive. Your answer should be Iceberg-first but acknowledge the Hive-still-exists reality. Knowing the migration story is a plus.

ML platform questions show up in non-ML teams

Pinterest's feature store (Galaxy) and training infra are used across teams, including teams not formally on ML Platform. If you're interviewing for Home Feed or Ads, expect at least one feature-store question even if it's not your primary skill.

How Pinterest Connects to the Rest of Your Prep

Pinterest overlaps with the Instacart Data Engineer interview guide on the learning-to-rank and feature-store patterns, with the Twitter Data Engineer interview guide on graph data modeling, and with the Netflix Data Engineer interview guide on recommendation-pipeline architecture.

If you're targeting an ML platform role, also see the ML data engineer interview prep guide. The streaming feature work overlaps with the real-time Data Engineer interview prep guide. Drill the rounds in data pipeline system design interview prep and schema design interview walkthrough for the feature store and graph design patterns.

Data Engineer Interview Prep FAQ

How long does Pinterest's Data Engineer interview take?
4 to 5 weeks from recruiter screen to offer.
Is Pinterest remote-friendly?
Hybrid. Most teams allow 2 to 3 days remote, with 2 days in San Francisco, Seattle, or Mexico City offices.
What level should I target?
IC3 (Senior) is the most common external hiring level. IC4+ is usually reached by internal promotion.
Does Pinterest test algorithms / LeetCode?
Lightly. Focus on graph traversal, feature engineering, and PySpark transformations. Don't grind LeetCode for Pinterest; spend the time on data engineering patterns.
How important is recommendation systems knowledge?
Critical for Home Feed, Search, and Ads. Less critical for Trust and Safety or Analytics Engineering. Ask the recruiter which team and tailor accordingly.
What languages can I use?
Python and SQL universally. Scala for Spark-heavy roles.
Is the Iceberg migration over?
About 70% complete as of 2026. Knowing both Iceberg and Hive trade-offs is the right preparation; mentioning the migration story shows research.
How does the Pinterest behavioral round compare to other companies?
Less rigid than Amazon's Leadership Principles, less keeper-test culture than Netflix. Pragmatic decision-making is the central theme. Stories about choosing the simpler approach over the elegant one land especially well.

Practice Recommendation Pipelines and Graph Modeling

Drill the feature store, ad attribution, and graph traversal patterns that win the Pinterest data engineer loop.

Start Practicing

More Data Engineer Interview Prep Guides

Data Engineer Interview Prep, explore the full guide

50+ guides covering every round, company, role, and technology in the data engineer interview loop. Grounded in 2,817 verified interview reports across 929 companies, collected from real candidates.