Spotify processes billions of streaming events daily to power personalized recommendations, Wrapped campaigns, and royalty payments. Their DE interviews focus on event-driven architecture, GCP/BigQuery expertise, and the autonomous engineering culture that defines Spotify squads.
Interview timeline: 3 to 5 weeks | Levels: L1 through L4+ | Total comp: $130K to $480K
Three stages from recruiter call to offer, typically completed in 3 to 5 weeks.
Initial conversation about your experience and motivation for joining Spotify. The recruiter evaluates your background with event-driven data systems and your interest in music, podcasts, or media technology. Spotify's data platform team handles billions of events daily from streaming, search, and ad interactions. They look for candidates who care about both technical excellence and product impact.
SQL and Python problems set in a music streaming context. Expect questions about user engagement metrics, playlist analytics, and event processing. Spotify values clean, readable code and clear communication of your approach. The interviewer also evaluates how you think about data quality in event streams.
Four rounds covering system design, SQL deep dive, coding, and a values interview. System design questions at Spotify involve recommendation pipelines, event processing at scale, and data platform architecture. The values interview evaluates collaboration, innovation, and alignment with Spotify's band manifesto. Each interviewer provides independent feedback.
Total compensation by level for US-based roles. Spotify is publicly traded (NYSE: SPOT), so equity is granted as RSUs. Comp is competitive but typically below FAANG peers at equivalent levels.
Stockholm HQ roles use a different structure reflecting Swedish market norms, benefits, and pension contributions. Total comp figures are base salary + equity + bonus.
Entry-level roles for candidates with 0 to 2 years of experience. Comp is base-heavy with a smaller equity component. Most L1 hires are located in the US or Stockholm.
The most common external hire level. Balanced split between base salary, RSUs, and annual bonus. Spotify RSUs vest on a standard four-year schedule with a one-year cliff.
Senior engineers own end-to-end pipeline design within their squad. Equity becomes a larger portion of total comp. Senior candidates face deeper system design rounds in the interview.
Staff engineers drive technical direction across multiple squads or an entire tribe. Equity is the dominant comp component. These roles require demonstrated cross-team impact and technical leadership.
Spotify runs entirely on Google Cloud. Their data stack is built around Apache Beam for processing, BigQuery for analytics, and Kafka for event streaming.
Spotify organizes into squads (small cross-functional teams), tribes (groups of related squads), chapters (skill-based groups), and guilds (interest communities). Data engineers are embedded in squads, not centralized.
These are the primary areas where data engineers work. Each squad operates autonomously with its own roadmap and technical decisions.
Core infrastructure, data quality frameworks, governance tooling, and the internal developer experience layer built on Backstage.
ML feature pipelines for Discover Weekly, Daily Mix, Release Radar, and real-time recommendation serving.
Music and podcast metadata pipelines, rights management data, and content ingestion from labels and distributors.
Programmatic ad serving pipelines, impression tracking, measurement attribution, and advertiser analytics.
Spotify for Artists analytics, streaming metrics dashboards, and audience insight pipelines for creators.
Speech-to-text processing, content classification, podcast transcription, and audio feature extraction pipelines.
Real question types from each round. The guidance shows what the interviewer evaluates and how to structure your answer.
Filter stream_events where play_duration >= 30. Count DISTINCT user_id per song_id. ORDER BY unique_listeners DESC LIMIT 10. Discuss why 30 seconds is the industry threshold for a 'play' and how to handle repeated plays.
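A minimal sketch of the query described above, run against an in-memory SQLite table so it can be tested locally. The table layout and sample rows are assumptions for illustration; only the names stream_events, play_duration, user_id, and song_id come from the guidance.

```python
import sqlite3


def top_songs_by_unique_listeners(conn, limit=10, min_seconds=30):
    """Return (song_id, unique_listeners) rows, most-listened first."""
    query = """
        SELECT song_id, COUNT(DISTINCT user_id) AS unique_listeners
        FROM stream_events
        WHERE play_duration >= ?          -- only plays that reach the threshold
        GROUP BY song_id
        ORDER BY unique_listeners DESC, song_id
        LIMIT ?
    """
    return conn.execute(query, (min_seconds, limit)).fetchall()


# Hypothetical sample data for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stream_events (user_id TEXT, song_id TEXT, play_duration INTEGER)"
)
conn.executemany(
    "INSERT INTO stream_events VALUES (?, ?, ?)",
    [
        ("u1", "s1", 45),
        ("u1", "s1", 200),  # repeat play by the same user: counted once
        ("u2", "s1", 31),
        ("u2", "s2", 10),   # under 30 seconds: not a qualifying play
        ("u3", "s2", 90),
    ],
)
top_songs = top_songs_by_unique_listeners(conn)
```

COUNT(DISTINCT user_id) is what keeps repeated plays from inflating the ranking; the secondary sort on song_id only makes ties deterministic.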
Define skip as play_duration < 15 AND user_action = 'skip'. Group by genre, compute skips / total_plays. Discuss whether autoplay skips should count differently than manual skips.
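The skip-rate calculation can be sketched with conditional aggregation. This example assumes genre is already available on each event row (in practice it would likely come from a join to a track dimension), and the non-skip user_action values are hypothetical.

```python
import sqlite3

SKIP_RATE_SQL = """
    SELECT genre,
           SUM(CASE WHEN play_duration < 15 AND user_action = 'skip'
                    THEN 1 ELSE 0 END) AS skips,
           COUNT(*) AS total_plays,
           1.0 * SUM(CASE WHEN play_duration < 15 AND user_action = 'skip'
                          THEN 1 ELSE 0 END) / COUNT(*) AS skip_rate
    FROM stream_events
    GROUP BY genre
    ORDER BY skip_rate DESC, genre
"""

# Hypothetical sample data for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stream_events "
    "(user_id TEXT, genre TEXT, play_duration INTEGER, user_action TEXT)"
)
conn.executemany(
    "INSERT INTO stream_events VALUES (?, ?, ?, ?)",
    [
        ("u1", "rock", 5, "skip"),      # counts as a skip
        ("u1", "rock", 200, "finish"),
        ("u2", "rock", 20, "skip"),     # skip action, but past 15 seconds
        ("u3", "rock", 10, "pause"),    # short play, but not a skip action
        ("u2", "pop", 3, "skip"),       # counts as a skip
        ("u3", "pop", 180, "finish"),
    ],
)
skip_rates = conn.execute(SKIP_RATE_SQL).fetchall()
```

Multiplying by 1.0 forces floating-point division; without it, integer division would truncate every rate below 1 to 0.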
Use conditional aggregation across multiple event types. Normalize each metric (0 to 1), then weighted average. Discuss how to handle new users with sparse data and whether to use percentile-based normalization.
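One way to sketch the scoring logic in plain Python: count events per type for each user (the equivalent of conditional aggregation), min-max normalize each metric to the 0 to 1 range, then take a weighted average. The event types and weights here are hypothetical, and min-max is only one normalization choice; a percentile rank would be less sensitive to outliers.

```python
from collections import defaultdict

# Hypothetical metric weights; they do not need to sum to 1.
WEIGHTS = {"play": 0.5, "save": 0.3, "share": 0.2}


def engagement_scores(events, weights=WEIGHTS):
    """Map user_id -> weighted average of min-max normalized event counts."""
    # Per-user count of each event type.
    counts = defaultdict(lambda: {metric: 0 for metric in weights})
    for user_id, event_type in events:
        if event_type in weights:
            counts[user_id][event_type] += 1
    if not counts:
        return {}

    total_weight = sum(weights.values())
    scores = {user_id: 0.0 for user_id in counts}
    for metric, weight in weights.items():
        values = [c[metric] for c in counts.values()]
        low = min(values)
        span = max(values) - low
        for user_id, c in counts.items():
            # If every user has the same count, the metric carries no signal.
            normalized = (c[metric] - low) / span if span else 0.0
            scores[user_id] += weight * normalized / total_weight
    return scores


EVENTS = (
    [("u1", "play")] * 4 + [("u1", "save")] * 2 + [("u1", "share")]
    + [("u2", "play")] * 2 + [("u2", "save")]
    + [("u3", "share")]  # sparse user with a single event
)
scores = engagement_scores(EVENTS)
```

In this sample the sparse user u3 scores lowest, which illustrates why new users may need a separate treatment (for example a minimum activity threshold) rather than a direct comparison with established accounts.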
Aggregate playlist_follow_events by playlist_id and week. Compare follow vs unfollow counts per week using conditional aggregation. Use a window function to track the trend across weeks. Discuss how to surface this to playlist curators via Spotify for Artists.
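A sketch of the weekly trend query, again using SQLite for local testing (window functions need SQLite 3.25 or newer). The columns action and event_date are assumptions; in BigQuery the week would more likely come from DATE_TRUNC than from strftime.

```python
import sqlite3

TREND_SQL = """
    WITH weekly AS (
        SELECT playlist_id,
               strftime('%Y-%W', event_date) AS week,
               SUM(CASE WHEN action = 'follow' THEN 1 ELSE 0 END) AS follows,
               SUM(CASE WHEN action = 'unfollow' THEN 1 ELSE 0 END) AS unfollows
        FROM playlist_follow_events
        GROUP BY playlist_id, strftime('%Y-%W', event_date)
    )
    SELECT playlist_id,
           week,
           follows - unfollows AS net_follows,
           (follows - unfollows)
               - LAG(follows - unfollows) OVER (
                     PARTITION BY playlist_id ORDER BY week
                 ) AS change_vs_prior_week
    FROM weekly
    ORDER BY playlist_id, week
"""

# Hypothetical sample data: two consecutive weeks for one playlist.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE playlist_follow_events "
    "(playlist_id TEXT, user_id TEXT, action TEXT, event_date TEXT)"
)
conn.executemany(
    "INSERT INTO playlist_follow_events VALUES (?, ?, ?, ?)",
    [
        ("p1", "u1", "follow", "2024-01-01"),
        ("p1", "u2", "follow", "2024-01-02"),
        ("p1", "u1", "unfollow", "2024-01-03"),
        ("p1", "u3", "follow", "2024-01-08"),
        ("p1", "u4", "follow", "2024-01-09"),
        ("p1", "u5", "follow", "2024-01-10"),
    ],
)
trend = conn.execute(TREND_SQL).fetchall()
```

LAG returns NULL for a playlist's first week, so the first row has no week-over-week change; downstream consumers need to treat that as "no baseline" rather than zero.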
Read from source, deduplicate using a set or merge key, join to track dimension, group by track_id and date, write partitioned output. Discuss idempotency and how to handle late-arriving events in the next day's partition.
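The steps above can be sketched as a small batch job. The field names (event_id, track_id, timestamp, title) and the date=YYYY-MM-DD directory layout are assumptions for illustration, not a known Spotify format.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def build_daily_track_plays(events, tracks, output_dir):
    """Dedupe events, join to the track dimension, and write one CSV per date."""
    seen = set()
    totals = defaultdict(int)
    for event in events:
        if event["event_id"] in seen:      # deduplicate on the event key
            continue
        seen.add(event["event_id"])
        track = tracks.get(event["track_id"])
        if track is None:                  # unknown track: skipped in this sketch
            continue
        date = event["timestamp"][:10]     # ISO timestamp -> YYYY-MM-DD
        totals[(date, event["track_id"], track["title"])] += 1

    by_date = defaultdict(list)
    for (date, track_id, title), plays in sorted(totals.items()):
        by_date[date].append((track_id, title, plays))

    written = []
    for date, rows in by_date.items():
        partition = Path(output_dir) / f"date={date}"
        partition.mkdir(parents=True, exist_ok=True)
        path = partition / "track_plays.csv"
        # Overwrite the whole partition so a rerun yields the same output.
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["track_id", "title", "plays"])
            writer.writerows(rows)
        written.append(path)
    return written


# Hypothetical sample inputs.
TRACKS = {"t1": {"title": "Song A"}, "t2": {"title": "Song B"}}
EVENTS = [
    {"event_id": "e1", "track_id": "t1", "timestamp": "2024-03-01T10:00:00"},
    {"event_id": "e1", "track_id": "t1", "timestamp": "2024-03-01T10:00:00"},  # duplicate
    {"event_id": "e2", "track_id": "t1", "timestamp": "2024-03-01T11:00:00"},
    {"event_id": "e3", "track_id": "t2", "timestamp": "2024-03-02T09:00:00"},
]
output_dir = tempfile.mkdtemp()
paths = build_daily_track_plays(EVENTS, TRACKS, output_dir)
```

Overwriting each partition rather than appending is what makes the job idempotent. A late-arriving event would require rebuilding its partition from the full set of events for that date, since this sketch only sees the batch it is given.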
Process podcast_events to classify each listen into funnel stages based on percent_completed. Aggregate per episode and per show. Discuss how to handle users who resume episodes across sessions and how to avoid double-counting restarts.
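A possible sketch of the funnel logic: keep the furthest percent_completed per user and episode, so resumed sessions and restarts count once, then classify and aggregate. The stage names, thresholds, and the show_id and episode_id fields are hypothetical.

```python
from collections import Counter

# Hypothetical funnel thresholds, checked from highest to lowest.
STAGES = [(90, "completed"), (25, "engaged"), (0, "started")]


def classify(percent_completed):
    """Return the funnel stage for a given completion percentage."""
    for threshold, stage in STAGES:
        if percent_completed >= threshold:
            return stage
    return "started"


def funnel_counts(podcast_events):
    """Return (per_episode, per_show) Counters keyed by (id, stage)."""
    # Furthest progress per (user, show, episode); a restart never lowers it.
    furthest = {}
    for event in podcast_events:
        key = (event["user_id"], event["show_id"], event["episode_id"])
        furthest[key] = max(furthest.get(key, 0), event["percent_completed"])

    per_episode = Counter()
    per_show = Counter()
    for (_, show_id, episode_id), percent in furthest.items():
        stage = classify(percent)
        per_episode[(episode_id, stage)] += 1
        per_show[(show_id, stage)] += 1  # counts listener-episode pairs
    return per_episode, per_show


# Hypothetical sample data.
PODCAST_EVENTS = [
    {"user_id": "u1", "show_id": "show1", "episode_id": "ep1", "percent_completed": 40},
    {"user_id": "u1", "show_id": "show1", "episode_id": "ep1", "percent_completed": 95},  # resumed
    {"user_id": "u1", "show_id": "show1", "episode_id": "ep1", "percent_completed": 10},  # restarted
    {"user_id": "u2", "show_id": "show1", "episode_id": "ep1", "percent_completed": 10},
    {"user_id": "u2", "show_id": "show1", "episode_id": "ep2", "percent_completed": 30},
]
per_episode, per_show = funnel_counts(PODCAST_EVENTS)
```

Taking the maximum treats a restart as the same listen. If a re-listen should count as a second completion, the key would need a session or listen identifier, which this sketch does not model.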
Year-long event aggregation from streaming events. Pre-compute per-user summaries (top artists, genres, minutes listened) incrementally. Discuss the burst of reads on launch day, caching strategy, and how to handle users who listen on multiple devices.
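The incremental pre-computation can be sketched as a running per-user summary that each daily batch updates, so the launch-day read is a lookup rather than a year-long scan. All field names and the in-memory store are assumptions; a real system would persist the summaries.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class UserSummary:
    """Running totals for one user, updated once per daily batch."""
    seconds_listened: int = 0
    artist_seconds: Counter = field(default_factory=Counter)
    genre_seconds: Counter = field(default_factory=Counter)


def apply_daily_batch(summaries, daily_events):
    """Fold one day of events into the running summaries (not idempotent)."""
    for event in daily_events:
        # Keyed by user_id only, so plays from every device merge together.
        summary = summaries[event["user_id"]]
        summary.seconds_listened += event["seconds"]
        summary.artist_seconds[event["artist"]] += event["seconds"]
        summary.genre_seconds[event["genre"]] += event["seconds"]


def wrapped_view(summary, top_n=3):
    """Build the small read-side payload served on launch day."""
    return {
        "minutes_listened": summary.seconds_listened // 60,
        "top_artists": [name for name, _ in summary.artist_seconds.most_common(top_n)],
        "top_genres": [name for name, _ in summary.genre_seconds.most_common(top_n)],
    }


# Hypothetical sample data: two daily batches for one user on two devices.
summaries = defaultdict(UserSummary)
apply_daily_batch(summaries, [
    {"user_id": "u1", "device": "phone", "artist": "Artist A", "genre": "pop", "seconds": 600},
    {"user_id": "u1", "device": "laptop", "artist": "Artist B", "genre": "rock", "seconds": 300},
])
apply_daily_batch(summaries, [
    {"user_id": "u1", "device": "phone", "artist": "Artist B", "genre": "rock", "seconds": 900},
])
view = wrapped_view(summaries["u1"])
```

Because apply_daily_batch adds to running totals, replaying a batch would double-count. Tracking which dates have been applied, or storing per-day summaries and summing them at read time, are two common ways around that.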
Kafka for event ingestion, feature store for user profiles, ML model serving for recommendations. Discuss cold-start problem for new users, feedback loops (user skips recommended songs), and latency requirements for real-time updates.
Pub/Sub for event ingestion, BigQuery for warehousing, real-time aggregation for campaign dashboards. Discuss deduplication of click events, attribution windows, and how to reconcile real-time counts with batch-validated totals for billing.
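To make the deduplication and reconciliation points concrete, here is a small sketch. It assumes each click carries a unique click_id and a campaign_id (both hypothetical field names) and flags campaigns whose real-time count drifts from the batch-validated count by more than a chosen tolerance.

```python
from collections import Counter


def dedupe_clicks(clicks):
    """Keep the first occurrence of each click_id."""
    seen = set()
    unique = []
    for click in clicks:
        if click["click_id"] not in seen:
            seen.add(click["click_id"])
            unique.append(click)
    return unique


def count_by_campaign(clicks):
    """Count clicks per campaign_id."""
    return Counter(click["campaign_id"] for click in clicks)


def reconcile(realtime_counts, batch_counts, tolerance=0.01):
    """Return {campaign_id: (realtime, batch)} where drift exceeds tolerance."""
    drifted = {}
    for campaign_id in set(realtime_counts) | set(batch_counts):
        realtime = realtime_counts.get(campaign_id, 0)
        batch = batch_counts.get(campaign_id, 0)
        # Relative drift against the batch total; max() avoids dividing by zero.
        drift = abs(realtime - batch) / max(batch, 1)
        if drift > tolerance:
            drifted[campaign_id] = (realtime, batch)
    return drifted


# Hypothetical sample data: the real-time path saw one click twice.
RAW_CLICKS = [
    {"click_id": "k1", "campaign_id": "c1"},
    {"click_id": "k2", "campaign_id": "c1"},
    {"click_id": "k2", "campaign_id": "c1"},  # redelivered message
    {"click_id": "k3", "campaign_id": "c2"},
]
realtime_counts = count_by_campaign(RAW_CLICKS)
batch_counts = count_by_campaign(dedupe_clicks(RAW_CLICKS))
drifted = reconcile(realtime_counts, batch_counts)
```

In this framing the dashboard can show the real-time number immediately, while billing uses the batch-validated total and the drift report highlights campaigns that need investigation.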
Fact: stream_events (user_id, track_id, duration, timestamp, context). Dimensions: tracks, artists, albums, playlists. Discuss the dual purpose: anonymized aggregates for ML features vs precise per-play records for financial reporting. Rights ownership can be complex (multiple writers, labels).
Start with partition pruning (date partitions), then clustering (user_id or event_type). Flatten only the nested fields you need with UNNEST instead of SELECT *. Consider materialized views for repeated aggregations. Discuss slot reservation vs on-demand pricing trade-offs.
Show proactive engineering: describe how you identified a scaling bottleneck before it caused outages. Walk through the investigation, the solution, and the measured improvement. Spotify values engineers who improve systems without being asked.
Spotify's data engineering culture is distinct from other large tech companies. Understanding these differences will shape how you answer every interview question.
Few companies have contributed two major open source projects to the data and developer tools ecosystem. Backstage (developer portals) is now a CNCF project used by hundreds of companies. Luigi was one of the first Python-based workflow orchestrators, preceding Airflow. This engineering culture of building tools and sharing them externally is core to Spotify's identity.
Spotify organizes into squads (small cross-functional teams), tribes (groups of related squads), chapters (skill-based communities across squads), and guilds (interest-based communities across the company). Data engineers are embedded in squads, not centralized. You own your pipelines end-to-end and make architectural decisions locally.
While many large tech companies run on AWS, Spotify migrated fully to Google Cloud. BigQuery is the primary analytical warehouse. Apache Beam (via Dataflow and Scio) is the processing framework. This GCP-native stack means your system design answers should reference Google services, not AWS equivalents.
Every user action (play, skip, search, save, share) generates an event that flows through Kafka and Pub/Sub into processing pipelines. The event-driven architecture is not just for analytics; it powers real-time personalization, ad targeting, and content recommendations. Batch processing exists, but the event stream is the source of truth.
Patterns that cause strong candidates to underperform in Spotify interviews.
Spotify's engineering culture is built on squad autonomy, not top-down mandates. Your answers should reflect independent decision-making within a collaborative team, not hierarchical escalation.
Nearly every Spotify system generates and consumes events. If your system design uses only batch ETL with no event layer, you are missing the core architectural pattern Spotify relies on.
Spotify runs on GCP. Use BigQuery (not Redshift), Pub/Sub (not SQS/SNS), Dataflow (not EMR), and GCS (not S3). This shows you have researched the company and can hit the ground running.
The values round is not a throwaway. Spotify has rejected strong technical candidates who could not demonstrate alignment with the band manifesto. Prepare specific stories about innovation, sincerity, and collaboration.
Spotify created both Backstage and Luigi, two widely used open source projects. Backstage is now a CNCF project for developer portals. Luigi was an early Python workflow orchestrator. Knowing their origin shows genuine interest.
Targeted strategies to stand out in each interview round.
Everything at Spotify generates events: plays, skips, searches, playlist edits, ad impressions. Know event-driven patterns: event sourcing, pub/sub messaging, and how to build reliable pipelines on top of event streams. This is the most common system design context.
Spotify migrated from on-premises Hadoop to Google Cloud. BigQuery is their primary analytics warehouse. Know BigQuery-specific features: nested and repeated fields (STRUCT, ARRAY), UNNEST, partitioned tables, and materialized views. This context helps in both SQL and system design rounds.
Backstage is Spotify's developer portal for managing microservices, data pipelines, and documentation. Understanding Backstage shows you have researched Spotify's engineering culture and care about developer experience, which is a core value.
Spotify organizes into autonomous squads. Data engineers are embedded in squads rather than centralized. Prepare examples of working independently within a team, making local decisions, and collaborating across team boundaries.
Spotify DE interviews test event-driven thinking and GCP expertise. Practice problems that mirror streaming data scenarios, BigQuery optimization, and real-time pipeline design.
50+ guides covering every round, company, role, and technology in the data engineer interview loop. Grounded in 2,817 verified interview reports across 929 companies, collected from real candidates.