Company Interview Guide

Spotify Data Engineer Interview

Spotify processes billions of streaming events daily to power personalized recommendations, Wrapped campaigns, and royalty payments. Their DE interviews focus on event-driven architecture, GCP/BigQuery expertise, and the autonomous engineering culture that defines Spotify squads.

Interview timeline: 3 to 5 weeks | Levels: L1 through L4+ | Total comp: $130K to $480K

Spotify DE Interview Process

Three stages from recruiter call to offer, typically completed in 3 to 5 weeks.

1. Recruiter Screen (30 min)

Initial conversation about your experience and motivation for joining Spotify. The recruiter evaluates your background with event-driven data systems and your interest in music, podcasts, or media technology. Spotify's data platform team handles billions of events daily from streaming, search, and ad interactions. They look for candidates who care about both technical excellence and product impact.

* Show genuine interest in how data powers music recommendations and personalization
* Mention GCP/BigQuery experience if you have it; Spotify runs primarily on Google Cloud
* Ask about the squad structure; Spotify uses an autonomous squad model with embedded DEs

2. Technical Phone Screen (60 min)

SQL and Python problems set in a music streaming context. Expect questions about user engagement metrics, playlist analytics, and event processing. Spotify values clean, readable code and clear communication of your approach. The interviewer also evaluates how you think about data quality in event streams.

* Practice SQL with event-stream data: sessionization, funnel analysis, and engagement metrics
* Be ready for Python questions around data transformation and pipeline logic
* Spotify uses BigQuery; familiarity with its SQL dialect (UNNEST, STRUCT, ARRAY) is helpful
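
Sessionization, mentioned in the tips above, is easy to describe but worth practicing in code. The sketch below is one minimal way to do it in Python, assuming a session ends after 30 minutes of inactivity; that gap and the `sessionize` helper name are illustrative choices, not anything Spotify specifies.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, gap=timedelta(minutes=30)):
    """Group one user's event timestamps into sessions split on inactivity gaps."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # first event, or gap exceeded: new session
    return sessions


# Hypothetical play timestamps for a single user.
plays = [
    datetime(2026, 1, 5, 10, 0),
    datetime(2026, 1, 5, 10, 10),
    datetime(2026, 1, 5, 10, 50),   # 40 minutes after the previous play
    datetime(2026, 1, 5, 11, 0),
    datetime(2026, 1, 5, 13, 0),
]
sessions = sessionize(plays)
print(len(sessions))  # 3 sessions for this sample
```

In SQL the same idea is usually expressed with LAG over the user's events and a running sum of "new session" flags.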

3. Onsite Loop (4 to 5 hours)

Four rounds covering system design, SQL deep dive, coding, and a values interview. System design questions at Spotify involve recommendation pipelines, event processing at scale, and data platform architecture. The values interview evaluates collaboration, innovation, and alignment with Spotify's band manifesto. Each interviewer provides independent feedback.

* Know event-driven architecture patterns: event sourcing, CQRS, pub/sub
* Spotify created Backstage for developer experience; mentioning it shows research
* The values round tests genuine collaboration, not just conflict resolution stories

Spotify Data Engineer Compensation (2026)

Total compensation by level for US-based roles. Spotify is publicly traded (NYSE: SPOT), so equity is granted as RSUs. Comp is competitive but typically below FAANG peers at equivalent levels.

Figures below are total compensation: base salary + equity + bonus. Stockholm HQ roles use a different structure reflecting Swedish market norms, benefits, and pension contributions.

L1 (Junior)

$130K to $180K

Entry-level roles for candidates with 0 to 2 years of experience. Comp is base-heavy with a smaller equity component. Most L1 hires are located in the US or Stockholm.

L2 (Mid)

$170K to $260K

The most common external hire level. Balanced split between base salary, RSUs, and annual bonus. Spotify RSUs vest on a standard four-year schedule with a one-year cliff.

L3 (Senior)

$240K to $370K

Senior engineers own end-to-end pipeline design within their squad. Equity becomes a larger portion of total comp. Senior candidates face deeper system design rounds in the interview.

L4 (Staff)

$340K to $480K

Staff engineers drive technical direction across multiple squads or an entire tribe. Equity is the dominant comp component. These roles require demonstrated cross-team impact and technical leadership.

Spotify Data Engineering Tech Stack

Spotify runs entirely on Google Cloud. Their data stack is built around Apache Beam for processing, BigQuery for analytics, and Kafka for event streaming.

Languages: Python, Java, Scala
Cloud: Google Cloud Platform (GCP), BigQuery
Core Frameworks: Apache Beam, Google Dataflow, Scio (Spotify's Scala wrapper for Apache Beam)
Storage: Google Cloud Storage, BigQuery, Bigtable
Streaming: Apache Kafka, Google Pub/Sub
Orchestration: Luigi (Spotify created it, now mostly Airflow), Flyte
ML Infra: Internal Metaflow-like tooling, TensorFlow, Kubeflow
Developer Experience: Backstage (Spotify created it, now a CNCF project)

Data Engineering Teams at Spotify

Spotify organizes into squads (small cross-functional teams), tribes (groups of related squads), chapters (skill-based groups), and guilds (interest communities). Data engineers are embedded in squads, not centralized.

These are the primary areas where data engineers work. Each squad operates autonomously with its own roadmap and technical decisions.

Data Platform

Core infrastructure, data quality frameworks, governance tooling, and the internal developer experience layer built on Backstage.

Personalization & Recommendations

ML feature pipelines for Discover Weekly, Daily Mix, Release Radar, and real-time recommendation serving.

Content & Catalog

Music and podcast metadata pipelines, rights management data, and content ingestion from labels and distributors.

Ad Tech

Programmatic ad serving pipelines, impression tracking, measurement attribution, and advertiser analytics.

Creator Tools

Spotify for Artists analytics, streaming metrics dashboards, and audience insight pipelines for creators.

Audio Intelligence

Speech-to-text processing, content classification, podcast transcription, and audio feature extraction pipelines.

12 Example Questions with Guidance

Real question types from each round. The guidance shows what the interviewer evaluates and how to structure your answer.

SQL

Find the top 10 songs by unique listeners in the last 30 days, excluding songs with fewer than 30 seconds of play time.

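Filter stream_events to the last 30 days and to plays where play_duration >= 30, so plays shorter than 30 seconds are ignored. COUNT(DISTINCT user_id) per song_id, then ORDER BY unique_listeners DESC LIMIT 10. Discuss why 30 seconds is a common industry threshold for counting a 'play' and how to handle repeated plays (COUNT DISTINCT already collapses them to one listener).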
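
One way to rehearse this answer end to end is to run the query against a tiny in-memory table. The sketch below uses Python's built-in sqlite3 module as a stand-in for the warehouse; the column names (played_at in particular) and sample rows are assumptions for illustration, and the date arithmetic is SQLite syntax rather than BigQuery's.

```python
import sqlite3

# Hypothetical schema: stream_events(user_id, song_id, play_duration, played_at).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stream_events ("
    "user_id TEXT, song_id TEXT, play_duration INTEGER, played_at TEXT)"
)
conn.executemany(
    "INSERT INTO stream_events VALUES (?, ?, ?, ?)",
    [
        ("u1", "s1", 45, "2026-01-20"),
        ("u2", "s1", 200, "2026-01-21"),
        ("u1", "s1", 180, "2026-01-22"),  # repeat play by u1: still one listener
        ("u1", "s2", 10, "2026-01-22"),   # under 30 seconds: not a qualifying play
        ("u3", "s2", 90, "2026-01-23"),
        ("u2", "s3", 60, "2025-11-01"),   # outside the 30-day window
    ],
)

TOP_SONGS_SQL = """
SELECT song_id, COUNT(DISTINCT user_id) AS unique_listeners
FROM stream_events
WHERE play_duration >= 30
  AND played_at >= date(:as_of, '-30 days')
GROUP BY song_id
ORDER BY unique_listeners DESC, song_id
LIMIT 10
"""

top_songs = conn.execute(TOP_SONGS_SQL, {"as_of": "2026-01-31"}).fetchall()
print(top_songs)  # [('s1', 2), ('s2', 1)]
```

The secondary sort on song_id is there only to make ties deterministic, which is worth mentioning to the interviewer.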

SQL

Calculate the skip rate for each genre: percentage of plays where the user skipped within the first 15 seconds.

Define a skip as play_duration < 15 AND user_action = 'skip'. Group by genre and compute 100 * skips / total_plays, taking care to avoid integer division. Discuss whether autoplay skips should count differently from manual skips.
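
The conditional aggregation can be sketched as follows, again with sqlite3 standing in for the warehouse. The plays table and its columns are hypothetical; in practice genre would likely come from a join to a track dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plays (genre TEXT, play_duration INTEGER, user_action TEXT)"
)
conn.executemany(
    "INSERT INTO plays VALUES (?, ?, ?)",
    [
        ("rock", 5, "skip"),
        ("rock", 200, "complete"),
        ("rock", 12, "skip"),
        ("rock", 40, "skip"),       # skipped, but after the 15-second mark
        ("jazz", 8, "skip"),
        ("jazz", 300, "complete"),
        ("jazz", 10, "pause"),      # short, but not a skip
        ("jazz", 250, "complete"),
    ],
)

# 100.0 forces floating-point division so the rate is not truncated to 0.
SKIP_RATE_SQL = """
SELECT genre,
       ROUND(100.0 * SUM(CASE WHEN play_duration < 15 AND user_action = 'skip'
                              THEN 1 ELSE 0 END) / COUNT(*), 1) AS skip_rate_pct
FROM plays
GROUP BY genre
ORDER BY genre
"""

skip_rates = conn.execute(SKIP_RATE_SQL).fetchall()
print(skip_rates)  # [('jazz', 25.0), ('rock', 50.0)]
```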

SQL

Build a user engagement score based on: days active in last 30 days, playlists created, songs saved, and podcast episodes completed.

Use conditional aggregation across multiple event types. Normalize each metric to a 0-to-1 range, then take a weighted average. Discuss how to handle new users with sparse data and whether to use percentile-based normalization.
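
The normalize-then-weight step is shown here in Python for clarity, though the interview answer would be SQL. This is a minimal sketch using min-max normalization; the weights and the `UserActivity` structure are hypothetical and would need tuning with product input.

```python
from dataclasses import dataclass


@dataclass
class UserActivity:
    user_id: str
    days_active: int          # out of the last 30 days
    playlists_created: int
    songs_saved: int
    episodes_completed: int


# Hypothetical weights that sum to 1.0.
WEIGHTS = {
    "days_active": 0.4,
    "playlists_created": 0.2,
    "songs_saved": 0.2,
    "episodes_completed": 0.2,
}


def min_max(values):
    """Scale values to the 0-to-1 range; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def engagement_scores(users):
    normalized = {m: min_max([getattr(u, m) for u in users]) for m in WEIGHTS}
    return {
        u.user_id: round(sum(WEIGHTS[m] * normalized[m][i] for m in WEIGHTS), 3)
        for i, u in enumerate(users)
    }


users = [
    UserActivity("heavy", 30, 4, 100, 10),
    UserActivity("medium", 15, 2, 50, 5),
    UserActivity("new", 0, 0, 0, 0),
]
scores = engagement_scores(users)
print(scores)  # {'heavy': 1.0, 'medium': 0.5, 'new': 0.0}
```

Min-max scaling is sensitive to outliers, which is one reason the percentile-based alternative is worth raising.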

SQL

Write a query to identify playlists that are losing followers faster than they are gaining them over the past 90 days, broken down by week.

Aggregate playlist_follow_events by playlist_id and week. Compare follow vs unfollow counts per week using conditional aggregation. Use a window function to track the trend across weeks. Discuss how to surface this to playlist curators via Spotify for Artists.
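
A possible shape for this query, run through sqlite3 for practice. The event_type and event_date columns are assumed, the week-bucketing expression is SQLite-specific (BigQuery would use DATE_TRUNC), and the window function needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE playlist_follow_events ("
    "playlist_id TEXT, event_type TEXT, event_date TEXT)"
)
conn.executemany(
    "INSERT INTO playlist_follow_events VALUES (?, ?, ?)",
    [
        ("p1", "follow", "2026-01-05"),
        ("p1", "unfollow", "2026-01-06"),
        ("p1", "unfollow", "2026-01-07"),
        ("p1", "unfollow", "2026-01-13"),
        ("p2", "follow", "2026-01-05"),
        ("p2", "follow", "2026-01-14"),
        ("p2", "unfollow", "2026-01-15"),
    ],
)

# date(d, 'weekday 0', '-6 days') maps any date to the Monday of its week.
WEEKLY_NET_SQL = """
WITH weekly AS (
  SELECT playlist_id,
         date(event_date, 'weekday 0', '-6 days') AS week_start,
         SUM(CASE WHEN event_type = 'follow' THEN 1 ELSE 0 END) AS follows,
         SUM(CASE WHEN event_type = 'unfollow' THEN 1 ELSE 0 END) AS unfollows
  FROM playlist_follow_events
  WHERE event_date >= date(:as_of, '-90 days')
  GROUP BY playlist_id, week_start
)
SELECT playlist_id,
       week_start,
       follows - unfollows AS net_change,
       SUM(follows - unfollows)
         OVER (PARTITION BY playlist_id ORDER BY week_start) AS running_net
FROM weekly
ORDER BY playlist_id, week_start
"""

rows = conn.execute(WEEKLY_NET_SQL, {"as_of": "2026-01-31"}).fetchall()

# Rows are ordered by week, so the last running_net per playlist wins.
final_net = {playlist_id: running for playlist_id, _, _, running in rows}
losing = [pid for pid, net in final_net.items() if net < 0]
print(losing)  # ['p1']
```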

Python

Write a pipeline that processes raw stream events, deduplicates by event_id, enriches with track metadata, and writes daily aggregates.

Read from source, deduplicate using a set or merge key, join to track dimension, group by track_id and date, write partitioned output. Discuss idempotency and how to handle late-arriving events in the next day's partition.
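
The steps above can be sketched in plain Python. This is a minimal in-memory version, assuming events are dicts with event_id, track_id, ts (ISO 8601 string), and duration fields; a production pipeline would use Beam or a similar framework and write to partitioned storage.

```python
from collections import defaultdict


def daily_track_aggregates(raw_events, track_dim):
    """Deduplicate by event_id, enrich with track metadata, aggregate per day."""
    seen_event_ids = set()
    totals = defaultdict(lambda: {"plays": 0, "seconds": 0})
    for event in raw_events:
        if event["event_id"] in seen_event_ids:
            continue  # duplicate delivery of an event already counted
        seen_event_ids.add(event["event_id"])
        # Keep events for unknown tracks rather than silently dropping them.
        track = track_dim.get(event["track_id"], {"artist": "unknown"})
        key = (event["ts"][:10], event["track_id"], track["artist"])
        totals[key]["plays"] += 1
        totals[key]["seconds"] += event["duration"]
    # Keyed by date first, so each day's rows can overwrite that day's partition.
    return {key: dict(value) for key, value in sorted(totals.items())}


track_dim = {"t1": {"artist": "Artist A"}, "t2": {"artist": "Artist B"}}
raw_events = [
    {"event_id": "e1", "track_id": "t1", "ts": "2026-01-05T10:00:00", "duration": 200},
    {"event_id": "e1", "track_id": "t1", "ts": "2026-01-05T10:00:00", "duration": 200},
    {"event_id": "e2", "track_id": "t1", "ts": "2026-01-05T11:00:00", "duration": 100},
    {"event_id": "e3", "track_id": "t2", "ts": "2026-01-06T09:00:00", "duration": 50},
]
aggregates = daily_track_aggregates(raw_events, track_dim)
print(aggregates)
```

Because the output is a pure function of the input events and is keyed by date, rerunning a day and overwriting its partition gives the same result, which is the idempotency property the interviewer is looking for.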

Python

Implement a podcast engagement funnel: downloads to starts, starts to 25% completion, 25% to 75%, and 75% to finish.

Process podcast_events to classify each listen into funnel stages based on percent_completed. Aggregate per episode and per show. Discuss how to handle users who resume episodes across sessions and how to avoid double-counting restarts.
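
One way to avoid double-counting resumes and restarts is to keep only the furthest point each listener reached per episode. The sketch below takes that approach; the event shape (type, percent_completed) and stage names are assumptions for illustration.

```python
from collections import defaultdict

# (stage name, minimum percent_completed needed to count toward that stage)
STAGES = [
    ("started", 0.0),
    ("reached_25", 25.0),
    ("reached_75", 75.0),
    ("finished", 100.0),
]


def episode_funnel(podcast_events):
    downloads = defaultdict(set)   # episode_id -> users who downloaded
    furthest = {}                  # (episode_id, user_id) -> max percent reached
    for e in podcast_events:
        if e["type"] == "download":
            downloads[e["episode_id"]].add(e["user_id"])
        elif e["type"] == "listen":
            key = (e["episode_id"], e["user_id"])
            furthest[key] = max(furthest.get(key, 0.0), e["percent_completed"])

    funnel = defaultdict(
        lambda: {"downloads": 0, **{name: 0 for name, _ in STAGES}}
    )
    for episode_id, users in downloads.items():
        funnel[episode_id]["downloads"] = len(users)
    for (episode_id, _user_id), pct in furthest.items():
        for name, threshold in STAGES:
            if pct >= threshold:
                funnel[episode_id][name] += 1
    return {episode_id: dict(counts) for episode_id, counts in funnel.items()}


events = [
    {"type": "download", "episode_id": "ep1", "user_id": "u1"},
    {"type": "listen", "episode_id": "ep1", "user_id": "u1", "percent_completed": 30.0},
    {"type": "listen", "episode_id": "ep1", "user_id": "u1", "percent_completed": 10.0},  # restart
    {"type": "listen", "episode_id": "ep1", "user_id": "u1", "percent_completed": 80.0},  # resume
    {"type": "download", "episode_id": "ep1", "user_id": "u2"},
    {"type": "listen", "episode_id": "ep1", "user_id": "u2", "percent_completed": 100.0},
    {"type": "download", "episode_id": "ep1", "user_id": "u3"},
    {"type": "download", "episode_id": "ep1", "user_id": "u4"},
    {"type": "listen", "episode_id": "ep1", "user_id": "u4", "percent_completed": 5.0},
]
funnel = episode_funnel(events)
print(funnel["ep1"])
```

Per-show numbers follow by summing episode funnels after joining episode_id to its show.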

System Design

Design the data pipeline behind Spotify Wrapped (year-end personalized listening summary).

Year-long event aggregation from streaming events. Pre-compute per-user summaries (top artists, genres, minutes listened) incrementally. Discuss the burst of reads on launch day, caching strategy, and how to handle users who listen on multiple devices.
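
The incremental pre-computation can be illustrated with mergeable per-user counters: each day's events are folded into a running summary instead of rescanning the full year. This is a simplified in-memory sketch with hypothetical names; it is not how Spotify implements Wrapped.

```python
from collections import Counter


def merge_day(summary, day_events):
    """Fold one day's (user_id, artist, seconds) events into the running summary."""
    for user_id, artist, seconds in day_events:
        user = summary.setdefault(
            user_id, {"seconds": 0, "artist_seconds": Counter()}
        )
        user["seconds"] += seconds
        user["artist_seconds"][artist] += seconds
    return summary


def top_artists(summary, user_id, n=5):
    return [artist for artist, _ in summary[user_id]["artist_seconds"].most_common(n)]


summary = {}
merge_day(summary, [("u1", "Artist A", 600), ("u1", "Artist B", 300)])
merge_day(summary, [("u1", "Artist B", 600), ("u1", "Artist C", 60)])

print(summary["u1"]["seconds"] // 60)   # 26 minutes listened
print(top_artists(summary, "u1", n=2))  # ['Artist B', 'Artist A']
```

Note that folding the same day in twice would double-count, so a real design needs either deduplicated input or per-day partial aggregates that can be replaced and re-summed.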

System Design

Design a real-time recommendation pipeline that updates playlist suggestions based on recent listening behavior.

Kafka for event ingestion, feature store for user profiles, ML model serving for recommendations. Discuss cold-start problem for new users, feedback loops (user skips recommended songs), and latency requirements for real-time updates.

System Design

Design an event-driven architecture for tracking ad impressions, clicks, and conversions across Spotify's ad platform.

Pub/Sub for event ingestion, BigQuery for warehousing, real-time aggregation for campaign dashboards. Discuss deduplication of click events, attribution windows, and how to reconcile real-time counts with batch-validated totals for billing.
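
Click deduplication and attribution windows are easier to discuss with a concrete rule in hand. The sketch below assumes last-click attribution with a 7-day window; both the rule and the window length are illustrative assumptions, as are the field names.

```python
from datetime import datetime, timedelta


def attribute_conversions(clicks, conversions, window=timedelta(days=7)):
    """Credit each conversion to the user's most recent click inside the window."""
    unique_clicks = {}
    for click in clicks:
        unique_clicks.setdefault(click["click_id"], click)  # drop replayed clicks

    clicks_by_user = {}
    for click in unique_clicks.values():
        clicks_by_user.setdefault(click["user_id"], []).append(click)

    attributed = {}
    for conv in conversions:
        candidates = [
            c
            for c in clicks_by_user.get(conv["user_id"], [])
            if timedelta(0) <= conv["ts"] - c["ts"] <= window
        ]
        if candidates:
            last_click = max(candidates, key=lambda c: c["ts"])
            attributed[conv["conversion_id"]] = last_click["campaign_id"]
    return attributed


clicks = [
    {"click_id": "c1", "user_id": "u1", "campaign_id": "camp_a", "ts": datetime(2026, 1, 1, 12)},
    {"click_id": "c1", "user_id": "u1", "campaign_id": "camp_a", "ts": datetime(2026, 1, 1, 12)},
    {"click_id": "c2", "user_id": "u1", "campaign_id": "camp_b", "ts": datetime(2026, 1, 3, 12)},
    {"click_id": "c3", "user_id": "u2", "campaign_id": "camp_a", "ts": datetime(2025, 12, 1, 12)},
]
conversions = [
    {"conversion_id": "v1", "user_id": "u1", "ts": datetime(2026, 1, 4, 9)},
    {"conversion_id": "v2", "user_id": "u2", "ts": datetime(2026, 1, 4, 9)},  # click too old
]
attributed = attribute_conversions(clicks, conversions)
print(attributed)  # {'v1': 'camp_b'}
```

A batch job applying the same rule to the full day's data is what the real-time counts would be reconciled against before billing.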

Data Modeling

Model listening data to support both personalization algorithms and royalty payments to artists.

Fact: stream_events (user_id, track_id, duration, timestamp, context). Dimensions: tracks, artists, albums, playlists. Discuss the dual purpose: anonymized aggregates for ML features vs precise per-play records for financial reporting. Rights ownership can be complex (multiple writers, labels).

BigQuery

Optimize a slow BigQuery query that scans 2TB of nested event data daily. Walk through your approach to reduce cost and latency.

Start with partition pruning (date partitions), then clustering (user_id or event_type). Flatten only the nested fields you need with UNNEST instead of SELECT *. Consider materialized views for repeated aggregations. Discuss slot reservation vs on-demand pricing trade-offs.

Behavioral

Describe a time you improved a system that was already working but not scaling well.

Show proactive engineering: identified the scaling bottleneck before it caused outages. Describe the investigation, the solution, and the measured improvement. Spotify values engineers who improve systems without being asked.

What Makes Spotify Different

Spotify's data engineering culture is distinct from other large tech companies. Understanding these differences will shape how you answer every interview question.

Spotify created Backstage and Luigi

Few companies have contributed two major open source projects to the data and developer tools ecosystem. Backstage (developer portals) is now a CNCF project used by hundreds of companies. Luigi was one of the first Python-based workflow orchestrators, preceding Airflow. This engineering culture of building tools and sharing them externally is core to Spotify's identity.

The squad autonomy model

Spotify organizes into squads (small cross-functional teams), tribes (groups of related squads), chapters (skill-based communities across squads), and guilds (interest-based communities across the company). Data engineers are embedded in squads, not centralized. You own your pipelines end-to-end and make architectural decisions locally.

GCP and BigQuery, not the AWS default

While many large tech companies default to AWS, Spotify migrated fully to Google Cloud. BigQuery is the primary analytical warehouse. Apache Beam (via Dataflow and Scio) is the processing framework. This GCP-native stack means your system design answers should reference Google services, not AWS equivalents.

Event-driven everything

Every user action (play, skip, search, save, share) generates an event that flows through Kafka and Pub/Sub into processing pipelines. The event-driven architecture is not just for analytics; it powers real-time personalization, ad targeting, and content recommendations. Batch processing exists, but the event stream is the source of truth.

Common Mistakes to Avoid

Patterns that cause strong candidates to underperform in Spotify interviews.

Treating Spotify like a generic FAANG interview

Spotify's engineering culture is built on squad autonomy, not top-down mandates. Your answers should reflect independent decision-making within a collaborative team, not hierarchical escalation.

Ignoring the event-driven foundation

Nearly every Spotify system generates and consumes events. If your system design uses only batch ETL with no event layer, you are missing the core architectural pattern Spotify relies on.

Defaulting to AWS services in system design answers

Spotify runs on GCP. Use BigQuery (not Redshift), Pub/Sub (not SQS/SNS), Dataflow (not EMR), and GCS (not S3). This shows you have researched the company and can hit the ground running.

Skipping the values interview preparation

The values round is not a throwaway. Spotify has rejected strong technical candidates who could not demonstrate alignment with the band manifesto. Prepare specific stories about innovation, sincerity, and collaboration.

Not knowing what Backstage or Luigi are

Spotify created both of these widely used open source projects. Backstage is now a CNCF project for developer portals. Luigi was an early Python workflow orchestrator. Knowing their origin shows genuine interest.

Spotify-Specific Preparation Tips

Targeted strategies to stand out in each interview round.

Event-driven architecture is Spotify's foundation

Everything at Spotify generates events: plays, skips, searches, playlist edits, ad impressions. Know event-driven patterns: event sourcing, pub/sub messaging, and how to build reliable pipelines on top of event streams. This is the most common system design context.

GCP and BigQuery are the primary platform

Spotify migrated from on-premises Hadoop to Google Cloud. BigQuery is their primary analytics warehouse. Know BigQuery-specific features: nested and repeated fields (STRUCT, ARRAY), UNNEST, partitioned tables, and materialized views. This context helps in both SQL and system design rounds.

Spotify created Backstage, now a CNCF project

Backstage is Spotify's developer portal for managing microservices, data pipelines, and documentation. Understanding Backstage shows you have researched Spotify's engineering culture and care about developer experience, which is a core value.

Autonomy within squads shapes how DEs work

Spotify organizes into autonomous squads. Data engineers are embedded in squads rather than centralized. Prepare examples of working independently within a team, making local decisions, and collaborating across team boundaries.

Spotify DE Interview FAQ

How many rounds are in a Spotify DE interview?
Typically 5 to 6: recruiter screen, technical phone screen, and 3 to 4 onsite rounds covering SQL, system design, coding, and values. The values round is a distinctive part of Spotify's loop and evaluates cultural alignment with the band manifesto.
Does Spotify use BigQuery SQL in interviews?
Not always, but BigQuery-style SQL is common. Know UNNEST for array fields, STRUCT types, and partitioned table syntax. Standard SQL is always acceptable, but BigQuery familiarity gives you extra context for discussion and shows you have done your research.
What is the Spotify values interview like?
It evaluates alignment with Spotify's band manifesto: innovation, collaboration, sincerity, and passion. Prepare stories about creative problem-solving, genuine teamwork, and caring about the end-user experience. Generic STAR answers are insufficient; they want specifics.
How does the squad model affect data engineers?
Data engineers are embedded directly in squads rather than sitting in a centralized data team. You own your pipelines end-to-end within your squad, make local architectural decisions, and collaborate with product managers and backend engineers daily. Cross-squad alignment happens through chapters and guilds.
Can I work remotely, or do I need to relocate to Stockholm?
Spotify offers a Work From Anywhere program with flexibility on location, though some roles are tied to specific offices (New York, London, Stockholm, and others). Compensation may vary by location. Stockholm HQ roles use a different comp structure that reflects Swedish market norms and benefits.
How long does the interview process take?
Typically 3 to 5 weeks from recruiter screen to offer. The timeline depends on scheduling availability for the onsite loop. Spotify is generally responsive with feedback between rounds, usually within a week.
Should I focus on GCP services or is general cloud knowledge enough?
GCP-specific knowledge gives you a real advantage. Spotify runs entirely on Google Cloud, so referencing BigQuery, Dataflow, Pub/Sub, GCS, and Bigtable in your system design answers shows you understand their stack. Generic 'cloud storage' and 'message queue' answers work, but GCP specifics stand out.
What is Backstage and why does it matter for the interview?
Backstage is an open source developer portal that Spotify created and donated to the CNCF. It manages service catalogs, documentation, and CI/CD pipelines. Knowing about Backstage signals that you have researched Spotify's engineering contributions and care about developer experience, which is a strong cultural signal.

Prepare at Spotify Interview Difficulty

Spotify DE interviews test event-driven thinking and GCP expertise. Practice problems that mirror streaming data scenarios, BigQuery optimization, and real-time pipeline design.

