Uber Data Engineer Interview

Uber processes millions of trips and deliveries daily across hundreds of cities, generating massive volumes of real-time geospatial and transactional data. Their DE interviews test streaming architecture, geospatial reasoning, and the ability to build systems that operate at low latency under constant load. This guide covers every stage of the process, compensation by level, the tech stack you need to know, and 12 example questions with guidance.

Uber
Transportation · San Francisco, US · UBER
Live data as of June 11, 2026

DE total comp: $186K median ($157K–$214K, 5 verified reports)
Hiring now: no open DE roles (tracked daily)
Team happiness: 51 / 100, Neutral (model score from employee signals)
Layoff risk (30d): Moderate
Employee sentiment: 3.7 / 5, Mixed
Employees: 5,001–50,000

Uber DE Interview Process

Three stages from first contact to offer. The onsite loop carries the most weight.

01
Recruiter Screen
Initial call covering your experience and interest in Uber. The recruiter assesses your background with real-time data systems, large-scale infrastructure, and streaming architectures. Uber operates a massive real-time platform processing millions of rides and deliveries daily, so they look for candidates comfortable with event-driven systems and low-latency requirements.
- Emphasize real-time experience: streaming pipelines, Kafka, Flink, or similar tools
- Uber has open-sourced many data tools (Hudi, AresDB, Cadence); mentioning familiarity shows research
- Ask which team: Marketplace, Maps, Safety, or Data Platform each have different focuses
02
Technical Phone Screen
One to two coding problems, typically SQL or Python. Uber phone screens test data manipulation with ride and delivery event data. Expect questions about time-series analysis, geospatial logic, and event processing. The interviewer evaluates both correctness and your ability to reason about scale.
- Be comfortable with geospatial concepts: latitude/longitude distance calculations, geohashing
- Practice time-series SQL: sessionization, gap detection, and event ordering
- Think aloud about how your solution scales to millions of events per minute
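The latitude/longitude distance calculation mentioned above is usually done with the haversine formula. The following is a minimal sketch, assuming a spherical Earth with a mean radius of about 6,371 km; the function name `haversine_km` is illustrative and not taken from any Uber codebase.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point drift before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

One degree of latitude comes out to roughly 111 km, which is a handy sanity check to state aloud in an interview.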
03
Onsite Loop
Four to five rounds covering system design, SQL deep dive, coding, data modeling, and behavioral. System design at Uber focuses on real-time architectures: surge pricing computation, ETA prediction pipelines, and marketplace matching. The data modeling round often involves designing schemas for trip data that support both real-time operations and historical analytics.
- Know the CAP theorem and how it applies to Uber's real-time requirements
- Uber's system design questions involve geographic partitioning and time-sensitive data
- Behavioral questions focus on working under pressure and adapting to rapidly changing requirements

Interview Timeline

Phase	Duration
Recruiter screen to phone screen	Within 1 week
Phone screen to onsite	Within 2 weeks
Onsite to offer decision	Within 1 week
End to end (recruiter to offer)	3 to 5 weeks total

Uber data engineer compensation

Median and range from verified salary reports, by level.

Level	Base	Total comp
Junior (L3)	$130K–$160K	$170K–$220K
Mid-level (L4)	$134K median	$186K median · $157K–$214K · 5 reports
Senior (L5)	$212K median	$368K median · $276K–$441K · 4 reports
Staff (L6)	$235K–$295K	$500K–$720K
Principal (L7)	$280K–$360K

Uber DE Tech Stack

The tools and infrastructure Uber data engineers work with daily. Knowing these shows interviewers you understand the environment and can contribute from day one.

Languages

Python, Java, Scala, Go. Python dominates for data pipelines and scripting. Java and Scala power Flink and Spark jobs. Go is used in backend microservices that DEs interact with for data ingestion.

Streaming

Apache Kafka, Apache Flink, Apache Spark Streaming. Kafka is the central nervous system for all event data at Uber. Flink handles real-time stream processing for surge pricing, matching, and fraud detection.

Storage

Apache Hudi, Parquet on S3, Presto/Trino. Hudi (created at Uber) enables incremental processing on the data lake. Raw data lands in Parquet format on S3. Presto and Trino serve as the interactive query engines.

Orchestration

Cadence (Uber open-source), Apache Airflow. Cadence is Uber's workflow orchestration engine, designed for durable and fault-tolerant workflows at massive scale. Airflow handles traditional DAG-based batch scheduling.

Compute

On-prem and cloud hybrid infrastructure. Uber runs a significant portion of compute on its own data centers, with cloud bursting for peak loads. DEs must understand both bare-metal performance tuning and cloud-native autoscaling.

Geospatial

H3 hexagonal indexing system (Uber open-source). H3 divides the world into hexagonal cells at multiple resolutions. Uber uses H3 to partition geographic data, compute supply/demand by zone, and power location-based features.

DE Teams at Uber

Uber has data engineers across every major product area. Each team has distinct data challenges and interview focus areas. Ask your recruiter which team you are interviewing for so you can tailor your preparation.

Marketplace

Pricing, surge, driver matching. DEs build pipelines for real-time supply/demand signals, surge multiplier computation, and matching algorithm feature stores. This team generates the most system design interview questions.

Maps and Geospatial

Routing, ETA prediction, map data quality. DEs process billions of GPS pings daily, maintain geospatial indexes (H3), and feed ML models for arrival time estimation. Expect heavy geospatial SQL if interviewing here.

Safety and Insurance

Incident detection, fraud signals, insurance risk scoring. DEs build event pipelines that flag anomalous trip patterns and feed real-time safety interventions. Data quality is critical because false negatives have real consequences.

Eats and Delivery

Restaurant analytics, delivery time prediction, courier optimization. DEs manage order event streams and build pipelines that balance delivery speed against courier utilization.

Freight

Logistics, load matching, carrier analytics. DEs build pipelines for shipment tracking, carrier performance scoring, and pricing models across long-haul routes.

Data Platform

Internal tooling, governance, infrastructure. DEs build and maintain the shared data lake, schema registry, data catalog, and self-serve query tools used by every other team.

Real Uber interview questions

Reported questions from this company's loops, tagged by domain, round, and level.

Pipeline Architecture · onsite pipeline architecture · L7 · 2025

Design a cost-efficient analytics architecture to ingest, store, and query 600 million daily Kafka clickstream events with a two-year retention period

Architect an end-to-end pipeline: Kafka consumers for ingestion, partitioned columnar storage (Parquet/ORC on S3/GCS), tiered storage strategy (hot/warm/cold) for cost efficiency, query engine selection (Presto/Trino/Athena) for ad-hoc analytics. Must handle 600M events/day with 2-year retention while keeping storage costs manageable.
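A quick sizing pass helps frame the tiering discussion. The sketch below is back-of-the-envelope arithmetic only: the 600M events/day and two-year retention come from the question, while the 1 KB average event size and 5:1 columnar compression ratio are assumptions chosen for illustration and should be replaced with measured values.

```python
EVENTS_PER_DAY = 600_000_000   # from the question
RETENTION_DAYS = 2 * 365       # two-year retention, from the question
AVG_EVENT_BYTES = 1_000        # ASSUMPTION: about 1 KB per raw event
COMPRESSION_RATIO = 5          # ASSUMPTION: columnar format plus compression


def sizing(events_per_day=EVENTS_PER_DAY, retention_days=RETENTION_DAYS,
           avg_event_bytes=AVG_EVENT_BYTES, compression_ratio=COMPRESSION_RATIO):
    """Return rough throughput and storage figures for the stated workload."""
    raw_per_day = events_per_day * avg_event_bytes
    stored_per_day = raw_per_day / compression_ratio
    return {
        "avg_events_per_sec": events_per_day / 86_400,
        "raw_tb_per_day": raw_per_day / 1e12,
        "stored_tb_per_day": stored_per_day / 1e12,
        "stored_tb_total": stored_per_day * retention_days / 1e12,
    }


if __name__ == "__main__":
    for name, value in sizing().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the average rate is just under 7,000 events per second and total retained storage is on the order of tens of terabytes, which suggests storage tiering matters more for query cost than for raw capacity.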

Data Modeling · onsite data modeling · L6 · 2025

Design a relational database schema to record rides between riders and drivers, including table structures and how they join together

Design core tables (riders, drivers, vehicles, trips, payments) with well-defined foreign keys. Explain one-to-many relationships (driver to trips, rider to trips), how vehicle assignment works, and how the schema supports both real-time operational queries and historical analytics. Discuss indexing strategy for high-throughput queries.

Python · onsite python · L5 · 2025

Given a list of meeting time intervals, find the minimum number of rooms required so no two overlapping meetings share a room

Write a function min_rooms(meetings) where each meeting is a tuple (start, end) with start < end. Intervals are half-open: a meeting (0, 30) occupies times [0, 30). A meeting ending at time 10 does NOT conflict with one starting at time 10. Return the minimum number of rooms needed so no two overlapping meetings share a room.

Example:
meetings = [(0, 30), (5, 10), (15, 20)]
Output: 2 (meetings (0,30) and (5,10) overlap)

meetings = [(7, 10), (2, 4)]
Output: 1

meetings = [(1, 5), (2, 6), (3, 7)]
Output: 3 (all three overlap at time 3)

meetings = [(0, 5), (5, 10)]
Output: 1…
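One common way to approach this is a sweep over sorted start and end points. The sketch below is one possible solution, not a reference answer from the interview; it relies on the half-open convention stated in the problem, so an end at time t is processed before a start at time t.

```python
def min_rooms(meetings):
    """Minimum rooms for half-open intervals [start, end)."""
    events = []
    for start, end in meetings:
        events.append((start, 1))    # a room is taken
        events.append((end, -1))     # a room is freed
    # Tuples sort by time, then delta: (t, -1) comes before (t, 1),
    # so a meeting ending at t frees its room before another starts at t.
    events.sort()
    rooms = best = 0
    for _, delta in events:
        rooms += delta
        best = max(best, rooms)
    return best
```

Sorting dominates, so the cost is O(n log n) time and O(n) extra space. A min-heap of end times is an equally valid alternative.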

SQL · phone screen sql · L4 · 2023

Uber DE phone screen: given 5 related tables, write SQL to aggregate across them; questions are wordy but not tricky; window functions needed for some aggregations

Mixed · phone screen sql · level unknown · 2025

Uber Data Engineer - SDE 2 role

I took the interview in Jan 2024, so I know this is late and I apologize.

Round 1 was just a screening round; asked about compensation, location.

Round 2) Technical Screen -> very similar to https://leetcode.com/problems/minimum-path-sum/description/

Final Round)

Coding Round #1: It was a combination of https://leetcode.com/problems/merge-intervals/description/ and binary search. I did not do well in this, partly because I just did not understand the interviewer's accent. The interviewer also gave no hints, but I guess that's just the market and I need to improve…

Pipeline Architecture · onsite pipeline architecture · L5 · 2025

Design an end-to-end data pipeline that ingests daily raw files from multiple sources and prepares clean, reliable datasets for predicting city-wide bicycle rental demand.

The problem tests end-to-end pipeline design including: source ingestion (daily raw CSV/JSON files from multiple city providers), data quality checks, normalization, feature engineering for ML model (weather, time of day, location features), output format optimized for a prediction model. Expected to discuss orchestration (Airflow/Dagster), storage layers (raw, cleaned, feature), monitoring, and backfill strategy. Interviewer follows up on handling missing source files and schema drift across providers.

Data Modeling · onsite data modeling · L6 · 2025

Design a relational database schema recording rides between riders and drivers, including entities for riders, drivers, vehicles, and trips with appropriate foreign key relationships.

Data modeling design question from Uber Data Engineer onsite. Candidate must define entities: Riders (rider_id, name, email, signup_date), Drivers (driver_id, name, license_number, rating), Vehicles (vehicle_id, driver_id FK, make, model, year, license_plate), Trips (trip_id, rider_id FK, driver_id FK, vehicle_id FK, pickup_location, dropoff_location, start_time, end_time, fare, status). Key relationships: driver 1:M vehicles, rider 1:M trips, driver 1:M trips. Discussion of separating fact tables (trips, payments) from dimension tables, stable linking keys, and preventing metric duplication…
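The entities listed above can be written out as DDL. The following is a minimal sketch using Python's built-in sqlite3 module so it can be run locally; the column types, NOT NULL and UNIQUE constraints, index names, and the `build_db` helper are assumptions added for illustration, not part of the reported question.

```python
import sqlite3

# Column names follow the question; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE riders (
    rider_id     INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    email        TEXT NOT NULL UNIQUE,
    signup_date  TEXT NOT NULL
);
CREATE TABLE drivers (
    driver_id       INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    license_number  TEXT NOT NULL UNIQUE,
    rating          REAL
);
CREATE TABLE vehicles (
    vehicle_id     INTEGER PRIMARY KEY,
    driver_id      INTEGER NOT NULL REFERENCES drivers(driver_id),
    make           TEXT,
    model          TEXT,
    year           INTEGER,
    license_plate  TEXT NOT NULL UNIQUE
);
CREATE TABLE trips (
    trip_id           INTEGER PRIMARY KEY,
    rider_id          INTEGER NOT NULL REFERENCES riders(rider_id),
    driver_id         INTEGER NOT NULL REFERENCES drivers(driver_id),
    vehicle_id        INTEGER NOT NULL REFERENCES vehicles(vehicle_id),
    pickup_location   TEXT,
    dropoff_location  TEXT,
    start_time        TEXT,
    end_time          TEXT,
    fare              REAL,
    status            TEXT NOT NULL
);
-- Hypothetical indexes for "recent trips for this driver/rider" lookups.
CREATE INDEX idx_trips_driver_start ON trips(driver_id, start_time);
CREATE INDEX idx_trips_rider_start  ON trips(rider_id, start_time);
"""


def build_db():
    """Create an in-memory database with foreign key enforcement turned on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn
```

Storing vehicle_id on each trip, rather than deriving it through the driver, keeps historical trips correct if a driver later changes vehicles.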

Python · onsite python · L5 · 2025

Given a list of named events with start and end times, find all pairs of events that overlap

Write a function find_overlaps(events) where each event is a tuple (name, start, end). Return a list of tuples containing the names of all pairs of events that overlap in time. Two events overlap if one starts strictly before the other ends and vice versa. Events that share only a boundary point (one ends exactly when another starts) do NOT overlap.

Example:
events = [('A', 1, 5), ('B', 3, 7), ('C', 6, 9), ('D', 8, 10)]
Output: [('A', 'B'), ('B', 'C'), ('C', 'D')]

events = [('X', 1, 2), ('Y', 3, 4)]
Output: []

events = [('P', 1, 10), ('Q', 2, 3), ('R', 4, 5)]
Output: [('P',…
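A possible solution sorts by start time and stops scanning as soon as a later event starts at or after the current event's end. This is a sketch under two assumptions not stated in the problem: every event has start < end, and pairs are returned in start-time order (the problem does not specify an output order for unsorted input).

```python
def find_overlaps(events):
    """Return name pairs for events that overlap; touching boundaries do not count."""
    # Sort by start, then end, then name so the output order is deterministic.
    ordered = sorted(events, key=lambda e: (e[1], e[2], e[0]))
    pairs = []
    for i, (name_a, _, end_a) in enumerate(ordered):
        for name_b, start_b, _ in ordered[i + 1:]:
            if start_b >= end_a:
                break  # every later event starts even later, so none can overlap
            pairs.append((name_a, name_b))
    return pairs
```

Because the list is sorted by start, checking `start_b < end_a` is enough to establish overlap. The worst case is still quadratic when most events overlap, since the output itself can contain that many pairs.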

SQL · onsite sql · L5 · 2025

Write a SQL query to randomly select a driver using weighted probabilities: given a table with a weighting column, each driver's selection probability should be proportional to their weight.

Schema: drivers(driver_id, driver_name, weight). The weighted random selection requires computing a cumulative weight sum using a window function, then comparing a random number (drawn uniformly between 0 and total_weight) to the cumulative boundaries. Approach: SUM(weight) OVER (ORDER BY driver_id ROWS UNBOUNDED PRECEDING) to get cumulative sums, then filter for the row where the random value falls within the bucket. Tests: window functions, RANDOM(), CTE usage. Used to improve Uber rider-driver matching systems.
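The cumulative-sum approach described above can be tried locally with Python's built-in sqlite3 module. This is a sketch, not the expected interview answer: it assumes all weights are positive, and it draws the random number in Python and binds it as a parameter (the `:draw` placeholder and `pick_driver` helper are illustrative) so the result is reproducible, whereas an interview answer would typically call the engine's own RANDOM() function.

```python
import random
import sqlite3

QUERY = """
WITH buckets AS (
    SELECT driver_id,
           driver_name,
           weight,
           SUM(weight) OVER (
               ORDER BY driver_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS upper_bound
    FROM drivers
)
SELECT driver_id, driver_name
FROM buckets
-- Each driver owns the half-open bucket [upper_bound - weight, upper_bound).
WHERE :draw >= upper_bound - weight
  AND :draw <  upper_bound
"""


def pick_driver(conn, rng=random):
    """Pick one driver with probability proportional to weight (weights assumed > 0)."""
    total = conn.execute("SELECT SUM(weight) FROM drivers").fetchone()[0]
    draw = rng.random() * total  # uniform draw scaled to the total weight
    return conn.execute(QUERY, {"draw": draw}).fetchone()
```

With weights 1, 3, and 6 the buckets are [0, 1), [1, 4), and [4, 10), so the third driver is selected for 60% of draws.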

Pipeline Architecture · onsite pipeline architecture · L5 · 2024

Design the backend of a near real-time dashboard showing trending dishes in a city

Uber Data Engineer SDE2 final round, Jan 2024. System design question: design backend for a near real-time dashboard showing which dishes are trending in a given city. Expected to discuss data ingestion pipeline, aggregation strategy, storage layer, and serving layer for near-real-time updates. Candidate felt they did well. Part of a final loop including coding rounds (merge intervals + binary search, course schedule graph problems) and behavioral. Candidate rejected overall.
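The aggregation step of such a dashboard can be illustrated with a small in-memory sliding window. This is a toy sketch under several assumptions: "trending" is simplified to raw order count within the window, events arrive in timestamp order per city, and the `TrendingDishes` class and its method names are hypothetical. A real design would run this logic in a stream processor and compare against a baseline rather than raw counts.

```python
from collections import Counter, defaultdict, deque


class TrendingDishes:
    """Per-city sliding-window order counts (in-memory illustration only)."""

    def __init__(self, window_seconds=900):
        self.window = window_seconds
        self.events = defaultdict(deque)    # city -> deque of (timestamp, dish)
        self.counts = defaultdict(Counter)  # city -> dish -> count in window

    def record(self, city, dish, ts):
        """Register one order event; assumes ts is non-decreasing per city."""
        self.events[city].append((ts, dish))
        self.counts[city][dish] += 1

    def _expire(self, city, now):
        """Drop events that fell out of the window (now - window, now]."""
        queue, counts = self.events[city], self.counts[city]
        while queue and queue[0][0] <= now - self.window:
            _, dish = queue.popleft()
            counts[dish] -= 1
            if counts[dish] == 0:
                del counts[dish]

    def top(self, city, now, k=3):
        """Return up to k (dish, count) pairs, most ordered first."""
        self._expire(city, now)
        return self.counts[city].most_common(k)
```

Keying state by city mirrors the geographic partitioning that comes up repeatedly in Uber design rounds, and it keeps each city's window independent.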

What Makes Uber Different

How Uber's data engineering culture and infrastructure set it apart from other top companies.

Hybrid infrastructure

Unlike companies that run entirely on AWS or GCP, Uber operates a hybrid of on-prem data centers and cloud resources. This means DEs must understand bare-metal performance tuning alongside cloud-native patterns. Interview questions often probe whether you can reason about infrastructure you manage directly, not just managed services.

Open-source DNA

Uber has built and open-sourced multiple foundational data tools: Apache Hudi for incremental data lake management, Cadence for workflow orchestration, H3 for geospatial indexing, and AresDB for real-time analytics. Interviewers expect candidates to know these exist and understand the problems they solve.

Multi-sided marketplace complexity

Every Uber transaction involves at least two parties (rider and driver, eater and courier) plus the platform. This creates data modeling challenges that single-sided businesses do not have. Supply/demand balancing, dynamic pricing, and matching algorithms all generate complex event streams that DEs must process and serve.

Real-time financial impact

When a data pipeline breaks at Uber, drivers earn less, riders wait longer, and the company loses revenue every minute. This urgency shapes interview expectations. Uber wants DEs who think about monitoring, alerting, SLAs, and graceful degradation as first-class requirements, not afterthoughts.

Uber-Specific Preparation Tips

Four areas that separate prepared candidates from everyone else.

Real-time is the default, not the exception

Most Uber DE questions are framed around real-time or near-real-time requirements. Batch processing is secondary. Know Kafka, Flink, and streaming concepts: watermarks, windowing, exactly-once semantics, and backpressure.
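Watermarks and windowing are easier to discuss with a concrete model in mind. The sketch below is a plain-Python illustration of event-time tumbling windows, not Flink's API: the watermark is assumed to be the largest event time seen so far minus an allowed lateness, and an event is dropped when its window has already closed. The function name and return shape are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds, allowed_lateness=0):
    """Count events per (window_start, key) using event time.

    events: iterable of (event_time, key) in ARRIVAL order, which may
    differ from event-time order. Returns (counts, dropped).
    """
    counts = defaultdict(int)
    dropped = []
    watermark = float("-inf")
    for event_time, key in events:
        # Advance the watermark: max event time seen, minus allowed lateness.
        watermark = max(watermark, event_time - allowed_lateness)
        window_start = (event_time // window_seconds) * window_seconds
        window_end = window_start + window_seconds
        if window_end <= watermark:
            dropped.append((event_time, key))  # window already closed: too late
            continue
        counts[(window_start, key)] += 1
    return dict(counts), dropped
```

In a production system the dropped events would typically go to a side output for later reconciliation rather than being discarded.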

Geospatial data is core to Uber's business

Uber partitions data geographically using H3 hexagonal indexing. Understand geohashing, spatial joins, and how to partition and query location-based data efficiently. This comes up in both system design and data modeling rounds.
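The idea of bucketing locations into cells can be shown without any geospatial library. The sketch below uses a simple square latitude/longitude grid as a stand-in; it is not H3 (which uses hexagons and its own cell IDs) and not a geohash, and the `grid_cell` and `demand_by_cell` names and the 0.01-degree default are assumptions made for illustration.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a point to a square grid cell ID (a simplified stand-in for H3/geohash)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def demand_by_cell(pickups, cell_deg=0.01):
    """Count pickup requests per cell; pickups is an iterable of (lat, lon)."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in pickups)
```

Once every event carries a cell ID, supply/demand by zone becomes an ordinary group-by, and a spatial join reduces to an equality join on the cell key. Square cells distort with latitude, which is one of the problems hexagonal systems like H3 are designed to reduce.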

Know Uber's open-source contributions

Uber created Apache Hudi (incremental data processing), AresDB (real-time analytics), and Cadence (workflow orchestration). Mentioning these tools and understanding their purpose shows deep familiarity with Uber's data ecosystem.

Scale is measured in events per second

Uber processes millions of events per second across rides, deliveries, and driver locations. When discussing system design, think in terms of throughput (events/sec), latency (p99 in milliseconds), and geographic distribution across hundreds of cities.

Common Mistakes in Uber DE Interviews

Patterns that cost candidates offers. These are specific to Uber and come from the unique characteristics of their data infrastructure.

Defaulting to batch when the question requires real-time

Candidates propose nightly Spark jobs for problems that demand sub-second latency. At Uber, surge pricing, driver matching, and ETA updates all require streaming. If the interviewer describes a real-time scenario, your first instinct should be Kafka plus Flink, not Airflow plus Spark.

Ignoring geographic partitioning

Uber data is inherently spatial. Candidates who partition only by date miss the point. Most Uber tables are partitioned by city or H3 hex zone first, then by time. Forgetting this leads to full table scans and shows you have not thought about how Uber's data is actually structured.

Treating all events as if they arrive in order

Mobile clients send events over unreliable networks. GPS pings arrive late. Trip end events sometimes arrive before trip start events. Candidates who assume ordered data get caught when the interviewer asks about late arrivals. Always discuss watermarks, event-time processing, and how to handle out-of-order data.
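The case where a trip end arrives before its trip start can be handled by buffering per-trip state until both halves are present. The following is a minimal sketch of that idea; the event shape `(trip_id, kind, event_time)` and the `pair_trip_events` name are assumptions, and a real pipeline would also expire pending state with a timer or watermark instead of holding it forever.

```python
def pair_trip_events(events):
    """Join trip start/end events regardless of arrival order.

    events: iterable of (trip_id, kind, event_time) in arrival order,
    where kind is "start" or "end". Returns (completed, pending).
    """
    pending = {}     # trip_id -> {"start": ts} and/or {"end": ts}
    completed = []   # (trip_id, start_ts, end_ts), in completion order
    for trip_id, kind, event_time in events:
        state = pending.setdefault(trip_id, {})
        state[kind] = event_time
        if "start" in state and "end" in state:
            completed.append((trip_id, state["start"], state["end"]))
            del pending[trip_id]  # free state once the trip is complete
    return completed, pending
```

Whatever remains in `pending` represents trips still in progress or events whose counterpart was lost, which is where a lateness policy and alerting come in.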

Designing systems without considering city-level isolation

Uber operates in hundreds of cities with different regulations, currencies, and demand patterns. A system designed as a single global pipeline will not work. Interviewers expect you to discuss per-city or per-region isolation, failover, and how to prevent a problem in one city from affecting another.

Skipping the cost and operational complexity discussion

Uber runs a hybrid on-prem and cloud infrastructure. Candidates who propose expensive fully-managed cloud services without discussing cost tradeoffs miss the mark. Mention compute costs, storage tiering, and how to handle peak vs off-peak workloads efficiently.

Uber practice set

Problems on the platform tagged and predicted for Uber loops, from live listings and interview reports.

SQL · medium · ~10 min

Binary Flag Indicators

The feature flag dashboard needs a clean boolean representation for downstream consumers. For each flag, show the flag name, a 1/0 indicator for whether it is enabled, and a 1/0 indicator for whether it is disabled.

Python · easy · ~5 min

Type Caster

Given a list of values, return a new list where each element is the result of int(value). Any element that raises when cast becomes None instead. Preserve input order.
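One possible solution is sketched below. The function name `type_caster` is assumed from the problem title, and catching `Exception` is a deliberate reading of "any element that raises", since `int()` can raise TypeError, ValueError, or OverflowError depending on the input.

```python
def type_caster(values):
    """Return int(value) for each element, or None where the cast raises."""
    result = []
    for value in values:
        try:
            result.append(int(value))
        except Exception:  # TypeError, ValueError, OverflowError, or a custom __int__ failure
            result.append(None)
    return result
```

Note that `int()` truncates floats toward zero and accepts strings with surrounding whitespace, so `2.9` becomes `2` and `" 7 "` becomes `7`.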

Data Modeling · easy · ~20 min

Event Ticketing System Data Model

We run an IT helpdesk platform. Users submit support tickets, which are assigned to agents. Tickets go through multiple status changes before being resolved. SLA compliance is critical: P1 tickets must be resolved within 4 hours, P2 within 24 hours. Design the schema, and describe how you would load data from a JSON API feed into it.

Pipeline Architecture · medium · ~25 min

The Queue That Wouldn't Stop Growing

Your streaming video event pipeline shows consumer lag spiking from near-zero to over 500,000 messages within two hours. You need to diagnose whether the cause is a producer burst or a consumer slowdown, then design a monitoring and auto-remediation system that can detect, alert on, and automatically recover from future lag events.

Python · medium · ~20 min

The Coin Vault

Given a target amount and a list of coin denominations, return the minimum coins needed using a greedy strategy: repeatedly take the largest coin that does not exceed the remaining amount. Return -1 if the greedy approach cannot make exact change.
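A sketch of the greedy strategy as described follows. The function name `greedy_coins` is hypothetical, and the code assumes a non-negative integer amount and integer denominations; non-positive denominations are skipped. Taking as many of the largest coin as fit, then moving to the next largest, is equivalent to repeatedly taking the largest coin that does not exceed the remainder.

```python
def greedy_coins(amount, denominations):
    """Minimum coins under the greedy rule, or -1 if greedy cannot make exact change."""
    remaining = amount
    count = 0
    for coin in sorted(set(denominations), reverse=True):
        if coin <= 0:
            continue  # ignore invalid denominations
        used, remaining = divmod(remaining, coin)
        count += used
    return count if remaining == 0 else -1
```

Greedy is not optimal in general: for amount 6 with coins 1, 3, and 4 it uses three coins (4 + 1 + 1) where two (3 + 3) would do, and for amount 6 with coins 3 and 4 it returns -1 even though 3 + 3 works. The problem asks for the greedy result specifically, so this behavior is intended.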

Data Modeling · medium · ~25 min

Housing Marketplace Analytics

We run a housing marketplace. Sellers list properties, buyers view listings and submit leads. We need to measure conversion rate from view to lead by location and property type. Design the data model.

Uber DE Interview FAQ

How many rounds are in an Uber DE interview?

Typically 6 to 7: recruiter screen, technical phone screen, and 4 to 5 onsite rounds covering SQL, system design, coding, data modeling, and behavioral. Some teams add a domain-specific round for marketplace or maps.

Does Uber test Kafka and Flink in DE interviews?

Not always directly, but streaming concepts are central. You should understand event-time vs processing-time, windowing strategies, watermarks, and exactly-once semantics. Uber uses Kafka and Flink heavily, so referencing them in system design is appropriate and expected.

What programming languages does Uber DE use?

Python, Java, Scala, and Go are common. For interviews, Python and SQL are accepted. If you have Java/Scala experience with Spark or Flink, it can be an advantage for streaming-focused roles.

How does Uber's interview compare to other ride-sharing companies?

Uber's interview is more infrastructure-focused than Lyft's, with heavier emphasis on real-time systems, geospatial data, and large-scale distributed processing. The behavioral round focuses on operating under pressure in a fast-moving environment.

What is the typical offer timeline after the onsite?

Most candidates hear back within 5 to 7 business days after the onsite. If you are at IC5 or above, the calibration process may add a few extra days. Recruiters are usually responsive about status updates if you ask.

Does Uber require system design for IC3 and IC4 candidates?

IC3 candidates typically get a lighter system design round focused on data modeling and basic architecture. IC4 candidates face a full system design round. IC5 and IC6 candidates get the most rigorous design questions with expectations for cross-team and org-level thinking.

How does Uber's equity work?

Uber grants RSUs (Restricted Stock Units) that vest over 4 years. The standard schedule is 25% per year. RSU value fluctuates with Uber's stock price, which means your actual total compensation depends on market conditions at each vesting date.

Can I negotiate the Uber offer?

Yes. Uber recruiters expect negotiation, especially at IC5 and above. Competing offers from other top-tier companies (Meta, Google, Netflix) give the most leverage. Base salary has less room to move than equity and sign-on bonus. Always negotiate equity if you believe in the stock trajectory.

02 / Why practice

Prepare at Uber Interview Difficulty

01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
03
Five problem shapes cover 80% of data engineer loops
Parsing and reshaping, sessionization, dedup with tie-breaks, streaming aggregation, top-N-per-group. Writing them by hand turns the unfamiliar into pattern recognition
