Uber Data Engineer Interview (2026)
Uber processes millions of trips and deliveries daily across hundreds of cities, generating massive volumes of real-time geospatial and transactional data. Their DE interviews test streaming architecture, geospatial reasoning, and the ability to build systems that operate at low latency under constant load. This guide covers every stage of the process, compensation by level, the tech stack you need to know, and 12 example questions with guidance.
Uber DE Interview Process
Three stages from first contact to offer. The onsite loop carries the most weight.
01. Recruiter Screen
Initial call covering your experience and interest in Uber. The recruiter assesses your background with real-time data systems, large-scale infrastructure, and streaming architectures. Uber operates a massive real-time platform processing millions of rides and deliveries daily, so they look for candidates comfortable with event-driven systems and low-latency requirements.
- Emphasize real-time experience: streaming pipelines, Kafka, Flink, or similar tools
- Uber has open-sourced many data tools (Hudi, AresDB, Cadence); mentioning familiarity shows research
- Ask which team: Marketplace, Maps, Safety, or Data Platform each have different focuses
02. Technical Phone Screen
One to two coding problems, typically SQL or Python. Uber phone screens test data manipulation with ride and delivery event data. Expect questions about time-series analysis, geospatial logic, and event processing. The interviewer evaluates both correctness and your ability to reason about scale.
- Be comfortable with geospatial concepts: latitude/longitude distance calculations, geohashing
- Practice time-series SQL: sessionization, gap detection, and event ordering
- Think aloud about how your solution scales to millions of events per minute
03. Onsite Loop
Four to five rounds covering system design, SQL deep dive, coding, data modeling, and behavioral. System design at Uber focuses on real-time architectures: surge pricing computation, ETA prediction pipelines, and marketplace matching. The data modeling round often involves designing schemas for trip data that support both real-time operations and historical analytics.
- Know the CAP theorem and how it applies to Uber's real-time requirements
- Uber's system design questions involve geographic partitioning and time-sensitive data
- Behavioral questions focus on working under pressure and adapting to rapidly changing requirements
Interview Timeline
| Phase | Duration |
|---|---|
| Recruiter screen to phone screen | Within 1 week |
| Phone screen to onsite | Within 2 weeks |
| Onsite to offer decision | Within 1 week |
| End to end (recruiter to offer) | 3 to 5 weeks total |
Data Engineer Compensation by Level
Total compensation ranges for Uber DE roles. Uber grants RSUs that vest over 4 years on a standard 25% annual schedule. Equity values below are annualized. Actual equity value depends on stock price at each vesting date.
IC3 (L3)
$150K to $200K total comp. Base: $130K to $155K. Equity/yr: $15K to $30K. Bonus: $10K to $15K. 0 to 2 years experience.
IC4 (L4)
$200K to $300K total comp. Base: $155K to $190K. Equity/yr: $30K to $75K. Bonus: $15K to $35K. 2 to 5 years experience.
IC5 (L5 Senior)
$300K to $420K total comp. Base: $190K to $230K. Equity/yr: $75K to $140K. Bonus: $35K to $50K. 5 to 8 years experience.
IC6 (L6 Staff)
$400K to $550K total comp. Base: $230K to $275K. Equity/yr: $130K to $210K. Bonus: $40K to $65K. 8+ years experience.
Uber DE Tech Stack
The tools and infrastructure Uber data engineers work with daily. Knowing these shows interviewers you understand the environment and can contribute from day one.
Languages
Python, Java, Scala, Go. Python dominates for data pipelines and scripting. Java and Scala power Flink and Spark jobs. Go is used in backend microservices that DEs interact with for data ingestion.
Streaming
Apache Kafka, Apache Flink, Apache Spark Streaming. Kafka is the central nervous system for all event data at Uber. Flink handles real-time stream processing for surge pricing, matching, and fraud detection.
Storage
Apache Hudi, Parquet on S3, Presto/Trino. Hudi (created at Uber) enables incremental processing on the data lake. Raw data lands in Parquet format on S3. Presto and Trino serve as the interactive query engines.
Orchestration
Cadence (Uber open-source), Apache Airflow. Cadence is Uber's workflow orchestration engine, designed for durable and fault-tolerant workflows at massive scale. Airflow handles traditional DAG-based batch scheduling.
Compute
On-prem and cloud hybrid infrastructure. Uber runs a significant portion of compute in its own data centers, with cloud bursting for peak loads. DEs must understand both bare-metal performance tuning and cloud-native autoscaling.
Geospatial
H3 hexagonal indexing system (Uber open-source). H3 divides the world into hexagonal cells at multiple resolutions. Uber uses H3 to partition geographic data, compute supply/demand by zone, and power location-based features.
DE Teams at Uber
Uber has data engineers across every major product area. Each team has distinct data challenges and interview focus areas. Ask your recruiter which team you are interviewing for so you can tailor your preparation.
Marketplace
Pricing, surge, driver matching. DEs build pipelines for real-time supply/demand signals, surge multiplier computation, and matching algorithm feature stores. This team generates the most system design interview questions.
Maps and Geospatial
Routing, ETA prediction, map data quality. DEs process billions of GPS pings daily, maintain geospatial indexes (H3), and feed ML models for arrival time estimation. Expect heavy geospatial SQL if interviewing here.
Safety and Insurance
Incident detection, fraud signals, insurance risk scoring. DEs build event pipelines that flag anomalous trip patterns and feed real-time safety interventions. Data quality is critical because false negatives have real consequences.
Eats and Delivery
Restaurant analytics, delivery time prediction, courier optimization. DEs manage order event streams and build pipelines that balance delivery speed against courier utilization.
Freight
Logistics, load matching, carrier analytics. DEs build pipelines for shipment tracking, carrier performance scoring, and pricing models across long-haul routes.
Data Platform
Internal tooling, governance, infrastructure. DEs build and maintain the shared data lake, schema registry, data catalog, and self-serve query tools used by every other team.
12 Example Questions with Guidance
Real question types from each round, covering SQL, Python, system design, data modeling, and behavioral.
Calculate the average wait time between ride request and driver acceptance, segmented by city and hour of day.
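Join ride_requests to ride_acceptances on ride_id. Compute wait_time = acceptance_ts minus request_ts. AVG grouped by city and EXTRACT(HOUR FROM request_ts). Discuss handling rides that were never accepted: an inner join drops them, and with a left join AVG skips the NULL wait times, so either way the average covers accepted rides only. Consider reporting the acceptance rate alongside it.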
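A minimal sketch of this query, run against an in-memory SQLite database so it can be tried locally. The table and column names follow the guidance above; the sample rows are made up, and SQLite's `strftime` stands in for `EXTRACT(HOUR FROM ...)`, so the syntax will differ on Presto or Spark SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ride_requests (ride_id INTEGER PRIMARY KEY, city TEXT, request_ts TEXT);
CREATE TABLE ride_acceptances (ride_id INTEGER, acceptance_ts TEXT);
INSERT INTO ride_requests VALUES
  (1, 'SF',  '2026-01-05 08:00:00'),
  (2, 'SF',  '2026-01-05 08:10:00'),
  (3, 'SF',  '2026-01-05 08:20:00'),  -- never accepted
  (4, 'NYC', '2026-01-05 09:00:00');
INSERT INTO ride_acceptances VALUES
  (1, '2026-01-05 08:00:30'),
  (2, '2026-01-05 08:11:30'),
  (4, '2026-01-05 09:00:45');
""")

# LEFT JOIN keeps unaccepted rides; AVG ignores their NULL wait time,
# so requests vs. accepted counts are reported next to the average.
AVG_WAIT_SQL = """
SELECT r.city,
       CAST(strftime('%H', r.request_ts) AS INTEGER) AS request_hour,
       AVG(CAST(strftime('%s', a.acceptance_ts) AS INTEGER)
           - CAST(strftime('%s', r.request_ts) AS INTEGER)) AS avg_wait_seconds,
       COUNT(*) AS requests,
       COUNT(a.ride_id) AS accepted
FROM ride_requests r
LEFT JOIN ride_acceptances a ON a.ride_id = r.ride_id
GROUP BY r.city, request_hour
ORDER BY r.city, request_hour
"""

rows = conn.execute(AVG_WAIT_SQL).fetchall()
for row in rows:
    print(row)
```

Returning the request and acceptance counts next to the average makes it visible when a low wait time is hiding a low acceptance rate.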
Find drivers who completed more than 20 trips in a single day but had an average rating below 4.0 on those trips.
Aggregate trips by driver and date. HAVING COUNT(*) > 20 AND AVG(rating) < 4.0. Discuss whether to include trips with no rating, and what this pattern might indicate about driver fatigue or service quality degradation.
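One way to write this, sketched against SQLite with a hypothetical `trips` table (`driver_id`, `completed_ts`, `rating`). "More than 20" is read as a strict comparison, and unrated trips count toward the trip total while `AVG` skips their NULL rating; both are interpretation choices worth stating to the interviewer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (driver_id TEXT, completed_ts TEXT, rating REAL)")


def add_trips(driver_id, day, count, rating):
    """Insert `count` synthetic trips for one driver on one day, 15 minutes apart."""
    rows = [
        (driver_id, f"{day} {8 + i // 4:02d}:{(i % 4) * 15:02d}:00", rating)
        for i in range(count)
    ]
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)


add_trips("d1", "2026-01-05", 21, 3.5)  # 21 trips, low rating: should match
add_trips("d2", "2026-01-05", 20, 3.0)  # exactly 20 trips: not "more than 20"
add_trips("d3", "2026-01-05", 25, 4.8)  # busy day but well rated

BUSY_LOW_RATED_SQL = """
SELECT driver_id,
       DATE(completed_ts) AS trip_date,
       COUNT(*)           AS trips,
       AVG(rating)        AS avg_rating
FROM trips
GROUP BY driver_id, DATE(completed_ts)
HAVING COUNT(*) > 20 AND AVG(rating) < 4.0
"""

rows = conn.execute(BUSY_LOW_RATED_SQL).fetchall()
print(rows)
```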
Identify surge pricing periods: find continuous time windows where the surge multiplier exceeded 2.0 for more than 15 minutes in a given zone.
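Use the gaps-and-islands technique: number all readings per zone with ROW_NUMBER, keep the rows where the multiplier exceeds 2.0, and subtract a second ROW_NUMBER computed over those rows so that consecutive qualifying rows share a group key. Then filter groups spanning more than 15 minutes. Discuss event granularity and how to handle missing data points in the stream.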
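The grouping trick is easier to see in a runnable form. The sketch below assumes a hypothetical `surge_readings` table with one reading per zone per minute and uses SQLite (3.25 or later, for window functions); the duration is measured from the first to the last reading of each run, which is one of several defensible definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE surge_readings (zone_id TEXT, ts TEXT, surge REAL)")


def surge_for(minute):
    """Synthetic data: a 20-reading run, a one-minute dip, then a 10-reading run."""
    if 5 <= minute <= 24:
        return 2.5
    if 26 <= minute <= 35:
        return 2.2
    return 1.5


conn.executemany(
    "INSERT INTO surge_readings VALUES (?, ?, ?)",
    [("z1", f"2026-01-05 10:{m:02d}:00", surge_for(m)) for m in range(40)],
)

SURGE_PERIODS_SQL = """
WITH numbered AS (
  SELECT zone_id, ts, surge,
         ROW_NUMBER() OVER (PARTITION BY zone_id ORDER BY ts) AS rn_all
  FROM surge_readings
),
high AS (
  -- rn_all minus the row number among qualifying rows is constant
  -- within each unbroken run of surge > 2.0
  SELECT zone_id, ts,
         rn_all - ROW_NUMBER() OVER (PARTITION BY zone_id ORDER BY ts) AS island
  FROM numbered
  WHERE surge > 2.0
),
periods AS (
  SELECT zone_id,
         MIN(ts) AS period_start,
         MAX(ts) AS period_end,
         (CAST(strftime('%s', MAX(ts)) AS INTEGER)
          - CAST(strftime('%s', MIN(ts)) AS INTEGER)) / 60 AS duration_minutes
  FROM high
  GROUP BY zone_id, island
)
SELECT * FROM periods WHERE duration_minutes > 15 ORDER BY zone_id, period_start
"""

rows = conn.execute(SURGE_PERIODS_SQL).fetchall()
print(rows)
```

Note that this treats a missing reading the same as continuous surge, because row numbers only see rows that exist. If gaps in the stream should break a run, compare each timestamp with the previous one using `LAG` instead.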
For each city, find the top 3 hours with the highest ratio of ride requests to available drivers in the past 30 days.
Aggregate requests and available drivers by city and hour. Compute the ratio, then use ROW_NUMBER() OVER (PARTITION BY city ORDER BY ratio DESC) to rank. Filter to rank <= 3. Discuss what 'available' means operationally.
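A sketch of the ranking step, assuming a hypothetical pre-aggregated `hourly_stats` table and reading "hour" as hour of day summed across the 30-day window. The `:as_of` parameter replaces the current date so the example is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hourly_stats "
    "(city TEXT, stat_date TEXT, hour INTEGER, requests INTEGER, available_drivers INTEGER)"
)
conn.executemany(
    "INSERT INTO hourly_stats VALUES (?, ?, ?, ?, ?)",
    [
        ("SF", "2026-01-10", 8, 300, 100),    # ratio 3.0
        ("SF", "2026-01-10", 12, 100, 100),   # ratio 1.0
        ("SF", "2026-01-10", 18, 500, 100),   # ratio 5.0
        ("SF", "2026-01-10", 23, 200, 100),   # ratio 2.0
        ("SF", "2025-11-01", 3, 900, 10),     # outside the 30-day window
        ("NYC", "2026-01-10", 9, 400, 200),   # ratio 2.0
        ("NYC", "2026-01-10", 17, 900, 300),  # ratio 3.0
    ],
)

TOP_HOURS_SQL = """
WITH ratios AS (
  SELECT city, hour,
         -- NULLIF avoids division by zero when no drivers were available
         1.0 * SUM(requests) / NULLIF(SUM(available_drivers), 0) AS ratio
  FROM hourly_stats
  WHERE stat_date > DATE(:as_of, '-30 days') AND stat_date <= :as_of
  GROUP BY city, hour
),
ranked AS (
  SELECT city, hour, ratio,
         ROW_NUMBER() OVER (PARTITION BY city ORDER BY ratio DESC) AS rnk
  FROM ratios
)
SELECT city, hour, ratio FROM ranked WHERE rnk <= 3 ORDER BY city, rnk
"""

rows = conn.execute(TOP_HOURS_SQL, {"as_of": "2026-01-15"}).fetchall()
print(rows)
```

`ROW_NUMBER` breaks ties arbitrarily; `RANK` or `DENSE_RANK` would keep tied hours, which is a useful follow-up point if the interviewer asks about ties.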
Write a function that takes a stream of GPS coordinates and detects when a driver has been stationary for more than 5 minutes.
Track last-moved timestamp. If distance between consecutive points is below threshold (e.g. 50 meters) and elapsed time exceeds 5 minutes, flag as stationary. Discuss Haversine distance calculation, GPS drift noise, and how to handle signal gaps.
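A possible shape for this function is sketched below. It assumes pings arrive as `(timestamp_seconds, lat, lon)` tuples in time order, and it compares each ping with the point where the current stop began rather than with the previous ping, a small variation that keeps slow creep from going undetected. The names and thresholds are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def detect_stationary(pings, radius_m=50.0, min_seconds=300.0):
    """Yield (stop_start_ts, detected_ts) once per stop longer than min_seconds."""
    anchor = None    # (ts, lat, lon) where the current stop began
    flagged = False  # ensures each stop is reported only once
    for ts, lat, lon in pings:
        if anchor is None or haversine_m(anchor[1], anchor[2], lat, lon) > radius_m:
            # Driver moved beyond the radius: start a new candidate stop here.
            anchor = (ts, lat, lon)
            flagged = False
        elif not flagged and ts - anchor[0] > min_seconds:
            flagged = True
            yield (anchor[0], ts)


pings = [
    (0, 37.77490, -122.41940),
    (120, 37.77491, -122.41941),  # about a meter of GPS jitter
    (240, 37.77489, -122.41939),
    (330, 37.77490, -122.41940),  # more than 5 minutes inside the radius
    (400, 37.78490, -122.41940),  # roughly 1.1 km away: moving again
]
stops = list(detect_stationary(pings))
print(stops)
```

A signal gap is indistinguishable from a stop in this version; a production variant would likely cap the allowed interval between consecutive pings before counting the elapsed time.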
Implement a sliding window function that computes the rolling average ETA error for a driver over their last 10 completed trips.
Maintain a deque of size 10. For each completed trip, push actual_duration minus predicted_duration, pop oldest if full. Return mean of deque. Discuss what ETA error distribution reveals about model accuracy.
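The deque approach can be sketched in a few lines. `RollingEtaError` is a hypothetical name; in practice one instance would be kept per driver, for example in a dictionary keyed by driver ID.

```python
from collections import deque


class RollingEtaError:
    """Rolling mean of (actual - predicted) duration over the last N completed trips."""

    def __init__(self, window=10):
        # deque(maxlen=...) drops the oldest entry automatically when full
        self.errors = deque(maxlen=window)

    def add_trip(self, actual_duration, predicted_duration):
        self.errors.append(actual_duration - predicted_duration)
        return sum(self.errors) / len(self.errors)


tracker = RollingEtaError(window=3)
means = [
    tracker.add_trip(10, 8),   # errors: [2]
    tracker.add_trip(12, 12),  # errors: [2, 0]
    tracker.add_trip(9, 10),   # errors: [2, 0, -1]
    tracker.add_trip(20, 14),  # errors: [0, -1, 6], the oldest entry was dropped
]
print(means)
```

Summing ten values per update is cheap enough that a running total is not needed here; for much larger windows, maintaining the sum incrementally would be the usual optimization.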
Design a real-time surge pricing computation pipeline.
Ingest ride requests and driver availability via Kafka. Flink computes supply/demand ratio per geospatial zone in tumbling windows. Serve from a low-latency key-value store. Discuss zone granularity (H3 hexagons), smoothing to avoid price oscillation, and fallback when the streaming layer lags.
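To make the windowing and smoothing ideas concrete, here is a plain-Python sketch with no Kafka or Flink involved. The event shape, the multiplier formula, the cap, and the smoothing factor are all invented for illustration and are not Uber's actual pricing logic.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count requests and available drivers per (window_start, zone).

    `events` is an iterable of (event_ts, zone_id, kind) where kind is
    'request' or 'driver_available' (a hypothetical event schema).
    """
    counts = defaultdict(lambda: {"request": 0, "driver_available": 0})
    for ts, zone, kind in events:
        window_start = ts - ts % window_seconds  # align to the window boundary
        counts[(window_start, zone)][kind] += 1
    return dict(counts)


def surge_multiplier(requests, drivers, cap=3.0):
    """Illustrative formula: demand/supply ratio clamped to [1.0, cap]."""
    ratio = requests / max(drivers, 1)  # treat zero supply as one driver
    return min(max(1.0, ratio), cap)


def smooth(previous, current, alpha=0.5):
    """Exponential smoothing so the served price does not oscillate between windows."""
    return alpha * current + (1 - alpha) * previous


events = [
    (0, "z1", "request"), (10, "z1", "request"), (20, "z1", "driver_available"),
    (59, "z1", "request"), (60, "z1", "request"), (70, "z1", "driver_available"),
]
counts = tumbling_window_counts(events)
raw = surge_multiplier(counts[(0, "z1")]["request"], counts[(0, "z1")]["driver_available"])
served = smooth(previous=1.0, current=raw)
print(counts, raw, served)
```

In a streaming engine the same three steps map onto a keyed window aggregation, a map over the window result, and a small piece of keyed state holding the previously served multiplier.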
Design Uber's trip data pipeline that serves both real-time operations and historical analytics.
Kappa architecture: Kafka for real-time, Flink for stream processing, Hudi for incremental updates to the data lake. Discuss exactly-once semantics, late-arriving events from mobile clients, and partition strategy (by city and date).
Design a driver matching system that pairs riders with nearby available drivers in under 2 seconds.
Drivers publish location to Kafka. Flink maintains a geospatial index (H3) of available drivers in state. On ride request, query the index for drivers in adjacent hexagons, rank by ETA and rating, and dispatch. Discuss rebalancing when supply is sparse.
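The index-and-neighbor-lookup idea can be sketched without any external library. The version below uses a plain square latitude/longitude grid as a stand-in for H3 cells (an assumption made to keep the example dependency-free), and ranks candidates by approximate distance as a proxy for ETA and rating. All class and function names are hypothetical.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # hypothetical cell size, about 1.1 km of latitude


def cell_of(lat, lon):
    """Map a coordinate to a square grid cell (stand-in for an H3 cell ID)."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


class DriverIndex:
    def __init__(self):
        self.cells = defaultdict(dict)  # cell -> {driver_id: (lat, lon)}
        self.driver_cell = {}           # driver_id -> current cell

    def update(self, driver_id, lat, lon):
        """Record a driver's latest position, moving them between cells if needed."""
        self.remove(driver_id)
        cell = cell_of(lat, lon)
        self.cells[cell][driver_id] = (lat, lon)
        self.driver_cell[driver_id] = cell

    def remove(self, driver_id):
        """Drop a driver who went offline or was dispatched."""
        cell = self.driver_cell.pop(driver_id, None)
        if cell is not None:
            del self.cells[cell][driver_id]

    def nearby(self, lat, lon, rings=1):
        """Return driver IDs in the rider's cell and its neighbors, closest first."""
        cx, cy = cell_of(lat, lon)
        found = []
        for dx in range(-rings, rings + 1):
            for dy in range(-rings, rings + 1):
                for driver_id, (dlat, dlon) in self.cells.get((cx + dx, cy + dy), {}).items():
                    # Equirectangular approximation, adequate for ranking nearby points
                    dist = math.hypot(dlat - lat, (dlon - lon) * math.cos(math.radians(lat)))
                    found.append((dist, driver_id))
        return [driver_id for _, driver_id in sorted(found)]


index = DriverIndex()
index.update("d1", 37.7751, -122.4191)  # same cell as the rider
index.update("d2", 37.7830, -122.4190)  # adjacent cell
index.update("d3", 37.9050, -122.4190)  # far away
candidates = index.nearby(37.7750, -122.4190)
print(candidates)
```

When the neighbor search returns nothing, widening `rings` step by step is one simple way to handle sparse supply, at the cost of scanning more cells per request.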
Model trip data to support marketplace analytics: supply/demand balance, driver utilization, and rider conversion funnels.
Fact: trips (request_ts, accept_ts, pickup_ts, dropoff_ts, fare, surge_multiplier, zone_id). Dimensions: zones (with H3 hierarchy), drivers, riders. Discuss pre-aggregating zone-level metrics hourly.
Design a schema for Uber Eats that tracks order lifecycle from placement through delivery, supporting both operational dashboards and weekly business reviews.
Fact: orders with timestamps for each state transition (placed, confirmed, preparing, picked_up, delivered). Dimensions: restaurants, couriers, customers, cities. Discuss SCD Type 2 for restaurant attributes that change.
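For the SCD Type 2 part, the following sketch shows the close-and-append mechanics on an in-memory list. The `cuisine` attribute, the half-open validity interval, and the function names are assumptions chosen for illustration; a warehouse implementation would typically express the same logic as a MERGE statement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RestaurantVersion:
    restaurant_id: str
    cuisine: str
    valid_from: str                  # ISO date, inclusive
    valid_to: Optional[str] = None   # exclusive; None marks the current row


def apply_change(history: List[RestaurantVersion], restaurant_id: str,
                 cuisine: str, effective_date: str) -> None:
    """Close the current row and append a new one when the attribute changes."""
    current = next(
        (v for v in history if v.restaurant_id == restaurant_id and v.valid_to is None),
        None,
    )
    if current is not None:
        if current.cuisine == cuisine:
            return  # nothing changed, keep the current row open
        current.valid_to = effective_date
    history.append(RestaurantVersion(restaurant_id, cuisine, effective_date))


def version_as_of(history: List[RestaurantVersion], restaurant_id: str,
                  date: str) -> Optional[RestaurantVersion]:
    """Find the row that was valid on `date` (ISO strings compare chronologically)."""
    for v in history:
        if (v.restaurant_id == restaurant_id and v.valid_from <= date
                and (v.valid_to is None or date < v.valid_to)):
            return v
    return None


history: List[RestaurantVersion] = []
apply_change(history, "r1", "pizza", "2026-01-01")
apply_change(history, "r1", "italian", "2026-03-01")
print(version_as_of(history, "r1", "2026-02-15"))
```

Joining orders to the version valid at order time is what lets a weekly business review report against the restaurant attributes as they were, not as they are now.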
Tell me about a time you had to make a technical decision quickly under production pressure.
Uber operates 24/7 with real-time financial impact. Describe the incident, the options you considered, what you chose and why, and the outcome. Show you can balance speed with safety and communicate clearly during high-stress situations.
What Makes Uber Different
How Uber's data engineering culture and infrastructure set it apart from other top companies.
Hybrid infrastructure
Unlike companies that run entirely on AWS or GCP, Uber operates a hybrid of on-prem data centers and cloud resources. This means DEs must understand bare-metal performance tuning alongside cloud-native patterns. Interview questions often probe whether you can reason about infrastructure you manage directly, not just managed services.
Open-source DNA
Uber has built and open-sourced multiple foundational data tools: Apache Hudi for incremental data lake management, Cadence for workflow orchestration, H3 for geospatial indexing, and AresDB for real-time analytics. Interviewers expect candidates to know these exist and understand the problems they solve.
Multi-sided marketplace complexity
Every Uber transaction involves at least two parties (rider and driver, eater and courier) plus the platform. This creates data modeling challenges that single-sided businesses do not have. Supply/demand balancing, dynamic pricing, and matching algorithms all generate complex event streams that DEs must process and serve.
Real-time financial impact
When a data pipeline breaks at Uber, drivers earn less, riders wait longer, and the company loses revenue every minute. This urgency shapes interview expectations. Uber wants DEs who think about monitoring, alerting, SLAs, and graceful degradation as first-class requirements, not afterthoughts.
Uber-Specific Preparation Tips
Four areas that separate prepared candidates from everyone else.
Real-time is the default, not the exception
Most Uber DE questions are framed around real-time or near-real-time requirements. Batch processing is secondary. Know Kafka, Flink, and streaming concepts: watermarks, windowing, exactly-once processing semantics, and backpressure.
Geospatial data is core to Uber's business
Uber partitions data geographically using H3 hexagonal indexing. Understand geohashing, spatial joins, and how to partition and query location-based data efficiently. This comes up in both system design and data modeling rounds.
Know Uber's open-source contributions
Uber created Apache Hudi (incremental data processing), AresDB (real-time analytics), and Cadence (workflow orchestration). Mentioning these tools and understanding their purpose shows deep familiarity with Uber's data ecosystem.
Scale is measured in events per second
Uber processes millions of events per second across rides, deliveries, and driver locations. When discussing system design, think in terms of throughput (events/sec), latency (p99 in milliseconds), and geographic distribution across hundreds of cities.
Common Mistakes in Uber DE Interviews
Patterns that cost candidates offers. These are specific to Uber and come from the unique characteristics of their data infrastructure.
Defaulting to batch when the question requires real-time
Candidates propose nightly Spark jobs for problems that demand sub-second latency. At Uber, surge pricing, driver matching, and ETA updates all require streaming. If the interviewer describes a real-time scenario, your first instinct should be Kafka plus Flink, not Airflow plus Spark.
Ignoring geographic partitioning
Uber data is inherently spatial. Candidates who partition only by date miss the point. Most Uber tables are partitioned by city or H3 hex zone first, then by time. Forgetting this leads to full table scans and shows you have not thought about how Uber's data is actually structured.
Treating all events as if they arrive in order
Mobile clients send events over unreliable networks. GPS pings arrive late. Trip end events sometimes arrive before trip start events. Candidates who assume ordered data get caught when the interviewer asks about late arrivals. Always discuss watermarks, event-time processing, and how to handle out-of-order data.
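A small simulation can help when explaining watermarks in an interview. The sketch below is a simplified, single-process model of event-time tumbling windows with a bounded-lateness watermark; it is not Flink's API, and the class name and parameters are made up for the example.

```python
from collections import defaultdict


class EventTimeWindower:
    """Counts events per event-time window and closes windows behind a watermark."""

    def __init__(self, window_seconds=60, allowed_lateness=30):
        self.window_seconds = window_seconds
        self.allowed_lateness = allowed_lateness
        self.max_event_ts = None
        self.open_windows = defaultdict(int)  # window_start -> running count
        self.closed = {}                      # window_start -> final count
        self.late = []                        # events whose window had already closed

    def watermark(self):
        """Assume no event older than (max seen timestamp - allowed lateness) will arrive."""
        return self.max_event_ts - self.allowed_lateness

    def on_event(self, event_ts, payload=None):
        window_start = event_ts - event_ts % self.window_seconds
        window_end = window_start + self.window_seconds
        if self.max_event_ts is not None and window_end <= self.watermark():
            # Too late: route to a side channel instead of silently dropping it
            self.late.append((event_ts, payload))
            return
        self.open_windows[window_start] += 1
        if self.max_event_ts is None or event_ts > self.max_event_ts:
            self.max_event_ts = event_ts
        # Finalize every window that now lies entirely behind the watermark
        for start in sorted(self.open_windows):
            if start + self.window_seconds <= self.watermark():
                self.closed[start] = self.open_windows.pop(start)


windower = EventTimeWindower()
for ts in [10, 50, 70, 40, 95, 20, 130]:  # arrival order, not event-time order
    windower.on_event(ts)
print(windower.closed, windower.late, dict(windower.open_windows))
```

The event at timestamp 40 arrives out of order but is still counted because its window is open, while the event at 20 arrives after the watermark has passed the end of its window and lands in the late list. What to do with late events (drop, correct downstream, or reprocess) is the trade-off interviewers usually want to hear discussed.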
Designing systems without considering city-level isolation
Uber operates in hundreds of cities with different regulations, currencies, and demand patterns. A system designed as a single global pipeline will not work. Interviewers expect you to discuss per-city or per-region isolation, failover, and how to prevent a problem in one city from affecting another.
Skipping the cost and operational complexity discussion
Uber runs a hybrid on-prem and cloud infrastructure. Candidates who propose expensive fully-managed cloud services without discussing cost tradeoffs miss the mark. Mention compute costs, storage tiering, and how to handle peak vs off-peak workloads efficiently.
Uber DE Interview FAQ
How many rounds are in an Uber DE interview?
Does Uber test Kafka and Flink in DE interviews?
What programming languages does Uber DE use?
How does Uber's interview compare to other ride-sharing companies?
What is the typical offer timeline after the onsite?
Does Uber require system design for IC3 and IC4 candidates?
How does Uber's equity work?
Can I negotiate the Uber offer?