Uber processes millions of trips and deliveries daily across hundreds of cities, generating massive volumes of real-time geospatial and transactional data. Its data engineering (DE) interviews test streaming architecture, geospatial reasoning, and the ability to build systems that operate at low latency under constant load. This guide covers every stage of the process, compensation by level, the tech stack you need to know, and 12 example questions with detailed guidance.
Three stages from first contact to offer. The onsite loop carries the most weight.
Initial call covering your experience and interest in Uber. The recruiter assesses your background with real-time data systems, large-scale infrastructure, and streaming architectures. Uber operates a massive real-time platform processing millions of rides and deliveries daily, so they look for candidates comfortable with event-driven systems and low-latency requirements.
One to two coding problems, typically SQL or Python. Uber phone screens test data manipulation with ride and delivery event data. Expect questions about time-series analysis, geospatial logic, and event processing. The interviewer evaluates both correctness and your ability to reason about scale.
Four to five rounds covering system design, SQL deep dive, coding, data modeling, and behavioral. System design at Uber focuses on real-time architectures: surge pricing computation, ETA prediction pipelines, and marketplace matching. The data modeling round often involves designing schemas for trip data that support both real-time operations and historical analytics.
Typical pacing from recruiter screen to offer letter. Uber moves faster than most big tech companies, especially for senior roles where they compete with Meta and Google for candidates.
Recruiter screen to phone screen: within 1 week
Phone screen to onsite: within 2 weeks
Onsite to offer decision: within 1 week
End to end (recruiter to offer): 3 to 5 weeks total
Total compensation ranges for Uber DE roles. Uber grants RSUs that vest over 4 years on a standard 25% annual schedule. Equity values below are annualized. Actual equity value depends on stock price at each vesting date.
Base: $130K to $155K; Equity/yr: $15K to $30K; Bonus: $10K to $15K; Total Comp: $150K to $200K
Base: $155K to $190K; Equity/yr: $30K to $75K; Bonus: $15K to $35K; Total Comp: $200K to $300K
Base: $190K to $230K; Equity/yr: $75K to $140K; Bonus: $35K to $50K; Total Comp: $300K to $420K
Base: $230K to $275K; Equity/yr: $130K to $210K; Bonus: $40K to $65K; Total Comp: $400K to $550K
Ranges based on reported data from levels.fyi and Glassdoor, 2025 to 2026. Actual offers vary by location, team, and negotiation.
What Uber expects at each level for data engineering roles. The interview difficulty and scope of design questions scale directly with the target level.
Builds assigned pipelines under guidance. Writes clean SQL and Python. Follows established patterns and coding standards. Delivers tasks within a sprint. Interview focuses on SQL proficiency and basic data modeling.
Owns pipelines end-to-end: design, build, test, deploy, monitor. Handles on-call for owned systems. Makes technical decisions within their domain. Expected to identify and fix data quality issues proactively.
Designs systems that span multiple teams. Drives technical decisions for complex projects. Mentors junior engineers. Defines data contracts and SLAs. Interview includes rigorous system design with expectations for cross-team thinking.
Sets technical direction for the org. Leads multi-quarter initiatives. Influences architecture across Uber's data platform. Represents DE in cross-functional strategy. Interview evaluates organizational impact and technical vision.
The tools and infrastructure Uber data engineers work with daily. Knowing these shows interviewers you understand the environment and can contribute from day one.
Python dominates for data pipelines and scripting. Java and Scala power Flink and Spark jobs. Go is used in backend microservices that DEs interact with for data ingestion.
Kafka is the central nervous system for all event data at Uber. Flink handles real-time stream processing for surge pricing, matching, and fraud detection. Spark Streaming is used for heavier batch-streaming hybrid workloads.
Hudi (created at Uber) enables incremental processing on the data lake. Raw data lands in Parquet format on S3. Presto and Trino serve as the interactive query engines for analysts and data scientists.
Cadence is Uber's workflow orchestration engine, designed for durable and fault-tolerant workflows at massive scale. Airflow handles traditional DAG-based batch scheduling.
Uber runs a significant portion of its compute in its own data centers, with cloud bursting for peak loads. DEs must understand both bare-metal performance tuning and cloud-native autoscaling.
H3 divides the world into hexagonal cells at multiple resolutions. Uber uses H3 to partition geographic data, compute supply/demand by zone, and power location-based features across all products.
Uber has data engineers across every major product area. Each team has distinct data challenges and interview focus areas. Ask your recruiter which team you are interviewing for so you can tailor your preparation.
Pricing, surge, driver matching. DEs build pipelines for real-time supply/demand signals, surge multiplier computation, and matching algorithm feature stores. This team generates the most system design interview questions.
Routing, ETA prediction, map data quality. DEs process billions of GPS pings daily, maintain geospatial indexes (H3), and feed ML models for arrival time estimation. Expect heavy geospatial SQL if interviewing here.
Incident detection, fraud signals, insurance risk scoring. DEs build event pipelines that flag anomalous trip patterns and feed real-time safety interventions. Data quality is critical because false negatives have real consequences.
Restaurant analytics, delivery time prediction, courier optimization. DEs manage order event streams and build pipelines that balance delivery speed against courier utilization. Multi-sided marketplace data (eater, restaurant, courier) creates unique modeling challenges.
Logistics, load matching, carrier analytics. DEs build pipelines for shipment tracking, carrier performance scoring, and pricing models across long-haul routes. The data is sparser but the individual transactions are much higher value.
Internal tooling, governance, infrastructure. DEs build and maintain the shared data lake, schema registry, data catalog, and self-serve query tools used by every other team. If you want to work on systems that scale across all of Uber, this is the team.
How Uber's data engineering culture and infrastructure set it apart from other top companies. Understanding these differences helps you frame your answers during the interview.
Unlike companies that run entirely on AWS or GCP, Uber operates a hybrid of on-prem data centers and cloud resources. This means DEs must understand bare-metal performance tuning alongside cloud-native patterns. Interview questions often probe whether you can reason about infrastructure you manage directly, not just managed services.
Uber has built and open-sourced multiple foundational data tools: Apache Hudi for incremental data lake management, Cadence for workflow orchestration, H3 for geospatial indexing, and AresDB for real-time analytics. Interviewers expect candidates to know these exist and understand the problems they solve.
Every Uber transaction involves at least two parties (rider and driver, eater and courier) plus the platform. This creates data modeling challenges that single-sided businesses do not have. Supply/demand balancing, dynamic pricing, and matching algorithms all generate complex event streams that DEs must process and serve.
When a data pipeline breaks at Uber, drivers earn less, riders wait longer, and the company loses revenue every minute. This urgency shapes interview expectations. Uber wants DEs who think about monitoring, alerting, SLAs, and graceful degradation as first-class requirements, not afterthoughts.
Real question types from each round, covering SQL, Python, system design, data modeling, and behavioral. The guidance shows what the interviewer looks for and how to structure your answer.
Join ride_requests to ride_acceptances on ride_id. Compute wait_time = acceptance_ts minus request_ts, then take AVG(wait_time) grouped by city and EXTRACT(HOUR FROM request_ts). Discuss how to handle rides that were never accepted: an inner join drops them entirely, while a LEFT JOIN keeps them with NULL acceptance times, which AVG silently ignores.
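The query described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under assumptions: the column layout is hypothetical, timestamps are 'YYYY-MM-DD HH:MM:SS' strings, and because SQLite has no EXTRACT, strftime('%H', ...) stands in for EXTRACT(HOUR FROM ...).

```python
import sqlite3

def avg_wait_by_city_hour(requests, acceptances):
    """requests: (ride_id, city, request_ts); acceptances: (ride_id, acceptance_ts)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ride_requests (ride_id INTEGER, city TEXT, request_ts TEXT)")
    con.execute("CREATE TABLE ride_acceptances (ride_id INTEGER, acceptance_ts TEXT)")
    con.executemany("INSERT INTO ride_requests VALUES (?, ?, ?)", requests)
    con.executemany("INSERT INTO ride_acceptances VALUES (?, ?)", acceptances)
    rows = con.execute("""
        SELECT r.city,
               CAST(strftime('%H', r.request_ts) AS INTEGER) AS request_hour,
               -- Wait time in seconds; NULL for rides that were never accepted.
               AVG(CAST(strftime('%s', a.acceptance_ts) AS INTEGER)
                   - CAST(strftime('%s', r.request_ts) AS INTEGER)) AS avg_wait_s,
               COUNT(*) AS total_requests,
               COUNT(a.ride_id) AS accepted_requests
        FROM ride_requests r
        LEFT JOIN ride_acceptances a ON a.ride_id = r.ride_id
        GROUP BY r.city, request_hour
        ORDER BY r.city, request_hour
    """).fetchall()
    con.close()
    return rows
```

Returning the request and acceptance counts next to the average makes the NULL discussion concrete: the average covers accepted rides only, and the two counts show how many rides it leaves out.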
Aggregate trips by driver and date, then filter with HAVING COUNT(*) >= 20 AND AVG(rating) < 4.0. Discuss whether to include trips with no rating, and what this pattern might indicate about driver fatigue or service quality degradation.
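A minimal sketch of that aggregation, again using sqlite3 so it runs anywhere. The table and column names are assumptions for illustration. Note the interaction the guidance asks you to discuss: COUNT(*) includes unrated trips, while AVG(rating) skips NULL ratings.

```python
import sqlite3

def low_rated_busy_days(trips, min_trips=20, max_avg_rating=4.0):
    """trips: (driver_id, trip_date, rating or None)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE trips (driver_id INTEGER, trip_date TEXT, rating REAL)")
    con.executemany("INSERT INTO trips VALUES (?, ?, ?)", trips)
    rows = con.execute("""
        SELECT driver_id, trip_date, COUNT(*) AS trip_count, AVG(rating) AS avg_rating
        FROM trips
        GROUP BY driver_id, trip_date
        HAVING COUNT(*) >= ? AND AVG(rating) < ?
        ORDER BY driver_id, trip_date
    """, (min_trips, max_avg_rating)).fetchall()
    con.close()
    return rows
```

Switching COUNT(*) to COUNT(rating) would require 20 rated trips instead of 20 trips overall, which is a reasonable alternative to raise with the interviewer.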
Use the gaps-and-islands technique: subtract ROW_NUMBER() from an ordered sequence value (such as a minute index) so that consecutive rows share the same group key, then filter groups spanning more than 15 minutes. Discuss event granularity and how to handle missing data points in the stream.
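The grouping trick is easier to see in a runnable form. The sketch below assumes a hypothetical table of distinct integer minute indexes at which the condition of interest held, and needs SQLite 3.25 or newer for window functions. Within a run of consecutive minutes, minute minus ROW_NUMBER() stays constant, so it works as a group key.

```python
import sqlite3

def long_islands(active_minutes, min_span=15):
    """active_minutes: distinct integer minute indexes where the condition held."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE active (minute INTEGER)")
    con.executemany("INSERT INTO active VALUES (?)", [(m,) for m in active_minutes])
    rows = con.execute("""
        WITH keyed AS (
            SELECT minute,
                   -- Constant within a consecutive run, changes after every gap.
                   minute - ROW_NUMBER() OVER (ORDER BY minute) AS island
            FROM active
        )
        SELECT MIN(minute) AS start_minute, MAX(minute) AS end_minute
        FROM keyed
        GROUP BY island
        HAVING MAX(minute) - MIN(minute) > ?
        ORDER BY start_minute
    """, (min_span,)).fetchall()
    con.close()
    return rows
```

A single missing minute splits a run into two islands here, which is exactly the granularity and missing-data question the interviewer is likely to push on.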
Aggregate requests and available drivers by city and hour. Compute the ratio, then use ROW_NUMBER() OVER (PARTITION BY city ORDER BY ratio DESC) to rank. Filter to rank <= 3. Discuss what 'available' means operationally and how to handle hours with zero drivers.
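One way to sketch the ranking step, assuming the requests and available drivers have already been aggregated into a hypothetical hourly table. NULLIF guards against division by zero, and this sketch simply excludes hours with zero drivers from the ranking, which is one policy among several you could defend.

```python
import sqlite3

def top_demand_hours(hourly, top_n=3):
    """hourly: (city, hour, requests, available_drivers), one row per city and hour."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE hourly (city TEXT, hour INTEGER, requests INTEGER, available_drivers INTEGER)"
    )
    con.executemany("INSERT INTO hourly VALUES (?, ?, ?, ?)", hourly)
    rows = con.execute("""
        WITH ratios AS (
            SELECT city, hour,
                   requests * 1.0 / NULLIF(available_drivers, 0) AS ratio
            FROM hourly
        ),
        ranked AS (
            SELECT city, hour, ratio,
                   ROW_NUMBER() OVER (PARTITION BY city ORDER BY ratio DESC, hour) AS rn
            FROM ratios
            WHERE ratio IS NOT NULL  -- assumption: zero-driver hours are reported separately
        )
        SELECT city, hour, ratio FROM ranked WHERE rn <= ? ORDER BY city, rn
    """, (top_n,)).fetchall()
    con.close()
    return rows
```

ROW_NUMBER returns exactly three rows per city even when ratios tie; RANK or DENSE_RANK would keep ties, so it is worth stating which behavior you intend.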
Track last-moved timestamp. If distance between consecutive points is below threshold (e.g. 50 meters) and elapsed time exceeds 5 minutes, flag as stationary. Discuss Haversine distance calculation, GPS drift noise, and how to handle signal gaps.
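A minimal sketch of that approach in Python. The function names, the tuple layout of a ping, and the mean Earth radius constant are assumptions for illustration, and the code does nothing about GPS drift or signal gaps beyond the distance threshold itself.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def flag_stationary(pings, threshold_m=50.0, min_seconds=300):
    """pings: time-ordered (ts_seconds, lat, lon). Returns timestamps flagged as stationary."""
    flagged = []
    if not pings:
        return flagged
    last_moved_ts = pings[0][0]
    prev = pings[0]
    for ping in pings[1:]:
        ts, lat, lon = ping
        if haversine_m(prev[1], prev[2], lat, lon) >= threshold_m:
            last_moved_ts = ts  # real movement resets the clock
        elif ts - last_moved_ts > min_seconds:
            flagged.append(ts)
        prev = ping
    return flagged
```

Comparing only consecutive points means a driver creeping forward a few meters per ping is never counted as having moved. Comparing against an anchor point from the last real movement is one alternative to mention.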
Maintain a deque of size 10. For each completed trip, push actual_duration minus predicted_duration, pop oldest if full. Return mean of deque. Discuss what ETA error distribution reveals about model accuracy and whether to use absolute or signed error.
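The deque approach can be sketched as follows. The class and method names are hypothetical. Passing maxlen to collections.deque makes it drop the oldest entry automatically, so no explicit pop is needed.

```python
from collections import deque

class RollingEtaError:
    """Mean signed ETA error (actual minus predicted) over the last `window` trips."""

    def __init__(self, window=10):
        self._errors = deque(maxlen=window)  # oldest error is evicted automatically

    def add_trip(self, actual_duration, predicted_duration):
        self._errors.append(actual_duration - predicted_duration)
        return self.mean()

    def mean(self):
        if not self._errors:
            return None  # no completed trips yet
        return sum(self._errors) / len(self._errors)
```

Signed error shows bias (consistently early or late), while swapping in abs() on the difference would measure magnitude instead. Keeping a running sum would avoid re-summing the window, though at a size of 10 the difference is negligible.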
Ingest ride requests and driver availability via Kafka. Flink computes supply/demand ratio per geospatial zone in tumbling windows. Serve from a low-latency key-value store. Discuss zone granularity (H3 hexagons), smoothing to avoid price oscillation, and fallback when the streaming layer lags behind.
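To make the windowing and smoothing ideas concrete, here is a small batch simulation in plain Python. It is not Flink code: the event tuple layout, the zone identifiers, and the exponential smoothing factor are all assumptions chosen for illustration.

```python
from collections import defaultdict

def window_ratios(events, window_s=60):
    """events: (ts_seconds, zone_id, kind), kind is 'request' or 'driver_available'.

    Returns {(window_start, zone_id): demand/supply ratio, or None if no supply}.
    """
    counts = defaultdict(lambda: [0, 0])  # [requests, available drivers]
    for ts, zone, kind in events:
        window_start = ts - ts % window_s  # tumbling window: fixed size, no overlap
        counts[(window_start, zone)][0 if kind == "request" else 1] += 1
    return {
        key: (requests / supply if supply else None)
        for key, (requests, supply) in counts.items()
    }

def smooth(previous, raw, alpha=0.3):
    """Exponential moving average to damp window-to-window oscillation."""
    if previous is None:
        return raw
    return alpha * raw + (1 - alpha) * previous
```

In a real design, zone_id would be an H3 cell, the smoothed value would be written to the key-value store, and the None case (demand with no supply) would need an explicit cap rather than an unbounded multiplier.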
Kappa architecture: Kafka for real-time, Flink for stream processing, Hudi for incremental updates to the data lake. Discuss exactly-once semantics, late-arriving events from mobile clients, and partition strategy (by city and date). Explain how Hudi's upsert model handles trip state updates as the trip progresses.
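The upsert idea can be illustrated with a toy in-memory table. This is not Hudi's API, only a sketch of the semantics under assumptions: each event is a dict with hypothetical trip_id, status, and event_ts fields, and the newest event time wins for a given trip.

```python
def upsert_trip_events(table, events):
    """Apply trip events to `table` (a dict keyed by trip_id), keeping the newest state.

    An event replaces the stored row only if its event time is at least as new,
    so late or replayed events cannot roll a trip back to an older state.
    """
    for event in events:
        current = table.get(event["trip_id"])
        if current is None or event["event_ts"] >= current["event_ts"]:
            table[event["trip_id"]] = event
    return table
```

Because applying the same batch twice leaves the table unchanged, the write is idempotent, which is one common way to get effectively-once results on top of at-least-once delivery.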
Drivers publish location to Kafka. Flink maintains a geospatial index (H3) of available drivers in state. On ride request, query the index for drivers in adjacent hexagons, rank by ETA and rating, and dispatch. Discuss rebalancing when supply is sparse and how to handle simultaneous requests for the same driver.
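A toy version of the index helps when explaining the double-dispatch problem. Everything here is an assumption for illustration: integer (x, y) grid cells stand in for H3 hexagons, a same-cell-first rule stands in for ETA, and a single-threaded in-memory class stands in for Flink state.

```python
from collections import defaultdict

def neighbors(cell):
    """The cell itself plus its 8 surrounding grid cells (stand-in for an H3 ring)."""
    x, y = cell
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

class DriverIndex:
    def __init__(self):
        self._cell_of = {}                 # driver_id -> current cell
        self._by_cell = defaultdict(dict)  # cell -> {driver_id: rating}

    def update(self, driver_id, cell, rating):
        """Record a location ping from an available driver."""
        self.remove(driver_id)
        self._cell_of[driver_id] = cell
        self._by_cell[cell][driver_id] = rating

    def remove(self, driver_id):
        cell = self._cell_of.pop(driver_id, None)
        if cell is not None:
            self._by_cell[cell].pop(driver_id, None)

    def claim_best(self, request_cell):
        """Pick the best nearby driver and remove them so no other request can claim them."""
        candidates = []
        for cell in neighbors(request_cell):
            for driver_id, rating in self._by_cell.get(cell, {}).items():
                distance_rank = 0 if cell == request_cell else 1
                candidates.append((distance_rank, -rating, driver_id))
        if not candidates:
            return None  # sparse supply: a real system would widen the search
        _, _, best = min(candidates)
        self.remove(best)
        return best
```

Claiming and removing in one step is what prevents two requests from receiving the same driver. In a distributed setting that step needs to be atomic per driver, for example by routing all claims for a cell through one partition.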
Fact: trips (request_ts, accept_ts, pickup_ts, dropoff_ts, fare, surge_multiplier, zone_id). Dimensions: zones (with H3 hierarchy), drivers, riders. Discuss pre-aggregating zone-level metrics hourly and the difference between completed trips and requested trips for funnel analysis.
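The requested-versus-completed distinction can be shown with a small hourly pre-aggregation. The sketch below uses sqlite3 and a trimmed-down, hypothetical version of the fact table in which a NULL dropoff_ts means the trip was requested but never completed.

```python
import sqlite3

def zone_hour_funnel(trips):
    """trips: (zone_id, request_ts, dropoff_ts or None), timestamps as 'YYYY-MM-DD HH:MM:SS'."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE trips (zone_id TEXT, request_ts TEXT, dropoff_ts TEXT)")
    con.executemany("INSERT INTO trips VALUES (?, ?, ?)", trips)
    rows = con.execute("""
        SELECT zone_id,
               strftime('%Y-%m-%d %H:00', request_ts) AS hour_start,
               COUNT(*) AS requested,           -- every row is a request
               COUNT(dropoff_ts) AS completed   -- counts only non-NULL dropoffs
        FROM trips
        GROUP BY zone_id, hour_start
        ORDER BY zone_id, hour_start
    """).fetchall()
    con.close()
    return rows
```

Bucketing by request hour keeps each trip in one row of the funnel. Bucketing completions by dropoff hour instead is a different metric, so state which one the aggregate table stores.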
Fact: orders with timestamps for each state transition (placed, confirmed, preparing, picked_up, delivered). Dimensions: restaurants, couriers, customers, cities. Discuss SCD Type 2 for restaurant attributes that change (menu, hours) and how to compute preparation time vs delivery time separately.
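Storing one timestamp per state transition makes the two durations simple subtractions. The sketch below assumes an order is a dict of state name to ISO-8601 timestamp, and it measures preparation from confirmed to picked_up, which is one possible definition since it also includes any time the courier spends waiting.

```python
from datetime import datetime

def order_durations(order):
    """order: dict mapping state name to an ISO-8601 timestamp string (or None if not reached)."""
    def ts(state):
        value = order.get(state)
        return datetime.fromisoformat(value) if value else None

    confirmed, picked_up, delivered = ts("confirmed"), ts("picked_up"), ts("delivered")
    prep_s = (picked_up - confirmed).total_seconds() if confirmed and picked_up else None
    delivery_s = (delivered - picked_up).total_seconds() if picked_up and delivered else None
    return {"prep_s": prep_s, "delivery_s": delivery_s}
```

Returning None for states that were never reached keeps cancelled orders out of the averages instead of recording them as zero-length deliveries.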
Uber operates 24/7 with real-time financial impact. Describe the incident, the options you considered, what you chose and why, and the outcome. Show you can balance speed with safety and communicate clearly during high-stress situations.
Four areas that separate prepared candidates from everyone else.
Most Uber DE questions are framed around real-time or near-real-time requirements. Batch processing is secondary. Know Kafka, Flink, and streaming concepts: watermarks, windowing, exactly-once processing semantics, and backpressure.
Uber partitions data geographically using H3 hexagonal indexing. Understand geohashing, spatial joins, and how to partition and query location-based data efficiently. This comes up in both system design and data modeling rounds.
Uber created Apache Hudi (incremental data processing), AresDB (real-time analytics), and Cadence (workflow orchestration). Mentioning these tools and understanding their purpose shows deep familiarity with Uber's data ecosystem.
Uber processes millions of events per second across rides, deliveries, and driver locations. When discussing system design, think in terms of throughput (events/sec), latency (p99 in milliseconds), and geographic distribution across hundreds of cities.
Patterns that cost candidates offers. These are specific to Uber and come from the unique characteristics of their data infrastructure.
Candidates propose nightly Spark jobs for problems that demand sub-second latency. At Uber, surge pricing, driver matching, and ETA updates all require streaming. If the interviewer describes a real-time scenario, your first instinct should be Kafka plus Flink, not Airflow plus Spark.
Uber data is inherently spatial. Candidates who partition only by date miss the point. Most Uber tables are partitioned by city or H3 hex zone first, then by time. Forgetting this leads to full table scans and shows you have not thought about how Uber's data is actually structured.
Mobile clients send events over unreliable networks. GPS pings arrive late. Trip end events sometimes arrive before trip start events. Candidates who assume ordered data get caught when the interviewer asks about late arrivals. Always discuss watermarks, event-time processing, and how to handle out-of-order data.
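A small buffer makes the watermark idea tangible. This is a simplified sketch, not Flink's implementation: the class name, the fixed allowed lateness, and the choice to set late events aside in a list are assumptions for illustration.

```python
import heapq

class WatermarkBuffer:
    """Reorders events by event time and releases them once the watermark passes them."""

    def __init__(self, allowed_lateness_s):
        self.allowed_lateness_s = allowed_lateness_s
        self.max_event_ts = None
        self.late = []     # events that arrived behind the watermark
        self._heap = []    # min-heap ordered by event time
        self._seq = 0      # tie-breaker so payloads are never compared

    def watermark(self):
        if self.max_event_ts is None:
            return None
        return self.max_event_ts - self.allowed_lateness_s

    def add(self, event_ts, payload):
        """Accept one event; return the events now safe to process, in event-time order."""
        wm = self.watermark()
        if wm is not None and event_ts < wm:
            self.late.append((event_ts, payload))  # too late: route to a side channel
            return []
        heapq.heappush(self._heap, (event_ts, self._seq, payload))
        self._seq += 1
        if self.max_event_ts is None or event_ts > self.max_event_ts:
            self.max_event_ts = event_ts
        wm = self.watermark()
        ready = []
        while self._heap and self._heap[0][0] <= wm:
            ts, _, item = heapq.heappop(self._heap)
            ready.append((ts, item))
        return ready
```

A larger allowed lateness catches more stragglers at the cost of higher end-to-end latency, which is the tradeoff to state explicitly when the interviewer asks about late arrivals.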
Uber operates in hundreds of cities with different regulations, currencies, and demand patterns. A system designed as a single global pipeline will not work. Interviewers expect you to discuss per-city or per-region isolation, failover, and how to prevent a problem in one city from affecting another.
Uber runs a hybrid on-prem and cloud infrastructure. Candidates who propose expensive fully-managed cloud services without discussing cost tradeoffs miss the mark. Mention compute costs, storage tiering, and how to handle peak vs off-peak workloads efficiently.
Uber DE questions emphasize real-time systems, geospatial data, and streaming architecture. Practice with problems that test the same patterns Uber interviewers use.