Company Interview Guide
Uber processes millions of trips and deliveries daily across hundreds of cities, generating massive volumes of real-time geospatial and transactional data. Its data engineering (DE) interviews test streaming architecture, geospatial reasoning, and the ability to build systems that operate at low latency under constant load. Here is what each round covers.
Three stages from first contact to offer.
Initial call covering your experience and interest in Uber. The recruiter assesses your background with real-time data systems, large-scale infrastructure, and streaming architectures, and looks for candidates comfortable with event-driven systems and low-latency requirements.
One to two coding problems, typically SQL or Python. Uber phone screens test data manipulation with ride and delivery event data. Expect questions about time-series analysis, geospatial logic, and event processing. The interviewer evaluates both correctness and your ability to reason about scale.
Four to five rounds covering system design, SQL deep dive, coding, data modeling, and behavioral. System design at Uber focuses on real-time architectures: surge pricing computation, ETA prediction pipelines, and marketplace matching. The data modeling round often involves designing schemas for trip data that support both real-time operations and historical analytics.
Real question types from each round. The guidance shows what the interviewer looks for.
Join ride_requests to ride_acceptances on ride_id. Compute wait_time = acceptance_ts - request_ts, then take AVG grouped by city and EXTRACT(HOUR FROM request_ts). Discuss handling rides that were never accepted: an inner join drops them silently, while a LEFT JOIN keeps them with a NULL wait time.
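A minimal sketch of that query, run against an in-memory SQLite database so it is self-contained. The column layout and sample rows are assumptions made for illustration, and SQLite's strftime stands in for EXTRACT(HOUR FROM ...), which SQLite does not support. A LEFT JOIN is used so that unaccepted rides stay visible in the request count while AVG ignores their NULL wait time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ride_requests (ride_id INTEGER PRIMARY KEY, city TEXT, request_ts TEXT);
CREATE TABLE ride_acceptances (ride_id INTEGER, acceptance_ts TEXT);
INSERT INTO ride_requests VALUES
  (1, 'SF',  '2024-01-01 08:00:00'),
  (2, 'SF',  '2024-01-01 08:10:00'),
  (3, 'SF',  '2024-01-01 08:20:00'),  -- never accepted
  (4, 'NYC', '2024-01-01 09:00:00');
INSERT INTO ride_acceptances VALUES
  (1, '2024-01-01 08:02:00'),
  (2, '2024-01-01 08:14:00'),
  (4, '2024-01-01 09:01:00');
""")

QUERY = """
SELECT r.city,
       CAST(strftime('%H', r.request_ts) AS INTEGER) AS request_hour,
       -- epoch-second difference; NULL for rides with no acceptance row
       AVG(CAST(strftime('%s', a.acceptance_ts) AS INTEGER)
           - CAST(strftime('%s', r.request_ts) AS INTEGER)) AS avg_wait_seconds,
       COUNT(*)          AS requests,
       COUNT(a.ride_id)  AS accepted
FROM ride_requests r
LEFT JOIN ride_acceptances a ON a.ride_id = r.ride_id
GROUP BY r.city, request_hour
ORDER BY r.city, request_hour
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Reporting requests and accepted side by side makes the acceptance rate available next to the average wait, which is one way to answer the follow-up about rides that were never accepted.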
Aggregate trips by driver and date, then filter with HAVING COUNT(*) >= 20 AND AVG(rating) < 4.0. AVG skips NULL ratings, so discuss whether unrated trips should count toward the 20-trip threshold, and what this pattern might indicate about driver fatigue.
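The aggregation can be sketched as follows, again using in-memory SQLite. The trips table layout and the sample data are hypothetical. Driver 1 has 22 trips of which 4 are unrated, which shows how COUNT(*) and COUNT(rating) diverge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (driver_id INTEGER, trip_date TEXT, rating REAL)")

sample = []
sample += [(1, "2024-01-01", 3.5)] * 18 + [(1, "2024-01-01", None)] * 4  # busy, low rated
sample += [(2, "2024-01-01", 4.8)] * 25                                  # busy, well rated
sample += [(3, "2024-01-01", 2.0)] * 5                                   # low rated, too few trips
conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", sample)

QUERY = """
SELECT driver_id,
       trip_date,
       COUNT(*)      AS trips,        -- all trips, rated or not
       COUNT(rating) AS rated_trips,  -- only trips with a rating
       AVG(rating)   AS avg_rating    -- NULL ratings are ignored
FROM trips
GROUP BY driver_id, trip_date
HAVING COUNT(*) >= 20 AND AVG(rating) < 4.0
"""

flagged = conn.execute(QUERY).fetchall()
print(flagged)
```

Swapping COUNT(*) for COUNT(rating) in the HAVING clause would drop driver 1 from the result, since only 18 of the 22 trips carry a rating. Which behavior is right depends on the business question, so it is worth stating the choice out loud.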
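Use the gaps-and-islands technique: subtract ROW_NUMBER from an ordered sequence value (for example, the minute index) so that consecutive rows share the same difference, group on that difference, then keep groups spanning more than 15 minutes. Discuss event granularity and how to handle missing data points.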
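The grouping trick is easier to see outside SQL. The sketch below assumes one reading per minute and takes the sorted minute indexes at which some condition held (the condition itself is hypothetical here). Subtracting the row number from the minute index yields a constant key within each consecutive run.

```python
from itertools import groupby


def long_islands(minutes, min_span=15):
    """Return (start, end) for runs of consecutive minutes spanning more than min_span.

    minutes must be sorted and distinct, one entry per minute in which the
    condition of interest was true.
    """
    # minute - row_number is constant inside a run of consecutive minutes
    keyed = [(minute - row_number, minute) for row_number, minute in enumerate(minutes)]
    islands = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        run = [minute for _, minute in group]
        start, end = run[0], run[-1]
        if end - start > min_span:
            islands.append((start, end))
    return islands


observed = list(range(0, 20)) + list(range(30, 40)) + list(range(100, 117))
print(long_islands(observed))
```

A missing data point splits one island into two under this approach. Whether a single dropped minute should break a run, or be tolerated up to some gap size, is the kind of question the interviewer expects you to raise.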
Track last-moved timestamp. If distance between consecutive points is below threshold (e.g. 50 meters) and elapsed time exceeds 5 minutes, flag as stationary. Discuss Haversine distance and GPS drift.
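A minimal sketch of the stationary check using only the standard library. One deliberate variation from the description above: instead of comparing strictly consecutive points, it compares each point to an anchor (the last position that counted as real movement), so slow GPS drift made of many small steps does not reset the timer. The function names, the (timestamp, lat, lon) tuple layout, and the sample coordinates are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def stationary_flags(points, max_move_m=50, min_idle_s=300):
    """Return timestamps at which the driver has been stationary for more than min_idle_s.

    points: iterable of (ts_seconds, lat, lon), sorted by timestamp.
    """
    flagged = []
    anchor = None  # (ts, lat, lon) of the last position that counted as movement
    for ts, lat, lon in points:
        if anchor is None or haversine_m(anchor[1], anchor[2], lat, lon) >= max_move_m:
            anchor = (ts, lat, lon)  # real movement: restart the idle timer
        elif ts - anchor[0] > min_idle_s:
            flagged.append(ts)
    return flagged


# One ping per minute at the same spot for 7 minutes, then a jump of about 1.1 km.
pings = [(t, 37.7749, -122.4194) for t in range(0, 421, 60)] + [(480, 37.7849, -122.4194)]
print(stationary_flags(pings))
```

The 50 meter threshold has to sit above typical GPS noise for the device mix in question, which is why drift belongs in the discussion.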
Ingest ride requests and driver availability via Kafka. Flink computes supply/demand ratio per geospatial zone in tumbling windows. Serve from a low-latency key-value store. Discuss zone granularity (H3 hexagons), smoothing to avoid price oscillation, and fallback when streaming lags.
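The windowing and smoothing steps can be sketched in plain Python. This is not Flink code and not Uber's pricing logic: the event tuple layout, the zone ids (strings standing in for H3 cell ids), the multiplier cap, and the exponential smoothing factor are all assumptions, and events are assumed to arrive in order with no late data.

```python
from collections import defaultdict


def tumbling_counts(events, window_s=60):
    """Count requests and available drivers per (zone, window_start).

    events: iterable of (ts_seconds, zone_id, kind), kind is "request" or "driver".
    """
    counts = defaultdict(lambda: {"request": 0, "driver": 0})
    for ts, zone, kind in events:
        window_start = ts - ts % window_s  # tumbling: each event falls in exactly one window
        counts[(zone, window_start)][kind] += 1
    return dict(counts)


def smoothed_multipliers(counts, alpha=0.5, cap=3.0):
    """Turn per-window demand/supply ratios into multipliers, smoothed per zone."""
    result = {}
    previous = {}  # last smoothed multiplier per zone
    for zone, window_start in sorted(counts, key=lambda key: (key[1], key[0])):
        window = counts[(zone, window_start)]
        ratio = window["request"] / max(window["driver"], 1)  # avoid division by zero
        raw = min(cap, max(1.0, ratio))
        if zone in previous:
            # exponential moving average damps window-to-window oscillation
            smoothed = alpha * raw + (1 - alpha) * previous[zone]
        else:
            smoothed = raw
        previous[zone] = smoothed
        result[(zone, window_start)] = smoothed
    return result


events = (
    [(5, "z1", "request")] * 4 + [(10, "z1", "driver")] * 2    # window 0: ratio 2.0
    + [(65, "z1", "request"), (70, "z1", "driver")]            # window 60: ratio 1.0
)
multipliers = smoothed_multipliers(tumbling_counts(events))
print(multipliers)
```

In the second window the raw multiplier falls back to 1.0, but the smoothed value only drops to 1.5. That damping is the point of the smoothing discussion: riders should not see the price flip every window.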
Lambda or Kappa architecture: Kafka for real-time, Spark for batch reprocessing, Hudi for incremental updates to the lake. Discuss exactly-once semantics, late-arriving events from mobile clients, and partition strategy (by city and date).
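Two of the discussion points, duplicate delivery and late-arriving events, can be illustrated with an idempotent upsert keyed by event id. The sketch below uses a plain dictionary as a stand-in for the lake. It does not use the Hudi API, and the event fields (event_id, version, event_ts) are hypothetical. Events are partitioned by city and event-time date rather than arrival time.

```python
from datetime import datetime, timezone


def partition_key(event):
    """Partition by city and event-time date (UTC), not by arrival time."""
    day = datetime.fromtimestamp(event["event_ts"], tz=timezone.utc).strftime("%Y-%m-%d")
    return (event["city"], day)


def upsert(lake, event):
    """Idempotent write: replaying an event leaves the lake unchanged."""
    partition = lake.setdefault(partition_key(event), {})
    current = partition.get(event["event_id"])
    if current is None or event["version"] > current["version"]:
        partition[event["event_id"]] = event


lake = {}
on_time = {"event_id": "e1", "city": "sf", "event_ts": 1704070800, "version": 1, "fare": 10.0}
late = {"event_id": "e2", "city": "sf", "event_ts": 1704067140, "version": 1, "fare": 7.0}

# The same event delivered twice, followed by an event from the previous day.
for event in (on_time, on_time, late):
    upsert(lake, event)

for key in sorted(lake):
    print(key, sorted(lake[key]))
```

At-least-once delivery combined with an idempotent sink gives effectively exactly-once results. The late event lands in the partition for the day it happened, so a batch recomputation of that day picks it up.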
Fact: trips (request_ts, accept_ts, pickup_ts, dropoff_ts, fare, surge_multiplier, zone_id). Dimensions: zones (with H3 hierarchy), drivers, riders. Discuss pre-aggregating zone-level metrics hourly and the difference between completed trips and requested trips.
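One possible shape for this schema, sketched in SQLite with an hourly zone rollup. Column types, the parent_zone_id column standing in for the H3 hierarchy, and the sample rows are assumptions. A trip counts as requested if it has a request_ts and as completed only if it also has a dropoff_ts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zones   (zone_id TEXT PRIMARY KEY, parent_zone_id TEXT, city TEXT);
CREATE TABLE drivers (driver_id INTEGER PRIMARY KEY);
CREATE TABLE riders  (rider_id INTEGER PRIMARY KEY);
CREATE TABLE trips (
    trip_id          INTEGER PRIMARY KEY,
    driver_id        INTEGER REFERENCES drivers(driver_id),
    rider_id         INTEGER REFERENCES riders(rider_id),
    zone_id          TEXT REFERENCES zones(zone_id),
    request_ts       TEXT NOT NULL,
    accept_ts        TEXT,   -- NULL until a driver accepts
    pickup_ts        TEXT,
    dropoff_ts       TEXT,   -- NULL unless the trip completed
    fare             REAL,
    surge_multiplier REAL
);
INSERT INTO zones VALUES ('z1', 'z-parent', 'SF');
INSERT INTO trips VALUES
  (1, 10, 100, 'z1', '2024-01-01 08:05:00', '2024-01-01 08:06:00',
   '2024-01-01 08:10:00', '2024-01-01 08:30:00', 12.5, 1.0),
  (2, NULL, 101, 'z1', '2024-01-01 08:40:00', NULL, NULL, NULL, NULL, 1.5),
  (3, 11, 102, 'z1', '2024-01-01 09:10:00', '2024-01-01 09:11:00',
   '2024-01-01 09:15:00', '2024-01-01 09:40:00', 8.0, 1.0);
""")

HOURLY_ROLLUP = """
SELECT zone_id,
       strftime('%Y-%m-%d %H:00', request_ts) AS request_hour,
       COUNT(*)          AS requested_trips,
       COUNT(dropoff_ts) AS completed_trips,
       SUM(CASE WHEN dropoff_ts IS NOT NULL THEN fare END) AS completed_fare
FROM trips
GROUP BY zone_id, request_hour
ORDER BY zone_id, request_hour
"""

hourly = conn.execute(HOURLY_ROLLUP).fetchall()
for row in hourly:
    print(row)
```

Keeping requested and completed counts in the same rollup row makes the completion rate per zone and hour a simple division, without another pass over the fact table.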
Uber operates 24/7 with real-time financial impact. Describe the incident, the options you considered, what you chose and why, and the outcome. Show you can balance speed with safety.
What makes Uber different from other companies.
Most Uber DE questions are framed around real-time or near-real-time requirements. Batch processing is secondary. Know Kafka, Flink, and streaming concepts: watermarks, windowing, exactly-once processing semantics, and backpressure.
Uber partitions data geographically using H3 hexagonal indexing. Understand how H3 compares with alternatives such as geohashing, how spatial joins work, and how to partition and query location-based data efficiently. This comes up in both system design and data modeling.
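To make the idea of a spatial partition key concrete, here is a sketch of a standard geohash encoder in pure Python. Geohash is a different scheme from H3 (rectangular cells rather than hexagons), and H3 has its own library, which is not used here. The shared idea is that a location becomes a short cell id, and nearby points tend to share a cell or a prefix.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a lat/lon pair (degrees) as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, 5))
```

A shorter geohash is always a prefix of a longer one for the same point, so truncating the string gives a coarser cell. That property is what makes prefix-based partitioning and range scans work, and it is a useful contrast when discussing H3's explicit parent and child cells.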
Uber created Apache Hudi (incremental data processing), AresDB (real-time analytics), and Cadence (workflow orchestration). Mentioning these tools and understanding their purpose shows deep familiarity with Uber's data ecosystem.
Uber processes millions of events per second across rides, deliveries, and driver locations. When discussing system design, think in terms of throughput (events/sec), latency (p99 in milliseconds), and geographic distribution across hundreds of cities.
Uber DE questions emphasize real-time systems and geospatial data. Practice with problems that test streaming logic and scale.