Company Interview Guide

Uber Data Engineer Interview Questions and Guide

Story from last hiring cycle. Strong L5 candidate, ex-Stripe, walks into Uber's SQL round and blanks on a LAG over partitioned trip data. Three minutes of dead air. The interviewer asked it a second way and he still couldn't land the window frame. Offer came back an L4 downlevel. $80K gone over one function he'd never practiced under a clock.

Uber's SQL bar is among the highest you'll hit in any DE interview. Real-time trip data, driver/rider joins, surge windows, all of it under time pressure. Nothing philosophical. Nothing contrived. The loop exposes weak SQL fast, and there's nowhere to hide behind soft answers. This guide covers the full loop, the domain-specific question shapes, and the exact prep that passes.

* 6 loop stages
* 21% PARTITION BY use
* 15% ROW_NUMBER use
* 60% prep time on SQL

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

The Uber DE Interview Loop

Six stages. Onsite is a full day, four rounds back-to-back. The thing nobody tells you: every question maps to a real pager incident someone on the team saw last month. Surge pricing broken at rush hour. Trip deduplication blowing up BI dashboards. Supply-demand window job missing a watermark. The rounds are blunt because the job is blunt.

1

Recruiter Screen

30 min

The recruiter walks through your background and assesses fit for the role. They'll ask about your experience with data pipelines, SQL, and real-time systems. Uber has multiple DE teams: Marketplace (pricing, matching, surge), Payments, Safety, Maps, Eats, and Freight. Each team has different technical needs. The recruiter tries to match you to a team based on your background. They'll also explain the interview timeline and share a high-level description of what to expect in each round. Pay attention to which team they suggest. Uber's Marketplace team, for example, values real-time streaming experience much more than the Payments team.

* Research which Uber team you're most interested in and tell the recruiter. Targeted enthusiasm is more compelling than generic interest
* Mention specific experience with real-time data if you have it. Uber's data infrastructure is heavily streaming-oriented (Kafka, Flink, real-time feature stores)
* Ask about the team's tech stack and biggest current challenges. This helps you tailor your prep for the remaining rounds
2

Technical Phone Screen

60 min

A video call with a senior data engineer. The format is typically 40 minutes of SQL (2 to 3 problems) followed by 15 to 20 minutes of discussion about data modeling or pipeline architecture. Uber's phone screen is SQL-heavy and harder than most companies' phone screens. The problems use Uber-like schemas: trips, drivers, riders, surge pricing, and geospatial data. Expect window functions, self-joins, and multi-step problems where the output of one query feeds into the next. The interviewer also asks you to optimize your query and explain what indexes would help. The remaining time covers a mini system design question or a discussion about your past projects.

* Uber schemas involve timestamps everywhere. Practice time-based aggregations: trips per hour, average wait time by city, surge pricing duration
* Be ready to discuss query optimization. Uber processes billions of rows daily. If your query does a full table scan, the interviewer will push back
* The phone screen is a strong filter. If you pass this round, you're likely to make it to the onsite. Treat it as seriously as an onsite round
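Time-based aggregations like "trips per city per hour" are the bread and butter of the phone screen. A quick way to drill them locally is SQLite through Python's `sqlite3`; the schema and values below are illustrative, not Uber's actual tables:

```python
import sqlite3

# Hypothetical trips table for practice; timestamps are ISO-8601 strings,
# which SQLite's date functions understand natively.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_id INTEGER, city TEXT, request_time TEXT);
INSERT INTO trips VALUES
  (1, 'SF',  '2024-05-01 08:05:00'),
  (2, 'SF',  '2024-05-01 08:40:00'),
  (3, 'SF',  '2024-05-01 09:10:00'),
  (4, 'NYC', '2024-05-01 08:20:00');
""")

# strftime truncates each timestamp to the hour, giving a clean GROUP BY key.
rows = conn.execute("""
SELECT city,
       strftime('%Y-%m-%d %H:00', request_time) AS hour,
       COUNT(*) AS trips
FROM trips
GROUP BY city, hour
ORDER BY city, hour
""").fetchall()

for city, hour, trips in rows:
    print(city, hour, trips)
```

The same pattern (truncate timestamp, group, count or average) covers most "per hour / per day / per city" questions; swapping `COUNT(*)` for `AVG(pickup_time - request_time)` style expressions gives the wait-time variants.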
3

Onsite: SQL Deep Dive

60 min

The most technically demanding SQL round in the loop. Three problems, each harder than the last, all set in Uber's domain. Typical topics: calculating driver utilization rates (time spent with passengers vs. idle), identifying surge pricing anomalies, computing rider retention cohorts, or finding trips where the actual route diverged significantly from the estimated route. The interviewer expects clean, correct SQL with clear explanations. After each problem, they'll ask follow-up questions about performance: partitioning strategy, join order, and how the query behaves at Uber's scale (millions of trips per day, hundreds of cities). This round is Uber's signature. They take SQL more seriously than almost any other company.

* Practice with ride-sharing schemas. Uber's domain is very specific: trips have pickup/dropoff locations, timestamps, surge multipliers, driver ratings, and cancellation reasons
* Know your window functions cold. LEAD, LAG, ROW_NUMBER, RANK, NTILE, running sums, and moving averages all appear regularly
* When the interviewer asks about optimization, think about partitioning by city and date, indexing on (driver_id, trip_date), and avoiding correlated subqueries on large tables
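The LAG-over-partitioned-trip-data pattern from the opening story is exactly the kind of rep to run until it's automatic. Here is a minimal drill using SQLite (window functions need 3.25+, which ships with any modern Python); the schema and values are made up for practice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_id INTEGER, driver_id INTEGER, pickup_time TEXT, fare REAL);
INSERT INTO trips VALUES
  (1, 10, '2024-05-01 08:00:00', 12.0),
  (2, 10, '2024-05-01 09:30:00', 18.0),
  (3, 11, '2024-05-01 08:15:00',  9.0),
  (4, 11, '2024-05-01 10:00:00', 22.0);
""")

# Minutes between consecutive trips per driver: LAG over a partition
# ordered by pickup_time; julianday differences are in days, so scale
# by 24 * 60 to get minutes.
rows = conn.execute("""
SELECT driver_id, trip_id,
       (julianday(pickup_time)
        - julianday(LAG(pickup_time) OVER (
            PARTITION BY driver_id ORDER BY pickup_time))) * 24 * 60
         AS gap_minutes
FROM trips
ORDER BY driver_id, pickup_time
""").fetchall()

for driver_id, trip_id, gap in rows:
    print(driver_id, trip_id, gap)
```

The first trip in each partition has no prior row, so `gap_minutes` is NULL there; being able to say that out loud, unprompted, is part of what the interviewer is listening for.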
4

Onsite: System Design / Pipeline Architecture

60 min

Design a data pipeline or platform for an Uber use case. Common prompts: real-time surge pricing analytics, driver supply/demand prediction pipeline, a data quality system for trip data, or a feature store for ML models that predict ETA. Uber's data infrastructure is built around Apache Kafka, Apache Flink, Apache Hive, Presto, and their internal data lake (built on HDFS and later migrated to object storage). The interviewer expects you to reason about real-time vs. batch trade-offs, handle late-arriving data, and design for Uber's multi-city, multi-region architecture. You should discuss data freshness SLAs, failure recovery, and monitoring. Uber values engineers who think about operational reliability, not just the happy path.

* Start with requirements: latency SLA, data volume per city, how many cities, consumer types (dashboards vs. ML models vs. APIs)
* Uber's data is inherently geospatial and temporal. Mention how you'd partition data by city/region and time
* If you mention Kafka, explain how you'd handle exactly-once semantics, consumer lag monitoring, and schema evolution (Avro/Protobuf)
* Discuss the Lambda or Kappa architecture trade-off. Uber has historically used Lambda (separate batch and streaming layers) but is moving toward unified streaming
5

Onsite: Coding (Python or Java)

45 to 60 min

A data processing problem in Python or Java. Unlike software engineering interviews, Uber DE coding rounds focus on data manipulation rather than algorithms. You might parse and transform ride event logs, implement a simple sessionization algorithm, build a deduplication function for streaming events, or write a pipeline step that enriches trip data with geospatial lookups. The interviewer evaluates code quality, correctness, and your ability to reason about edge cases. After you write the initial solution, they'll extend the problem: 'Now this function needs to process 1 million events per second. What changes?' The discussion about scaling your solution is as important as the initial code.

* Write clean, well-structured Python. Use type hints, meaningful variable names, and helper functions
* Think about edge cases upfront: null fields, out-of-order events, duplicate records, and timezone issues
* When discussing scale, mention generators/iterators for memory efficiency, batching, and how you'd parallelize the work
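Deduplicating streaming events is one of the named coding prompts, and it also shows off the generator style the bullets above recommend. A sketch, with a hypothetical event shape (`event_id` as the dedup key):

```python
from typing import Dict, Iterable, Iterator


def dedupe_events(events: Iterable[Dict], key: str = "event_id") -> Iterator[Dict]:
    """Yield each event at most once, keyed on `key`.

    A generator keeps memory flat for the output stream; the seen-set
    still grows unbounded, which is the scaling follow-up (TTL cache,
    Bloom filter, or keyed state in a stream processor).
    """
    seen = set()
    for event in events:
        event_id = event.get(key)
        if event_id is None or event_id in seen:
            continue  # drop malformed events and duplicate deliveries
        seen.add(event_id)
        yield event


stream = [
    {"event_id": "a", "type": "trip_requested"},
    {"event_id": "a", "type": "trip_requested"},  # at-least-once duplicate
    {"event_id": "b", "type": "trip_completed"},
    {"type": "heartbeat"},                         # malformed: no id
]
unique = list(dedupe_events(stream))
print([e["event_id"] for e in unique])  # -> ['a', 'b']
```

When the "1 million events per second" extension comes, the honest answer is that the in-memory set is the bottleneck: partition the stream by key and bound the state, rather than micro-optimizing the loop.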
6

Onsite: Behavioral / Culture Fit

45 min

A round focused on how you work with others, handle conflict, and operate in a fast-paced environment. Uber's culture values speed, ownership, and impact. The interviewer asks about past situations: 'Tell me about a time you shipped something quickly that wasn't perfect, and how you handled the consequences,' 'Describe a disagreement with a stakeholder about data quality,' or 'How have you handled an on-call incident for a pipeline you built?' Uber wants engineers who take ownership of their systems, communicate clearly with non-technical stakeholders, and can move fast without breaking critical data. This round is not a formality. A weak behavioral performance can result in a no-hire even if technical rounds were strong.

* Prepare stories about shipping quickly and iterating. Uber's culture favors speed over perfection as long as you have monitoring and rollback plans
* Have a story about a data pipeline failure you owned and resolved. Uber values on-call ownership
* Be ready to discuss cross-functional collaboration. DEs at Uber work closely with data scientists, product managers, and city operations teams

5 Real-Style Uber DE Interview Questions

These reflect Uber's domain, technical depth, and the mix of SQL, system design, coding, and behavioral questions you'll face.

SQL

Given a trips table (trip_id, driver_id, rider_id, city, request_time, pickup_time, dropoff_time, fare, surge_multiplier), calculate the average driver utilization rate per city for the last 7 days. Utilization = time spent on trips / total online time.

Join trips to a driver_sessions table (or derive online time from trip gaps). Sum trip durations (dropoff_time minus pickup_time) per driver per city. Calculate total online time per driver (from login/logout events or the span from first to last trip with gap-based sessionization). Divide trip time by online time. Average across drivers per city. The interviewer will probe how you handle drivers who work in multiple cities, trips that span midnight, and the definition of 'online time' when a driver has no trips for 2 hours mid-shift.
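One concrete version of that answer, using the first-pickup-to-last-dropoff approximation of online time (one of several definitions the interviewer may accept or challenge), runnable against SQLite with an illustrative schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (driver_id INTEGER, city TEXT, pickup_time TEXT, dropoff_time TEXT);
INSERT INTO trips VALUES
  (10, 'SF', '2024-05-01 08:00', '2024-05-01 08:30'),
  (10, 'SF', '2024-05-01 09:00', '2024-05-01 09:30'),
  (11, 'SF', '2024-05-01 08:00', '2024-05-01 10:00');
""")

# Utilization = trip time / online time, with online time approximated
# as first pickup to last dropoff per driver. julianday() returns days,
# and the units cancel in the ratio.
rows = conn.execute("""
WITH per_driver AS (
  SELECT driver_id, city,
         SUM(julianday(dropoff_time) - julianday(pickup_time)) AS trip_days,
         julianday(MAX(dropoff_time)) - julianday(MIN(pickup_time)) AS online_days
  FROM trips
  GROUP BY driver_id, city
)
SELECT city, AVG(trip_days / online_days) AS avg_utilization
FROM per_driver
GROUP BY city
""").fetchall()

print(rows)  # driver 10 is 1h busy over a 1.5h span, driver 11 is fully busy
```

Driver 10 utilizes 2/3 of the span and driver 11 all of it, so SF averages 5/6. Stating which definition of online time you chose, and what a real `driver_sessions` table would change, is where the points are.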

SQL

Find all riders whose trip cancellation rate increased month-over-month for 3 consecutive months. Include the rider_id and the months.

Aggregate cancellations and total trip requests per rider per month. Calculate cancellation rate (cancelled / total). Use LAG to compare each month to the prior month. Flag months where the rate increased. Apply the consecutive-group technique (row_number minus month_number) to detect 3-month increasing streaks. The follow-up question will be about how you handle riders with very few trips (is a 1-trip, 1-cancellation month meaningful?) and whether you set a minimum trip threshold.
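The consecutive-group (gaps-and-islands) step is the part most candidates fumble, so it's worth seeing end to end. A sketch over a pre-aggregated monthly table with made-up rates, where months are integers to keep the arithmetic obvious:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monthly (rider_id INTEGER, month INTEGER, cancel_rate REAL);
INSERT INTO monthly VALUES
  (1, 1, 0.10), (1, 2, 0.15), (1, 3, 0.20), (1, 4, 0.25),  -- three straight increases
  (2, 1, 0.10), (2, 2, 0.20), (2, 3, 0.05), (2, 4, 0.15);  -- streak broken
""")

rows = conn.execute("""
WITH flagged AS (
  SELECT rider_id, month,
         CASE WHEN cancel_rate > LAG(cancel_rate) OVER (
                PARTITION BY rider_id ORDER BY month)
              THEN 1 ELSE 0 END AS increased
  FROM monthly
),
grouped AS (
  -- gaps-and-islands: month minus a row number over increase-months is
  -- constant within one unbroken streak
  SELECT rider_id, month,
         month - ROW_NUMBER() OVER (
           PARTITION BY rider_id ORDER BY month) AS grp
  FROM flagged
  WHERE increased = 1
)
SELECT rider_id, MIN(month) AS streak_start, COUNT(*) AS streak_len
FROM grouped
GROUP BY rider_id, grp
HAVING COUNT(*) >= 3
""").fetchall()

print(rows)  # only rider 1 has a 3-month increasing streak
```

Rider 1's increase months (2, 3, 4) all map to the same `grp` value, rider 2's (2 and 4) do not, so only rider 1 survives the `HAVING`. In a real answer you'd build `monthly` from raw trips with a minimum-trip threshold first.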

System Design

Design a real-time pipeline that calculates and serves dynamic surge pricing multipliers for each city zone, updating every 30 seconds based on current supply (available drivers) and demand (ride requests).

Ingest ride request events and driver location/status events into Kafka topics. A Flink streaming job consumes both topics, windows by 30-second tumbling windows, and computes supply/demand ratios per geohash (city zone). The resulting surge multipliers are written to a low-latency key-value store (Redis or DynamoDB) that the pricing API reads from. Discuss how you handle zones with very few events (smoothing), how you prevent surge multiplier oscillation (dampening), and what happens when the Flink job restarts mid-window (checkpointing). Store historical surge data in a data lake for analytics and model training.
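The windowing core of that design fits in a few lines. A toy single-process sketch of the tumbling-window supply/demand calculation (the 30 s window, geohash zones, and 3.0x cap are illustrative parameters, not Uber's; a real system would do this in Flink keyed state):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def surge_multipliers(events: Iterable[Tuple[int, str, str]],
                      window_s: int = 30,
                      base: float = 1.0,
                      cap: float = 3.0) -> Dict[Tuple[int, str], float]:
    """events: (ts_seconds, zone, kind) with kind in {'request', 'driver'}.

    Returns {(window_start, zone): multiplier}. Multiplier is the
    demand/supply ratio clamped to [base, cap]; clamping is the crude
    stand-in for the smoothing/dampening discussed above.
    """
    counts = defaultdict(lambda: [0, 0])  # (window, zone) -> [demand, supply]
    for ts, zone, kind in events:
        window = (ts // window_s) * window_s  # tumbling-window assignment
        counts[(window, zone)][0 if kind == "request" else 1] += 1

    multipliers = {}
    for key, (demand, supply) in counts.items():
        ratio = demand / max(supply, 1)  # avoid divide-by-zero on empty supply
        multipliers[key] = min(cap, max(base, ratio))
    return multipliers


events = [
    (0, "9q8y", "request"), (5, "9q8y", "request"),
    (10, "9q8y", "request"), (12, "9q8y", "driver"),
    (31, "9q8y", "request"), (35, "9q8y", "driver"),
]
m = surge_multipliers(events)
print(m)  # window 0: 3 requests / 1 driver -> capped at 3.0; window 30: 1/1 -> 1.0
```

In the interview, the value of a sketch like this is that it makes the follow-ups concrete: late events land in a closed window (watermarks), and a restart loses `counts` (checkpointing).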

Coding (Python)

Write a function that takes a stream of driver location pings (driver_id, lat, lng, timestamp) and detects drivers who appear to be spoofing their location. A spoof is detected when a driver's location jumps more than 5 km in under 30 seconds.

Maintain a dictionary of last-known locations per driver. For each incoming ping, compute the haversine distance between the current and previous location. If the distance exceeds 5 km and the time delta is under 30 seconds, flag as a potential spoof. Handle edge cases: the first ping for a driver (no previous location), pings that arrive out of order, and GPS jitter in tunnels or urban canyons that can cause legitimate but large apparent jumps. The interviewer will ask how you'd tune the threshold and whether you'd use a sliding window of pings instead of just the last one.
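A compact version of that answer, with the haversine distance written out (the 5 km / 30 s thresholds come from the prompt; everything else is illustrative):

```python
import math
from typing import Dict, Iterable, List, Tuple

Ping = Tuple[int, float, float, int]  # (driver_id, lat, lng, ts_seconds)


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two lat/lng points."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def detect_spoofs(pings: Iterable[Ping], max_km: float = 5.0, max_s: int = 30) -> List[int]:
    """Flag drivers whose location jumps > max_km in < max_s seconds."""
    last: Dict[int, Tuple[float, float, int]] = {}  # driver -> (lat, lng, ts)
    flagged = []
    for driver_id, lat, lng, ts in pings:
        prev = last.get(driver_id)
        if prev is not None:
            plat, plng, pts = prev
            dt = ts - pts
            # skip out-of-order pings (dt <= 0); production code would reorder
            if 0 < dt < max_s and haversine_km(plat, plng, lat, lng) > max_km:
                flagged.append(driver_id)
        last[driver_id] = (lat, lng, ts)
    return flagged


pings = [
    (1, 37.7749, -122.4194, 0),
    (1, 37.7750, -122.4190, 20),   # small move: fine
    (2, 37.7749, -122.4194, 0),
    (2, 37.8649, -122.4194, 15),   # ~10 km jump in 15 s: spoof
]
print(detect_spoofs(pings))  # -> [2]
```

A sliding window of recent pings (flag only when several jumps agree) is the natural hardening against GPS jitter, at the cost of slower detection.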

Behavioral

Tell me about a time you had to deliver data to stakeholders on a tight deadline, knowing the data wasn't perfect. How did you handle the trade-off?

Use STAR format. Describe the context: what was the deadline, who needed the data, what was imperfect about it (missing records, known data quality issues, incomplete coverage). Explain the trade-off you evaluated: delay delivery to fix the issue vs. deliver with known caveats. Describe your decision and how you communicated the limitations to stakeholders. Show that you didn't just ship bad data silently, but documented the known issues and set expectations. Uber values speed but also trust; the strongest answer demonstrates both.

Preparation Strategy

How to focus your prep time for an Uber DE loop.

SQL is your highest-ROI prep area

SQL accounts for the majority of the Uber DE interview. Spend 60% of your prep time on SQL. Focus on ride-sharing schemas: trips, drivers, riders, cities, surge multipliers, and ratings. Practice time-based aggregations, window functions (LEAD, LAG, ROW_NUMBER, running sums), self-joins, and optimization discussions. Do 3 to 5 timed SQL problems per day for 2 to 3 weeks.

Study streaming architecture fundamentals

You don't need production Kafka experience, but you should understand: topics, partitions, consumer groups, offsets, exactly-once semantics, and how Flink/Spark Streaming processes data in windows (tumbling, sliding, session). Study one end-to-end streaming pipeline design and be ready to adapt it to Uber's use cases.

Build Uber-domain system design answers

Prepare 3 to 4 system design answers in Uber's domain: real-time surge pricing pipeline, driver/rider matching data flow, trip analytics platform, and a data quality monitoring system. For each, know the data sources, processing layer, storage choices, serving layer, and monitoring strategy.

Prepare behavioral stories about ownership and speed

Uber values engineers who ship fast and own their systems. Prepare stories about: a time you shipped under a tight deadline, a pipeline failure you owned end-to-end, a disagreement with a stakeholder, and a project where you simplified a complex system. Keep each story under 3 minutes using STAR format.

Uber DE Interview FAQ

How SQL-heavy is the Uber DE interview?
Very. SQL is Uber's primary filter for data engineers. You'll face SQL in the phone screen (2 to 3 problems) and a dedicated onsite SQL deep dive (3 problems, 60 minutes). The system design round may also involve SQL for the serving layer. In total, you'll write more SQL in an Uber DE loop than at almost any other company. The problems are set in Uber's domain (trips, drivers, riders, surge, geospatial data) and emphasize window functions, time-based aggregations, and query optimization.
Does Uber ask about Kafka and streaming in every DE interview?
Not every round, but the system design round almost always involves a real-time component. Uber's data infrastructure is built around Kafka and Flink, so familiarity with streaming concepts (event ordering, exactly-once delivery, consumer groups, windowing) is expected for senior roles. If you've never worked with streaming, study the fundamentals before the interview. You don't need production Kafka experience, but you should be able to discuss streaming architecture at a conceptual level.
What level does Uber hire data engineers at?
Uber hires DEs from IC3 (entry level, 1 to 3 years) through IC6 (staff, 10+ years). Most external hires come in at IC4 (mid-level, 3 to 6 years) or IC5 (senior, 6 to 10 years). IC5+ candidates always have a system design round. IC3 candidates may have an extra coding round instead. The level determines your compensation band and the scope of projects you'll own.

Your SQL Has to Be Muscle Memory By Then

Seen too many good engineers lose Uber offers to window function hesitation. Run the reps until LAG and LEAD are reflexes.

Start Practicing