Story from last hiring cycle. Strong L5 candidate, ex-Stripe, walks into Uber's SQL round and blanks on a LAG over partitioned trip data. Three minutes of dead air. The interviewer asked it a second way and he still couldn't land the window frame. Offer came back an L4 downlevel. $80K gone over one function he'd never practiced under a clock.
Uber's SQL bar is the highest you'll hit in any DE interview. Real-time trip data, driver/rider joins, surge windows, all of it under time pressure. Nothing philosophical. Nothing contrived. The loop exposes weak SQL fast, and there's nowhere to hide behind soft answers. This guide covers the full loop, the domain-specific question shapes, and the exact prep that passes.
[Chart: loop stages, PARTITION BY usage, ROW_NUMBER usage, and prep time spent on SQL. Source: DataDriven analysis of 1,042 verified data engineering interview rounds.]
Six stages. Onsite is a full day, four rounds back-to-back. The thing nobody tells you: every question maps to a real pager incident someone on the team saw last month. Surge pricing broken at rush hour. Trip deduplication blowing up BI dashboards. Supply-demand window job missing a watermark. The rounds are blunt because the job is blunt.
Stage 1: Recruiter screen
The recruiter walks through your background and assesses fit for the role. They'll ask about your experience with data pipelines, SQL, and real-time systems. Uber has multiple DE teams: Marketplace (pricing, matching, surge), Payments, Safety, Maps, Eats, and Freight. Each team has different technical needs. The recruiter tries to match you to a team based on your background. They'll also explain the interview timeline and share a high-level description of what to expect in each round. Pay attention to which team they suggest. Uber's Marketplace team, for example, values real-time streaming experience much more than the Payments team.
Stage 2: Technical phone screen
A video call with a senior data engineer. The format is typically 40 minutes of SQL (2 to 3 problems) followed by 15 to 20 minutes of discussion about data modeling or pipeline architecture. Uber's phone screen is SQL-heavy and harder than most companies' phone screens. The problems use Uber-like schemas: trips, drivers, riders, surge pricing, and geospatial data. Expect window functions, self-joins, and multi-step problems where the output of one query feeds into the next. The interviewer also asks you to optimize your query and explain what indexes would help. The remaining time covers a mini system design question or a discussion about your past projects.
Stage 3: SQL deep dive
The most technically demanding SQL round in the loop. Three problems, each harder than the last, all set in Uber's domain. Typical topics: calculating driver utilization rates (time spent with passengers vs. idle), identifying surge pricing anomalies, computing rider retention cohorts, or finding trips where the actual route diverged significantly from the estimated route. The interviewer expects clean, correct SQL with clear explanations. After each problem, they'll ask follow-up questions about performance: partitioning strategy, join order, and how the query behaves at Uber's scale (millions of trips per day, hundreds of cities). This round is Uber's signature. They take SQL more seriously than almost any other company.
Stage 4: Data system design
Design a data pipeline or platform for an Uber use case. Common prompts: real-time surge pricing analytics, driver supply/demand prediction pipeline, a data quality system for trip data, or a feature store for ML models that predict ETA. Uber's data infrastructure is built around Apache Kafka, Apache Flink, Apache Hive, Presto, and their internal data lake (built on HDFS and later migrated to object storage). The interviewer expects you to reason about real-time vs. batch trade-offs, handle late-arriving data, and design for Uber's multi-city, multi-region architecture. You should discuss data freshness SLAs, failure recovery, and monitoring. Uber values engineers who think about operational reliability, not just the happy path.
Stage 5: Coding round
A data processing problem in Python or Java. Unlike software engineering interviews, Uber DE coding rounds focus on data manipulation rather than algorithms. You might parse and transform ride event logs, implement a simple sessionization algorithm, build a deduplication function for streaming events, or write a pipeline step that enriches trip data with geospatial lookups. The interviewer evaluates code quality, correctness, and your ability to reason about edge cases. After you write the initial solution, they'll extend the problem: 'Now this function needs to process 1 million events per second. What changes?' The discussion about scaling your solution is as important as the initial code.
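Gap-based sessionization is one of the most common prompts in this round. Here's a minimal Python sketch; the 30-minute gap threshold and the list-of-lists return shape are my assumptions, not Uber's actual spec:

```python
def sessionize(timestamps, gap=1800):
    """Split one rider's event timestamps (epoch seconds) into sessions.

    A new session starts whenever the gap to the previous event
    exceeds `gap` seconds (default 30 minutes).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # within the gap: extend current session
        else:
            sessions.append([ts])     # gap exceeded (or first event): new session
    return sessions

print(sessionize([0, 100, 5000]))  # two sessions: [0, 100] and [5000]
```

The natural scaling follow-up: sorting all timestamps in memory doesn't survive 1M events/second, so the streaming version keeps only the last timestamp per rider and handles late events explicitly.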
Stage 6: Behavioral
A round focused on how you work with others, handle conflict, and operate in a fast-paced environment. Uber's culture values speed, ownership, and impact. The interviewer asks about past situations: 'Tell me about a time you shipped something quickly that wasn't perfect, and how you handled the consequences,' 'Describe a disagreement with a stakeholder about data quality,' or 'How have you handled an on-call incident for a pipeline you built?' Uber wants engineers who take ownership of their systems, communicate clearly with non-technical stakeholders, and can move fast without breaking critical data. This round is not a formality. A weak behavioral performance can result in a no-hire even if technical rounds were strong.
Sample questions
The questions below reflect Uber's domain, technical depth, and the mix of SQL, system design, coding, and behavioral rounds you'll face.
Q: Calculate the average driver utilization rate (time on trips vs. time online) per city.
Join trips to a driver_sessions table (or derive online time from trip gaps). Sum trip durations (dropoff_time minus pickup_time) per driver per city. Calculate total online time per driver (from login/logout events or the span from first to last trip with gap-based sessionization). Divide trip time by online time. Average across drivers per city. The interviewer will probe how you handle drivers who work in multiple cities, trips that span midnight, and the definition of 'online time' when a driver has no trips for 2 hours mid-shift.
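A runnable sketch of that approach using Python's built-in sqlite3. The trips/driver_sessions schemas, column names, and sample rows are assumptions for illustration, not Uber's actual tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE trips (driver_id INT, city TEXT, pickup_time INT, dropoff_time INT);
CREATE TABLE driver_sessions (driver_id INT, city TEXT, login_time INT, logout_time INT);
-- driver 1: 3h online, 2h on trips; driver 2: 4h online, 1h on trips (epoch seconds)
INSERT INTO trips VALUES (1,'SF',0,3600),(1,'SF',5400,9000),(2,'SF',0,3600);
INSERT INTO driver_sessions VALUES (1,'SF',0,10800),(2,'SF',0,14400);
""")

query = """
WITH trip_time AS (
  SELECT driver_id, city, SUM(dropoff_time - pickup_time) AS on_trip
  FROM trips GROUP BY driver_id, city
),
online_time AS (
  SELECT driver_id, city, SUM(logout_time - login_time) AS online
  FROM driver_sessions GROUP BY driver_id, city
)
SELECT o.city,
       -- CAST avoids SQLite's integer division truncating utilization to 0
       AVG(CAST(t.on_trip AS REAL) / o.online) AS avg_utilization
FROM online_time o
JOIN trip_time t USING (driver_id, city)
GROUP BY o.city;
"""
rows = list(con.execute(query))
print(rows)
```

Note the deliberate GROUP BY on (driver_id, city) in both CTEs: a driver working in two cities gets a separate utilization per city, which answers the first follow-up probe.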
Q: Find riders whose monthly cancellation rate increased for 3 consecutive months.
Aggregate cancellations and total trip requests per rider per month. Calculate cancellation rate (cancelled / total). Use LAG to compare each month to the prior month. Flag months where the rate increased. Apply the consecutive-group technique (row_number minus month_number) to detect 3-month increasing streaks. The follow-up question will be about how you handle riders with very few trips (is a 1-trip, 1-cancellation month meaningful?) and whether you set a minimum trip threshold.
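The LAG-then-gaps-and-islands pattern, runnable via sqlite3 (window functions need SQLite 3.25+, which ships with current Python builds). The monthly table here is assumed to be pre-aggregated; table and column names are my own:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly (rider_id INT, month_num INT, cancel_rate REAL)")
# rider 1's rate rises for 3 straight months; rider 2's does not
con.executemany("INSERT INTO monthly VALUES (?,?,?)", [
    (1, 1, 0.10), (1, 2, 0.15), (1, 3, 0.20), (1, 4, 0.25),
    (2, 1, 0.10), (2, 2, 0.05), (2, 3, 0.20),
])

query = """
WITH flagged AS (
  SELECT rider_id, month_num,
         CASE WHEN cancel_rate >
              LAG(cancel_rate) OVER (PARTITION BY rider_id ORDER BY month_num)
              THEN 1 ELSE 0 END AS increased
  FROM monthly
),
grouped AS (
  -- gaps-and-islands: month_num minus row_number is constant
  -- within each unbroken run of increase-months
  SELECT rider_id,
         month_num - ROW_NUMBER() OVER (PARTITION BY rider_id ORDER BY month_num) AS grp
  FROM flagged
  WHERE increased = 1
)
SELECT DISTINCT rider_id
FROM grouped
GROUP BY rider_id, grp
HAVING COUNT(*) >= 3;
"""
riders = [r for (r,) in con.execute(query)]
print(riders)
```

Note that three consecutive increases require four months of data, which is exactly the kind of boundary condition the interviewer will poke at.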
Q: Design a real-time surge pricing analytics pipeline.
Ingest ride request events and driver location/status events into Kafka topics. A Flink streaming job consumes both topics, windows by 30-second tumbling windows, and computes supply/demand ratios per geohash (city zone). The resulting surge multipliers are written to a low-latency key-value store (Redis or DynamoDB) that the pricing API reads from. Discuss how you handle zones with very few events (smoothing), how you prevent surge multiplier oscillation (dampening), and what happens when the Flink job restarts mid-window (checkpointing). Store historical surge data in a data lake for analytics and model training.
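The core windowed computation, sketched in plain Python rather than Flink. The function names, event tuple shape, and the naive demand/supply multiplier are all assumptions; a production job would add the smoothing and dampening the prompt asks about:

```python
from collections import defaultdict

WINDOW_SEC = 30  # tumbling window length from the prompt

def window_start(ts):
    """Assign an epoch-seconds timestamp to its 30s tumbling window."""
    return ts - ts % WINDOW_SEC

def surge_multipliers(events):
    """events: (ts, geohash, kind) with kind 'request' (demand) or 'driver' (supply).

    Returns {(window_start, geohash): multiplier}.
    """
    demand = defaultdict(int)
    supply = defaultdict(int)
    for ts, zone, kind in events:
        key = (window_start(ts), zone)
        if kind == "request":
            demand[key] += 1
        else:
            supply[key] += 1
    # Naive multiplier: demand/supply floored at 1.0. Real systems smooth
    # sparse zones and dampen oscillation between adjacent windows.
    return {k: max(1.0, d / max(supply.get(k, 0), 1)) for k, d in demand.items()}

events = [
    (0, "9q8y", "request"), (2, "9q8y", "request"), (5, "9q8y", "request"),
    (7, "9q8y", "driver"), (31, "9q8y", "request"),
]
multipliers = surge_multipliers(events)
print(multipliers)
```

In the real design this per-window aggregation runs inside Flink with checkpointed state, so a mid-window restart replays from the last checkpoint instead of losing the partial counts.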
Q: Write a function that flags potential GPS spoofing from a stream of driver location pings.
Maintain a dictionary of last-known locations per driver. For each incoming ping, compute the haversine distance between the current and previous location. If the distance exceeds 5 km and the time delta is under 30 seconds, flag as a potential spoof. Handle edge cases: the first ping for a driver (no previous location), pings that arrive out of order, and GPS jitter in tunnels or urban canyons that can cause legitimate but large apparent jumps. The interviewer will ask how you'd tune the threshold and whether you'd use a sliding window of pings instead of just the last one.
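A self-contained sketch of exactly that approach. The class and method names are mine; the 5 km / 30 s thresholds come from the prompt:

```python
import math

SPOOF_KM = 5.0   # distance-jump threshold from the prompt
SPOOF_SEC = 30   # time-delta threshold from the prompt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

class SpoofDetector:
    """Keeps the last-known ping per driver; flags physically implausible jumps."""

    def __init__(self):
        self.last = {}  # driver_id -> (ts, lat, lon)

    def check(self, driver_id, ts, lat, lon):
        prev = self.last.get(driver_id)
        if prev is not None and ts < prev[0]:
            return False  # out-of-order ping: ignore rather than flag
        spoof = False
        if prev is not None:  # first ping has nothing to compare against
            dist = haversine_km(prev[1], prev[2], lat, lon)
            spoof = dist > SPOOF_KM and (ts - prev[0]) < SPOOF_SEC
        self.last[driver_id] = (ts, lat, lon)
        return spoof
```

Keeping a sliding window of the last few pings per driver (instead of just one) would let you require two consecutive implausible jumps before flagging, which absorbs the single-ping jitter from tunnels and urban canyons.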
Q: Tell me about a time you delivered data under a tight deadline despite known quality issues.
Use STAR format. Describe the context: what was the deadline, who needed the data, what was imperfect about it (missing records, known data quality issues, incomplete coverage). Explain the trade-off you evaluated: delay delivery to fix the issue vs. deliver with known caveats. Describe your decision and how you communicated the limitations to stakeholders. Show that you didn't just ship bad data silently, but documented the known issues and set expectations. Uber values speed but also trust; the strongest answer demonstrates both.
How to focus your prep time for an Uber DE loop.
SQL
SQL accounts for the majority of the Uber DE interview. Spend 60% of your prep time on SQL. Focus on ride-sharing schemas: trips, drivers, riders, cities, surge multipliers, and ratings. Practice time-based aggregations, window functions (LEAD, LAG, ROW_NUMBER, running sums), self-joins, and optimization discussions. Do 3 to 5 timed SQL problems per day for 2 to 3 weeks.
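For reps without any database setup, Python's built-in sqlite3 covers all of these window functions. A small drill combining a running sum with LAG (the daily-trips schema is an invented practice table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily (city TEXT, day INT, trips INT)")
con.executemany("INSERT INTO daily VALUES (?,?,?)",
                [("SF", 1, 10), ("SF", 2, 12), ("SF", 3, 9)])

# Running total and day-over-day delta per city
rows = list(con.execute("""
    SELECT day, trips,
           SUM(trips) OVER (PARTITION BY city ORDER BY day) AS running_total,
           trips - LAG(trips) OVER (PARTITION BY city ORDER BY day) AS delta
    FROM daily ORDER BY day
"""))
print(rows)
```

LAG returns NULL for each partition's first row, so the first delta comes back as None; interviewers routinely check whether you handle that row deliberately.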
Streaming fundamentals
You don't need production Kafka experience, but you should understand: topics, partitions, consumer groups, offsets, exactly-once semantics, and how Flink/Spark Streaming processes data in windows (tumbling, sliding, session). Study one end-to-end streaming pipeline design and be ready to adapt it to Uber's use cases.
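To make the window types concrete, here's window assignment in a few lines of Python. The alignment-to-multiples-of-the-slide mirrors how Flink assigns sliding windows, but the function names are mine and timestamps are assumed non-negative:

```python
def tumbling_window(ts, size):
    """Start of the single [start, start+size) tumbling window containing ts."""
    return ts - ts % size

def sliding_windows(ts, size, slide):
    """Starts of every [start, start+size) sliding window containing ts.

    A sliding window with slide < size assigns each event to size/slide
    overlapping windows; slide == size degenerates to tumbling.
    """
    last_start = ts - ts % slide                      # latest window that can contain ts
    return list(range(last_start, ts - size, -slide))[::-1]

print(tumbling_window(65, 30))        # one 30s bucket
print(sliding_windows(65, 60, 30))    # two overlapping 60s windows
```

Session windows don't fit this arithmetic at all: they're defined by inactivity gaps per key, which is why they need merging state rather than a closed-form assignment.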
System design
Prepare 3 to 4 system design answers in Uber's domain: real-time surge pricing pipeline, driver/rider matching data flow, trip analytics platform, and a data quality monitoring system. For each, know the data sources, processing layer, storage choices, serving layer, and monitoring strategy.
Behavioral
Uber values engineers who ship fast and own their systems. Prepare stories about: a time you shipped under a tight deadline, a pipeline failure you owned end-to-end, a disagreement with a stakeholder, and a project where you simplified a complex system. Keep each story under 3 minutes using STAR format.
I've seen too many good engineers lose Uber offers to window-function hesitation. Run the reps until LAG and LEAD feel cheap.