Company Interview Guide

DoorDash Data Engineer Interview

DoorDash operates one of the most complex three-sided marketplaces in tech: dashers, merchants, and consumers all generate data that must be reconciled in near real-time. Their Data Engineer loop reflects this complexity. Expect questions about modeling three-party events, designing dispatch pipelines, and handling delivery state machines. DoorDash loops run 4 to 5 weeks. Pair with the data engineer interview prep guide.

The Short Answer
Expect a 5 to 6 round loop: recruiter screen, technical phone screen (SQL or Python), then a 4-round virtual onsite covering system design (often dispatch or delivery state machine), SQL, Python, and a behavioral round. DoorDash's distinctive emphasis: modeling three-party events where dasher, merchant, and consumer states all mutate independently. The collaboration round leans on stories about handling competing stakeholder priorities.
Updated April 2026 · By The DataDriven Team

DoorDash Data Engineer Interview Process

5 to 6 rounds, 4 to 5 weeks end to end. Mostly virtual.

1. Recruiter Screen (30 min)

Conversational. DoorDash hires across Logistics, Marketplace, Merchant Platform, Dasher Platform, Consumer Growth, Financial Data, and ML Platform teams. Mention experience with multi-party data, real-time systems, or delivery/logistics if you have it.
2. Technical Phone Screen (60 min)

Live SQL or Python in CoderPad. SQL leans on multi-party joins (dashers + merchants + orders) and time-series window functions. Python leans on processing event streams from multiple sources with state machine logic.
3. System Design Round (60 min)

A delivery-relevant problem. Common: design the dispatch pipeline that matches orders to dashers, design the delivery state machine event log, design the merchant menu sync pipeline. Use the 4-step framework. Cover real-time matching, exactly-once delivery state transitions, and the source-of-truth question.
4. Live Coding Onsite (60 min)

Second live coding round, opposite language from phone screen. Often includes a follow-up that adds three-party event reconciliation.
5. Modeling Round (60 min)

Sometimes its own round, sometimes embedded in system design. Design schemas for delivery events, dasher shifts, or merchant catalog. SCD Type 2 expected on at least one dimension. Discuss late-arriving events.
6. Behavioral / Collaboration Round (60 min)

STAR-D format. DoorDash emphasizes stakeholder management; expect stories about handling competing requests from product, ops, and finance teams. Decision postmortem heavily weighted.

DoorDash Data Engineer Compensation (2026)

Total comp ranges from levels.fyi and verified offers. US-based.

Level | Title | Range | Notes
IC2 | Data Engineer | $190K - $280K | 2-4 years exp. Owns individual pipelines, on-call rotation.
IC3 | Senior Data Engineer | $260K - $400K | Most common hiring level. Cross-team systems, architecture decisions.
IC4 | Staff Data Engineer | $370K - $560K | Sets technical direction for a domain. Cross-org influence.
IC5 | Senior Staff Data Engineer | $480K - $720K | Multi-org technical leadership. Internal promotion typical.

DoorDash Data Engineering Tech Stack

Languages

Python, Kotlin, Go, Scala, SQL

Processing

Apache Spark, Apache Flink, custom batch frameworks

Storage

S3, Snowflake, Iceberg, Cassandra for live serving

Streaming

Apache Kafka (heavy use), AWS Kinesis for some pipelines

Query Engines

Snowflake, Apache Druid for real-time, Trino for ad-hoc

Orchestration

Dagster (migrating from Airflow), Argo for K8s workflows

ML Platform

Custom feature store, TensorFlow, PyTorch for matching/dispatch models

Logistics

Custom routing engine, OSRM for road network, H3 for geospatial bucketing

15 Real DoorDash Data Engineer Interview Questions With Worked Answers

Questions reported by candidates in 2024-2026 loops, paraphrased and de-identified. Each answer covers the approach, the gotcha, and the typical follow-up.

SQL · L4

Compute dasher acceptance rate per market per day

Acceptance rate = orders_accepted / orders_offered, grouped by market and day. The gotcha is attribution: a dasher active in multiple markets on the same day needs a rule (assign to the first market of the day, or split proportionally by time spent). Volunteer the assumption. Follow-up: how do you handle markets with very low order volume that produce noisy rates? Answer: filter to markets with at least N offers per day, or report a Wilson score interval instead of a point estimate.
SELECT
  market_id,
  date_trunc('day', offered_ts) AS day,
  COUNT(*) FILTER (WHERE accepted) * 1.0 / COUNT(*) AS acceptance_rate,
  COUNT(*) AS offer_count
FROM dasher_offers
GROUP BY market_id, day
HAVING COUNT(*) >= 50  -- noise filter
ORDER BY day, market_id;
SQL · L4

Find merchants with order volume drop greater than 20% week-over-week

Aggregate weekly orders per merchant. LAG to get the prior week. Compute (current - prior) / NULLIF(prior, 0). NULLIF prevents division by zero on new merchants. Filter where change_pct < -0.2 (using abs() would also flag spikes, which is a different question). The interviewer's follow-up: what about seasonality? Answer: compare to the same week the prior year (LAG with offset 52), or use a moving baseline (4-week trailing average). The naive WoW comparison flags every Thanksgiving and Christmas as a drop.
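A sketch of the WoW query, run through SQLite here purely for illustration (a real answer would target Snowflake or Postgres); the weekly_orders table, its columns, and the sample data are all assumed:

```python
import sqlite3

# Hypothetical pre-aggregated table (merchant_id, week_start, order_count);
# production would build this from the raw orders fact first.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE weekly_orders (merchant_id TEXT, week_start TEXT, order_count INTEGER);
INSERT INTO weekly_orders VALUES
  ('m1', '2026-03-02', 100), ('m1', '2026-03-09', 70),  -- 30% drop
  ('m2', '2026-03-02', 50),  ('m2', '2026-03-09', 48),  -- 4% drop, below threshold
  ('m3', '2026-03-09', 25);                             -- new merchant, no prior week
""")
rows = conn.execute("""
WITH wow AS (
  SELECT merchant_id, week_start, order_count,
         LAG(order_count) OVER (PARTITION BY merchant_id ORDER BY week_start) AS prior_count
  FROM weekly_orders
)
SELECT merchant_id, week_start,
       (order_count - prior_count) * 1.0 / NULLIF(prior_count, 0) AS change_pct
FROM wow
WHERE prior_count IS NOT NULL
  AND (order_count - prior_count) * 1.0 / NULLIF(prior_count, 0) < -0.2
""").fetchall()
print(rows)  # only m1 is flagged
```

NULLIF plus the IS NOT NULL filter quietly excludes new merchants, which is usually what you want but is worth saying out loud in the interview.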
SQL · L4

Top 5 cities by 7-day rolling delivery time degradation

Compute rolling 7-day average delivery_time per city. LAG to compare to the previous 7-day window. Order by absolute degradation desc, limit 5. Volunteer the confounders: weather events (rain spikes delivery time), local events (concerts, sports games), driver supply shocks. Senior candidates suggest an explainability layer that attributes the degradation to known causes.
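A minimal in-memory sketch of the window comparison (a real answer would use SQL window functions; the synthetic city data and the helper name are assumed):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input: one (city, day_index, avg_delivery_minutes) row per city per day.
# "sf" degrades by 2 minutes starting on day 7; "nyc" stays flat.
daily = [("sf", d, 30 + (2 if d >= 7 else 0)) for d in range(14)]
daily += [("nyc", d, 35) for d in range(14)]

by_city = defaultdict(list)
for city, day, minutes in sorted(daily, key=lambda r: (r[0], r[1])):
    by_city[city].append(minutes)

def degradation(series, window=7):
    """Current 7-day average minus the previous 7-day window's average."""
    if len(series) < 2 * window:
        return None  # not enough history to compare two full windows
    return mean(series[-window:]) - mean(series[-2 * window:-window])

ranked = sorted(
    ((city, degradation(vals)) for city, vals in by_city.items()),
    key=lambda kv: -(kv[1] or 0),
)
print(ranked[:5])  # sf first, degraded by 2 minutes
```

Ordering by raw degradation is where the confounder discussion attaches: the top-5 list is only actionable once weather and event effects are explained away.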
SQL · L5

Compute supply / demand ratio per H3 cell per 5-min window

Two source tables: dasher_availability (per dasher per minute, with H3 cell) and order_creation (per order, with H3 pickup cell). Aggregate to (h3_cell, 5_min_bucket). Compute supply (count distinct dashers) and demand (count orders). Ratio = supply / demand. Edge case: cells with zero demand divide by zero; clamp or report separately. Zero-supply cells yield a ratio of 0 and are exactly the surge signal. This query is the input to the surge engine; mention that context.
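A sketch of the aggregation in plain Python, under the assumption that both feeds have already been read into memory (real pipelines would do this in Flink or SQL); all table shapes and cell IDs here are illustrative:

```python
from collections import defaultdict

BUCKET_SECONDS = 300  # 5-minute windows

# Hypothetical event feeds: (h3_cell, epoch_ts, dasher_id) and (h3_cell, epoch_ts).
dasher_pings = [("8928308280fffff", 1000, "d1"), ("8928308280fffff", 1010, "d2"),
                ("8928308280fffff", 1020, "d1"),  # same dasher pings twice, count once
                ("89283082807ffff", 1300, "d3"),
                ("8928308280fffff", 1600, "d1")]  # bucket with supply but no demand
orders = [("8928308280fffff", 1005), ("8928308280fffff", 1100),
          ("89283082807ffff", 1290), ("89283082807ffff", 1295),
          ("8928308283bffff", 1310)]             # bucket with demand but no supply

supply = defaultdict(set)   # (cell, bucket) -> distinct dashers
demand = defaultdict(int)   # (cell, bucket) -> order count
for cell, ts, dasher_id in dasher_pings:
    supply[(cell, ts // BUCKET_SECONDS)].add(dasher_id)
for cell, ts in orders:
    demand[(cell, ts // BUCKET_SECONDS)] += 1

ratios = {}
for key in set(supply) | set(demand):
    s, d = len(supply.get(key, set())), demand.get(key, 0)
    # Zero-demand buckets are reported as None rather than dividing by zero;
    # zero-supply buckets surface as 0.0, which is the surge trigger.
    ratios[key] = None if d == 0 else s / d
```

Counting distinct dashers (a set, not a ping count) is the detail interviewers listen for: one dasher pinging three times in a window is still one unit of supply.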
SQL · L5

Detect canceled orders that should have been refunded but weren't

LEFT JOIN order_cancellations to refund_events on order_id. WHERE refund_event IS NULL AND cancellation_ts < NOW() - INTERVAL '1 hour'. Discuss why the time buffer (refunds are async, may take minutes), and why this query is the right one to alert on (financial impact, customer experience). Follow-up: how do you operationalize this? Answer: scheduled query every 15 min, output to alert topic, downstream service files refund automatically or pages support.
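A sketch of the anti-join, again through SQLite for illustration with integer epoch timestamps standing in for real timestamps; the table and column names come from the answer above, the sample rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_cancellations (order_id TEXT, cancellation_ts INTEGER);
CREATE TABLE refund_events (order_id TEXT, refund_ts INTEGER);
INSERT INTO order_cancellations VALUES ('o1', 100), ('o2', 200), ('o3', 99990);
INSERT INTO refund_events VALUES ('o1', 150);
""")
NOW = 100000  # stand-in for NOW(); 3600 s = the 1-hour async-refund buffer
missing = conn.execute("""
SELECT c.order_id
FROM order_cancellations c
LEFT JOIN refund_events r ON r.order_id = c.order_id
WHERE r.order_id IS NULL
  AND c.cancellation_ts < :now - 3600
ORDER BY c.order_id
""", {"now": NOW}).fetchall()
print(missing)  # o1 was refunded, o3 is still inside the buffer
```

The buffer is the whole point of the question: without it, every cancellation whose refund is still in flight pages someone.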
Python · L4

Sessionize dasher shifts with 15-min idle gap

Sort events by (dasher_id, ts). Walk events. Increment shift_id when gap > 15 min OR dasher state transitions from Active to Inactive. State assumption: events with same ts are consecutive (use stable sort). Edge case: a shift that crosses midnight should be one shift, not split on date boundary. Volunteer this. Follow-up: how do you handle a clock-skewed event with ts in the future? Answer: drop or quarantine to dead-letter for review.
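A compact sketch of the walk, under one interpretation of the rules above (an inactive event ends the current shift; the next event opens a new one); the event shape and sample timestamps are assumed:

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=15)

def sessionize(events):
    """events: (dasher_id, ts, state) tuples. Returns {(dasher_id, shift_id): [(ts, state)]}.
    A new shift starts when the idle gap exceeds 15 min or the prior event was inactive.
    No date logic: a shift crossing midnight stays one shift."""
    shifts = {}
    prev = {}  # dasher_id -> (last_ts, last_state, shift_id)
    # sorted() is stable, so same-ts events keep their input order
    for dasher_id, ts, state in sorted(events, key=lambda e: (e[0], e[1])):
        last = prev.get(dasher_id)
        if last is None:
            shift_id = 0
        else:
            last_ts, last_state, shift_id = last
            if ts - last_ts > GAP or last_state == "inactive":
                shift_id += 1
        shifts.setdefault((dasher_id, shift_id), []).append((ts, state))
        prev[dasher_id] = (ts, state, shift_id)
    return shifts

events = [
    ("d1", datetime(2026, 3, 1, 23, 50), "active"),
    ("d1", datetime(2026, 3, 2, 0, 2), "active"),   # crosses midnight: same shift
    ("d1", datetime(2026, 3, 2, 0, 50), "active"),  # 48-min gap: new shift
]
shifts = sessionize(events)
```

Because the gap check compares timestamps rather than calendar dates, the midnight-crossing edge case falls out for free, which is worth pointing out explicitly.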
Python · L4

Implement state machine for delivery lifecycle

States: Created -> Assigned -> PickedUp -> DroppedOff -> Completed, with parallel paths for Cancelled and Refunded. Validate transitions: from Created, only Assigned or Cancelled is valid. From Assigned, only PickedUp or Cancelled. Reject invalid (e.g., Created directly to DroppedOff). Track timestamp per state. Discuss the late-arriving Cancellation event problem: if Cancellation arrives after PickedUp, classify as “cancelled-in-progress” and route to manual review.
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum

class DeliveryState(Enum):
    CREATED = "created"
    ASSIGNED = "assigned"
    PICKED_UP = "picked_up"
    DROPPED_OFF = "dropped_off"
    COMPLETED = "completed"
    CANCELLED = "cancelled"
    REFUNDED = "refunded"

VALID_TRANSITIONS = {
    DeliveryState.CREATED: {DeliveryState.ASSIGNED, DeliveryState.CANCELLED},
    DeliveryState.ASSIGNED: {DeliveryState.PICKED_UP, DeliveryState.CANCELLED},
    DeliveryState.PICKED_UP: {DeliveryState.DROPPED_OFF, DeliveryState.CANCELLED},
    DeliveryState.DROPPED_OFF: {DeliveryState.COMPLETED},
    DeliveryState.COMPLETED: set(),
    DeliveryState.CANCELLED: {DeliveryState.REFUNDED},  # cancelled orders may be refunded
    DeliveryState.REFUNDED: set(),
}

@dataclass
class Delivery:
    delivery_id: str
    state: DeliveryState = DeliveryState.CREATED
    state_history: list[tuple[DeliveryState, datetime]] = field(default_factory=list)

    def transition(self, new_state: DeliveryState, ts: datetime) -> bool:
        if new_state not in VALID_TRANSITIONS[self.state]:
            return False  # caller routes to dead-letter
        self.state = new_state
        self.state_history.append((new_state, ts))  # entry timestamp per state
        return True
Python · L5

Match dashers to orders using H3 ring search

Naive O(n*m) pair-distance is wrong at scale. Better: bucket dashers by H3 cell. For each order, search the pickup cell + ring(1) + ring(2) for available dashers. Compute Haversine for the candidate set (typically <50 dashers vs total fleet of 100K). Discuss the ring radius trade-off: too small misses dashers just outside the cell, too large is wasteful.
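A sketch of the bucket-then-score pattern. Since the real h3 library is a dependency, a coarse square grid stands in for H3 cells here, and ring() approximates gridDisk; everything else (names, cell size) is illustrative:

```python
from collections import defaultdict
from math import radians, sin, cos, asin, sqrt

CELL_DEG = 0.01  # ~1 km square grid standing in for H3 cells (not real H3)

def cell(lat, lon):
    return (int(lat / CELL_DEG), int(lon / CELL_DEG))

def ring(c, k):
    """All cells within k steps of c (square neighborhood, approximating H3 gridDisk)."""
    return {(c[0] + dx, c[1] + dy) for dx in range(-k, k + 1) for dy in range(-k, k + 1)}

def haversine_km(a, b):
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def nearest_dasher(pickup, dashers, k=2):
    """Index dashers by cell, then compute exact distance only for ring candidates."""
    index = defaultdict(list)
    for dasher_id, pos in dashers.items():
        index[cell(*pos)].append((dasher_id, pos))
    candidates = [d for c in ring(cell(*pickup), k) for d in index.get(c, [])]
    if not candidates:
        return None  # a real dispatcher would widen the ring and retry
    return min(candidates, key=lambda d: haversine_km(pickup, d[1]))[0]
```

The structure is the answer: the grid prunes the fleet to a handful of candidates, and the expensive distance function runs only on those.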
Python · L5

Detect duplicate orders within a 60-second window per consumer

Common scenario: an app retry creates two orders. Walk events per consumer, compare each new order to recent orders for the same merchant with a similar cart hash. Within 60 seconds and >90% cart similarity = duplicate. Edge case: a legitimate re-order minutes apart (e.g., placing a second order for a family member); avoid false positives. Discuss the threshold-tuning trade-off.
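A sketch of the per-consumer walk; the order tuple shape is assumed, and Jaccard similarity over item sets stands in for whatever cart hash the real system uses:

```python
from collections import defaultdict

WINDOW_S = 60
SIMILARITY = 0.9

def cart_similarity(a, b):
    """Jaccard similarity over cart item sets (one of many possible cart hashes)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def find_duplicates(orders):
    """orders: (order_id, consumer_id, merchant_id, epoch_ts, items) tuples.
    Flags an order as a duplicate of an earlier order from the same
    consumer + merchant within 60 s with >90% cart overlap."""
    recent = defaultdict(list)  # (consumer, merchant) -> [(ts, order_id, items)]
    dupes = []
    for order_id, consumer, merchant, ts, items in sorted(orders, key=lambda o: o[3]):
        key = (consumer, merchant)
        for prev_ts, prev_id, prev_items in recent[key]:
            if ts - prev_ts <= WINDOW_S and cart_similarity(items, prev_items) > SIMILARITY:
                dupes.append((order_id, prev_id))
                break
        recent[key].append((ts, order_id, items))
    return dupes
```

Both thresholds are tunable knobs: widen the window or lower the similarity bar and recall rises at the cost of false positives on legitimate re-orders.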
System Design · L5

Design the order-to-dasher dispatch pipeline

Order created event -> Kafka -> dispatch service reads, queries available dashers within H3 ring (Cassandra by cell, kept warm by per-cell index), runs matching algorithm, emits assignment event back to Kafka. Dasher app consumes assignment topic. Cover: exactly-once assignment via deduplication of assignment_id, dasher-side accept/reject (timeout returns order to dispatch pool), batching multiple orders to one dasher when efficient, surge pricing trigger when supply low (writes to surge topic). Failure modes: dispatch service crash (Kafka redelivers, dedup ensures idempotency), dasher offline mid-pickup (timeout returns order, customer notified).
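A minimal sketch of the exactly-once assignment step described above: the consumer deduplicates on assignment_id so that a Kafka redelivery after a crash cannot assign the same order twice. The class and its names are illustrative, not DoorDash's actual service:

```python
class AssignmentWriter:
    """Idempotent consumer of the assignment topic (sketch)."""

    def __init__(self):
        self.seen = set()      # in production: a keyed state store, not an in-memory set
        self.assignments = {}  # order_id -> dasher_id

    def handle(self, assignment_id, order_id, dasher_id):
        if assignment_id in self.seen:
            return False  # redelivered message: safe to ack and skip
        self.seen.add(assignment_id)
        self.assignments[order_id] = dasher_id
        return True

writer = AssignmentWriter()
first = writer.handle("a1", "o1", "d1")   # applied
replay = writer.handle("a1", "o1", "d1")  # redelivery, skipped
```

This is the idempotency half of the failure-mode story: Kafka guarantees redelivery, and the dedup key guarantees redelivery is harmless.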
System Design · L5

Design the delivery event sourcing pipeline

Every state transition is an immutable event written to a partitioned Kafka topic (key = delivery_id). Materialized view of current delivery state in Cassandra for live operations. Replay capability for late events: a Cancellation event that arrives 30 minutes late triggers a state recomputation for that delivery, and downstream effects (refund, dasher unblock) are emitted. Audit log immutable on S3, partitioned by date, for finance reconciliation. Discuss event ordering: keying the topic by delivery_id guarantees per-delivery ordering within a partition; consumers must not assume cross-partition order and should sort by event time on replay.
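The replay idea can be sketched as a pure function over the event log. This is an illustrative reduction, not production code; the transition table mirrors the delivery state machine from the Python question above, and the "late cancellation" case shows why the log is the source of truth:

```python
VALID = {
    "created": {"assigned", "cancelled"},
    "assigned": {"picked_up", "cancelled"},
    "picked_up": {"dropped_off", "cancelled"},
    "dropped_off": {"completed"},
}

def replay(events):
    """Rebuild a delivery's current state from its immutable event log.
    A late-arriving event is handled by re-sorting on event time and replaying
    from scratch. Transitions that become invalid after the re-sort are
    collected for manual review instead of being silently applied."""
    state, rejected = "created", []
    for ev_state, ts in sorted(events, key=lambda e: e[1]):
        if ev_state in VALID.get(state, set()):
            state = ev_state
        else:
            rejected.append((ev_state, ts))
    return state, rejected

base = [("assigned", 1), ("picked_up", 2), ("dropped_off", 3), ("completed", 4)]
happy = replay(base)                          # ("completed", [])
late = replay(base + [("cancelled", 2.5)])    # cancellation lands mid-delivery
```

In the late case the recomputed state is "cancelled", and the now-invalid drop-off and completion events surface in the rejected list, which is exactly the signal that downstream effects (refund, dasher unblock) need to be re-emitted.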
System Design · L5

Design the merchant menu sync pipeline

Merchant POS systems push menu updates via webhook to ingest service -> Kafka menu_updates topic -> Flink validation + transformation (normalize prices, modifiers, categories) -> Snowflake (analytics) + Cassandra (live serving for consumer search). Cover: schema variability per POS vendor (Toast, Square, Clover), partial-update handling (price changes vs full menu refresh), search-index refresh (debounced, batched per merchant per hour). Discuss eventual consistency: a price change takes <5 min to appear to consumers, which is acceptable.
Modeling · L5

Design schema for three-party event tracking (dasher + merchant + consumer)

Two valid approaches. Single fact table fact_delivery_event with party_role column (Dasher / Merchant / Consumer), event_type, ts, lat/lon, payload (JSONB). Pro: single table to query for full delivery story. Con: party-specific attributes hidden in JSONB. Alternative: three fact tables (fact_dasher_event, fact_merchant_event, fact_consumer_event) joined on delivery_id. Pro: party-specific schemas explicit. Con: every delivery story query needs three joins. Most modern stacks pick the single table for query simplicity, with computed columns for the most-queried JSONB fields.
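The single-table option can be sketched as DDL plus the "full delivery story" query it optimizes for. SQLite is used here only so the example runs; the schema, sample rows, and JSON payloads are assumed, and a warehouse would use JSONB or VARIANT instead of TEXT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE fact_delivery_event (
    delivery_id TEXT NOT NULL,
    party_role  TEXT NOT NULL CHECK (party_role IN ('dasher', 'merchant', 'consumer')),
    event_type  TEXT NOT NULL,
    ts          INTEGER NOT NULL,
    lat REAL, lon REAL,
    payload     TEXT  -- JSON blob; JSONB / VARIANT in Postgres / Snowflake
)
""")
conn.executemany(
    "INSERT INTO fact_delivery_event VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("del1", "consumer", "order_placed", 100, 37.77, -122.41, '{"cart_total": 34.50}'),
        ("del1", "merchant", "order_confirmed", 130, 37.78, -122.40, '{"prep_minutes": 12}'),
        ("del1", "dasher", "picked_up", 900, 37.78, -122.40, '{"wait_minutes": 3}'),
    ],
)
# The "full delivery story" query the single-table design is optimized for:
story = conn.execute(
    "SELECT party_role, event_type FROM fact_delivery_event "
    "WHERE delivery_id = ? ORDER BY ts", ("del1",)
).fetchall()
print(story)
```

The trade-off shows in the payload column: one keystroke to get the whole story, but party-specific attributes hide inside JSON until you promote the hot ones to computed columns.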
Modeling · L5

Model dasher shifts with overlapping breaks

fact_dasher_event captures every state change. To reconstruct shifts: window over events partitioned by dasher_id, identify Active-to-Inactive transitions exceeding 15 min as shift boundaries. Output table fact_dasher_shift one row per shift with start_ts, end_ts, total_minutes_active, total_minutes_idle, deliveries_completed, earnings_usd. Discuss why this is computed daily, not real-time (overhead vs freshness trade-off).
Behavioral · L5

Tell me about a time you had to push back on a senior stakeholder

DoorDash culture rewards data-driven debate. Story should cover: the specific data you presented, the counter-argument you considered, the eventual resolution. The stakeholder doesn't need to be wrong in the story; the strongest versions show you held a position, listened, and updated your view based on new information. Decision postmortem is the L5 signal.

What Makes DoorDash Data Engineer Interviews Different

Three-sided marketplace as the unit of analysis

Every system involves at least three actors (dasher, merchant, consumer) and often a fourth (DoorDash itself). Single-actor mental models don't fit. Always frame data as multi-party events with independent state machines.

Real-time + batch reconciliation is the standard

Live ops uses real-time aggregations (latency-critical). Finance uses batch (correctness-critical). Daily reconciliation jobs compare them. Mention this dual-track unprompted in any system design answer.

Logistics-flavored geospatial questions

DoorDash uses H3 hexagonal grid for dasher availability and zone-based pricing. Know H3 resolutions and when to use them. Common: 'design a query that returns the 5 closest available dashers to a restaurant'.

Stakeholder management is a graded skill

Data Engineer roles at DoorDash sit between product (wants speed), ops (wants reliability), and finance (wants correctness). The behavioral round explicitly tests how you balance these. Stories about pushing back on scope are the highest-leverage prep material.

How DoorDash Connects to the Rest of Your Prep

DoorDash overlaps heavily with the Instacart data engineer interview guide (three-sided grocery marketplace) and the Uber data engineer interview guide (food delivery via Uber Eats). The dispatch pipeline patterns also show up in the Lyft data engineer interview guide.

Drill the round-specific guides: the system design round prep guide for the dispatch pipeline, the data modeling round prep guide for three-party event modeling, the behavioral round prep guide for stakeholder collaboration stories.

Data Engineer Interview Prep FAQ

How long does DoorDash's Data Engineer interview take?
4 to 5 weeks from recruiter screen to offer. Some teams move faster (2-3 weeks) when a specific headcount is urgent.
Is DoorDash remote-friendly?
Hybrid. Most Data Engineer roles allow 2 to 3 days remote, with 2 days in San Francisco, NYC, or Seattle offices. Some teams (especially Logistics) prefer fully on-site.
What level should I target?
IC3 (Senior) is the most common external hire. IC2 roles open occasionally. IC4+ are usually internal promotion.
Does DoorDash test algorithms?
Light DSA in the Python round. Don't grind LeetCode; focus on data manipulation, state machines, and three-party event reconciliation patterns.
How important is logistics / dispatch knowledge?
Critical for Logistics, Dispatch, and Marketplace teams. Less critical for Financial Data Platform or Consumer Growth. Ask the recruiter which team and tailor accordingly.
What languages can I use?
Python and SQL universally. Kotlin or Go acceptable for backend-leaning Data Engineer roles. Scala for Spark-heavy roles.
Is the system design round whiteboard or virtual?
Virtual via Excalidraw or Miro. Physical whiteboard only on the rare in-person finalist visit.
How is the offer negotiation?
Initial offers come in at the midpoint of the range. RSU refreshers are annual. Verified offers show successful negotiations of 10 to 25% over the initial number when candidates have competing offers.


More Data Engineer Interview Prep Guides

Continue your prep

Data Engineer Interview Prep, explore the full guide

50+ guides covering every round, company, role, and technology in the data engineer interview loop. Grounded in 2,817 verified interview reports across 929 companies, collected from real candidates.
