Questions reported by candidates in 2024-2026 loops, paraphrased and de-identified. Each answer covers the approach, the gotcha, and the typical follow-up.
SQL · L4
Compute dasher acceptance rate per market per day
Acceptance rate = orders_accepted / orders_offered, grouped by market and day. The gotcha is attribution: dashers active across multiple markets in the same day need a rule (assign to first market of the day, or split proportionally by time spent). Volunteer the assumption. Follow-up: how do you handle markets with very low order volume that produce noisy rates? Answer: filter to markets with at least N offers per day, or report a Wilson score interval instead of point estimate.
SELECT
market_id,
date_trunc('day', offered_ts) AS day,
COUNT(*) FILTER (WHERE accepted) * 1.0 / COUNT(*) AS acceptance_rate,
COUNT(*) AS offer_count
FROM dasher_offers
GROUP BY market_id, day
HAVING COUNT(*) >= 50 -- noise filter
ORDER BY day, market_id;
SQL · L4
Find merchants with order volume drop greater than 20% week-over-week
Aggregate weekly orders per merchant. LAG to get prior week. Compute (current - prior) / NULLIF(prior, 0). NULLIF prevents division by zero on new merchants. Filter where change_pct < -0.2 (a signed drop; abs() would also flag spikes). The interviewer's follow-up: what about seasonality? Answer: compare to the same week prior year (LAG with offset 52), or use a moving baseline (4-week trailing average). The naive WoW comparison flags every Thanksgiving and Christmas as a drop.
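A minimal sketch of the query shape, using sqlite3 so it runs anywhere; the table name weekly_orders and its columns are assumptions, and LAG/NULLIF behave the same in warehouse dialects:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE weekly_orders (merchant_id TEXT, week TEXT, orders INTEGER);
INSERT INTO weekly_orders VALUES
  ('m1', '2024-W01', 100), ('m1', '2024-W02', 70),  -- 30% drop: flagged
  ('m2', '2024-W01', 50),  ('m2', '2024-W02', 48);  -- 4% drop: not flagged
""")

flagged = conn.execute("""
SELECT merchant_id, week, change_pct FROM (
  SELECT merchant_id, week, orders,
         -- NULLIF guards against division by zero on brand-new merchants
         (orders - LAG(orders) OVER (PARTITION BY merchant_id ORDER BY week)) * 1.0
           / NULLIF(LAG(orders) OVER (PARTITION BY merchant_id ORDER BY week), 0)
           AS change_pct
  FROM weekly_orders
)
WHERE change_pct < -0.2  -- drop greater than 20%
""").fetchall()
```

Note the filter is on the signed value: a merchant whose volume doubled should not land on a "drop" report.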
SQL · L4
Top 5 cities by 7-day rolling delivery time degradation
Compute rolling 7-day average delivery_time per city. LAG to compare to the previous 7-day window. Order by absolute degradation desc, limit 5. Volunteer the confounders: weather events (rain spikes delivery time), local events (concerts, sports games), driver supply shocks. Senior candidates suggest an explainability layer that attributes the degradation to known causes.
SQL · L5
Compute supply / demand ratio per H3 cell per 5-min window
Two source tables: dasher_availability (per dasher per minute, with H3 cell) and order_creation (per order, with H3 pickup cell). Aggregate to (h3_cell, 5_min_bucket). Compute supply (count distinct dashers) and demand (count orders). Ratio = supply / demand. Edge case: cells with zero demand produce a division by zero; clamp or report separately (zero-supply cells yield a ratio of 0 and are the surge signal). This query is the input to the surge engine; mention that context.
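A sketch of the aggregation in Python terms (tuple shapes and names are assumptions); zero-demand buckets are surfaced as None rather than clamped to a fake number:

```python
from collections import defaultdict

BUCKET_SEC = 300  # 5-minute windows

def supply_demand(dasher_pings, orders):
    """dasher_pings: (dasher_id, h3_cell, epoch_sec) per-minute pings.
    orders: (order_id, h3_cell, epoch_sec). Returns {(cell, bucket): ratio};
    zero-demand buckets come back as None (the divide-by-zero case)."""
    supply = defaultdict(set)
    demand = defaultdict(int)
    for dasher_id, cell, ts in dasher_pings:
        supply[(cell, ts // BUCKET_SEC)].add(dasher_id)  # distinct dashers
    for _order_id, cell, ts in orders:
        demand[(cell, ts // BUCKET_SEC)] += 1
    out = {}
    for key in supply.keys() | demand.keys():
        d = demand[key]
        out[key] = len(supply[key]) / d if d else None
    return out
```

A ratio of exactly 0 means orders with no dashers in the cell, which is precisely what the surge engine needs to see.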
SQL · L5
Detect canceled orders that should have been refunded but weren't
LEFT JOIN order_cancellations to refund_events on order_id. WHERE refund_event IS NULL AND cancellation_ts < NOW() - INTERVAL '1 hour'. Explain the time buffer (refunds are async and can take minutes to post) and why this query is the right one to alert on (financial impact, customer experience). Follow-up: how do you operationalize this? Answer: scheduled query every 15 min, output to alert topic, downstream service files the refund automatically or pages support.
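A runnable sketch via sqlite3 (table and column names assumed; NOW() - INTERVAL becomes datetime('now', '-1 hour') in SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_cancellations (order_id TEXT, cancellation_ts TEXT);
CREATE TABLE refund_events (order_id TEXT, refund_ts TEXT);
INSERT INTO order_cancellations VALUES
  ('o1', datetime('now', '-2 hours')),   -- old and unrefunded: should alert
  ('o2', datetime('now', '-2 hours')),   -- old but refunded: fine
  ('o3', datetime('now', '-5 minutes')); -- recent: refund may still be in flight
INSERT INTO refund_events VALUES ('o2', datetime('now', '-1 hour'));
""")

missing = conn.execute("""
SELECT c.order_id
FROM order_cancellations c
LEFT JOIN refund_events r ON r.order_id = c.order_id
WHERE r.order_id IS NULL                             -- no refund ever posted
  AND c.cancellation_ts < datetime('now', '-1 hour') -- past the async buffer
""").fetchall()
```

Only o1 survives both filters: o2 was refunded, and o3 is inside the one-hour grace window.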
Python · L4
Sessionize dasher shifts with 15-min idle gap
Sort events by (dasher_id, ts). Walk events. Increment shift_id when gap > 15 min OR dasher state transitions from Active to Inactive. State assumption: events with same ts are consecutive (use stable sort). Edge case: a shift that crosses midnight should be one shift, not split on date boundary. Volunteer this. Follow-up: how do you handle a clock-skewed event with ts in the future? Answer: drop or quarantine to dead-letter for review.
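A gap-only simplification of the walk (the Active/Inactive state-transition trigger is omitted; event shape is an assumption). Sorting is stable, which covers the same-timestamp assumption, and midnight crossings stay in one shift because only the gap matters:

```python
from datetime import datetime, timedelta

IDLE_GAP = timedelta(minutes=15)

def sessionize(events):
    """events: list of (dasher_id, ts). Returns (dasher_id, ts, shift_id)
    with a new shift_id whenever the dasher changes or the idle gap
    between consecutive events exceeds 15 minutes."""
    events = sorted(events, key=lambda e: (e[0], e[1]))  # sorted() is stable
    out, shift_id, prev = [], 0, None
    for dasher_id, ts in events:
        if prev is None or prev[0] != dasher_id or ts - prev[1] > IDLE_GAP:
            shift_id += 1  # new dasher or idle gap starts a new shift
        out.append((dasher_id, ts, shift_id))
        prev = (dasher_id, ts)
    return out
```

A production version would also drop or dead-letter events whose ts is in the future before sorting.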
Python · L4
Implement state machine for delivery lifecycle
States: Created -> Assigned -> PickedUp -> DroppedOff -> Completed, with parallel paths for Cancelled and Refunded. Validate transitions: from Created, only Assigned or Cancelled is valid. From Assigned, only PickedUp or Cancelled. Reject invalid (e.g., Created directly to DroppedOff). Track timestamp per state. Discuss the late-arriving Cancellation event problem: if Cancellation arrives after PickedUp, classify as “cancelled-in-progress” and route to manual review.
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum

class DeliveryState(Enum):
    CREATED = "created"
    ASSIGNED = "assigned"
    PICKED_UP = "picked_up"
    DROPPED_OFF = "dropped_off"
    COMPLETED = "completed"
    CANCELLED = "cancelled"

VALID_TRANSITIONS = {
    DeliveryState.CREATED: {DeliveryState.ASSIGNED, DeliveryState.CANCELLED},
    DeliveryState.ASSIGNED: {DeliveryState.PICKED_UP, DeliveryState.CANCELLED},
    DeliveryState.PICKED_UP: {DeliveryState.DROPPED_OFF, DeliveryState.CANCELLED},
    DeliveryState.DROPPED_OFF: {DeliveryState.COMPLETED},
    DeliveryState.COMPLETED: set(),
    DeliveryState.CANCELLED: set(),
}

@dataclass
class Delivery:
    delivery_id: str
    state: DeliveryState = DeliveryState.CREATED
    state_history: list[tuple[DeliveryState, datetime]] = field(default_factory=list)

    def transition(self, new_state: DeliveryState, ts: datetime) -> bool:
        if new_state not in VALID_TRANSITIONS[self.state]:
            return False  # caller routes to dead-letter
        self.state_history.append((self.state, ts))  # record exited state with its exit timestamp
        self.state = new_state
        return True
Python · L5
Match dashers to orders using H3 ring search
Naive O(n*m) pair-distance is wrong at scale. Better: bucket dashers by H3 cell. For each order, search the pickup cell + ring(1) + ring(2) for available dashers. Compute Haversine for the candidate set (typically <50 dashers vs total fleet of 100K). Discuss the ring radius trade-off: too small misses dashers just outside the cell, too large is wasteful.
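A sketch of the bucketing idea with a square grid standing in for H3 (the real h3 package exposes a ring/disk lookup for this; the rings helper here is a hypothetical stand-in so the example is self-contained):

```python
from collections import defaultdict

def rings(cell, k):
    """Stand-in for an H3 k-ring: all cells within k steps on a square
    grid keyed by (x, y) integer coordinates."""
    x, y = cell
    return {(x + dx, y + dy) for dx in range(-k, k + 1) for dy in range(-k, k + 1)}

def candidates(order_cell, dashers_by_cell, k=2):
    """Collect available dashers from the pickup cell plus ring(1..k);
    only this small candidate set gets the exact Haversine computation."""
    found = []
    for cell in rings(order_cell, k):
        found.extend(dashers_by_cell.get(cell, []))
    return found

# Bucket the fleet by cell once; each dispatch then touches ~(2k+1)^2 cells
# instead of scanning 100K dashers.
dashers_by_cell = defaultdict(list)
for dasher_id, cell in [("d1", (0, 0)), ("d2", (1, 2)), ("d3", (9, 9))]:
    dashers_by_cell[cell].append(dasher_id)
```

The k parameter is the ring-radius trade-off from the answer: k too small misses a dasher one cell over, k too large re-approaches the naive scan.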
Python · L5
Detect duplicate orders within a 60-second window per consumer
Common scenario: an app retry creates two orders. Walk events per consumer, compare each new order to recent orders for the same merchant with a similar cart hash. Within 60 seconds and >90% cart similarity = duplicate. Edge case: a legitimate re-order minutes apart (a consumer intentionally placing a second order for someone else); avoid false positives. Discuss the threshold-tuning trade-off.
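A sketch of the walk, with Jaccard similarity over cart items standing in for the cart-hash comparison (tuple shape and thresholds are assumptions to tune):

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def find_duplicates(orders, window_sec=60, sim_threshold=0.9):
    """orders: (order_id, consumer_id, merchant_id, epoch_sec, cart_items),
    sorted by time. Flags an order as a duplicate of an earlier order from
    the same consumer + merchant inside the window with a similar cart."""
    dupes = []
    recent = []  # sliding buffer of orders inside the window
    for oid, cid, mid, ts, cart in orders:
        recent = [o for o in recent if ts - o[3] <= window_sec]  # expire old
        for p_oid, p_cid, p_mid, p_ts, p_cart in recent:
            if p_cid == cid and p_mid == mid and jaccard(cart, p_cart) >= sim_threshold:
                dupes.append((oid, p_oid))
                break
        recent.append((oid, cid, mid, ts, cart))
    return dupes
```

Widening window_sec or lowering sim_threshold catches more retries but starts flagging the legitimate re-order case.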
System Design · L5
Design the order-to-dasher dispatch pipeline
Order created event -> Kafka -> dispatch service reads, queries available dashers within H3 ring (Cassandra by cell, kept warm by per-cell index), runs matching algorithm, emits assignment event back to Kafka. Dasher app consumes assignment topic. Cover: exactly-once assignment via deduplication of assignment_id, dasher-side accept/reject (timeout returns order to dispatch pool), batching multiple orders to one dasher when efficient, surge pricing trigger when supply low (writes to surge topic). Failure modes: dispatch service crash (Kafka redelivers, dedup ensures idempotency), dasher offline mid-pickup (timeout returns order, customer notified).
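The exactly-once piece can be made concrete with a small sketch (class and field names are illustrative; in production the seen-set is a keyed state store, not process memory):

```python
class IdempotentAssigner:
    """Kafka may redeliver an order event after a dispatch-service crash,
    so assignment is deduplicated on a deterministic assignment_id before
    any side effect (assignment event) is emitted."""
    def __init__(self):
        self.seen = set()  # production: keyed state store / transactional table

    def assign(self, order_id, dasher_id, attempt_epoch):
        assignment_id = f"{order_id}:{attempt_epoch}"  # deterministic per attempt
        if assignment_id in self.seen:
            return None  # redelivery: suppress the duplicate assignment
        self.seen.add(assignment_id)
        return {"assignment_id": assignment_id,
                "order_id": order_id, "dasher_id": dasher_id}
```

Because the id is derived from the input rather than generated randomly, a replayed event maps to the same assignment_id and is dropped.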
System Design · L5
Design the delivery event sourcing pipeline
Every state transition is an immutable event written to a partitioned Kafka topic (key = delivery_id). Materialized view of current delivery state in Cassandra for live operations. Replay capability for late events: a Cancellation event that arrives 30 minutes late triggers a state recomputation for that delivery, downstream effects (refund, dasher unblock) are emitted. Audit log immutable on S3, partitioned by date, for finance reconciliation. Discuss event ordering: producer-side ordering by ts, consumer-side reordering by delivery_id key partition.
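The replay idea reduces to a fold over the event log sorted by producer timestamp; a deliberately naive sketch (event shape assumed, and a real version would validate transitions against a table like the VALID_TRANSITIONS map above):

```python
def recompute_state(events):
    """Replay a delivery's immutable events, in any arrival order, into
    current state by sorting on producer timestamp. A Cancellation that
    arrives 30 minutes late simply re-enters this fold and the corrected
    state falls out; events after a terminal state go to manual review."""
    ordered = sorted(events, key=lambda e: e["ts"])
    state = "created"
    for ev in ordered:
        if state in ("completed", "cancelled"):
            break  # terminal states absorb later events (route to review)
        state = ev["type"]
    return state
```

This is why the topic is keyed by delivery_id: all of one delivery's events land in one partition, so the consumer can reorder them locally by ts.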
System Design · L5
Design the merchant menu sync pipeline
Merchant POS systems push menu updates via webhook to ingest service -> Kafka menu_updates topic -> Flink validation + transformation (normalize prices, modifiers, categories) -> Snowflake (analytics) + Cassandra (live serving for consumer search). Cover: schema variability per POS vendor (Toast, Square, Clover), partial-update handling (price changes vs full menu refresh), search-index refresh (debounced, batched per merchant per hour). Discuss eventual consistency: a price change takes <5 min to appear to consumers, which is acceptable.
Modeling · L5
Design schema for three-party event tracking (dasher + merchant + consumer)
Two valid approaches. Single fact table fact_delivery_event with party_role column (Dasher / Merchant / Consumer), event_type, ts, lat/lon, payload (JSONB). Pro: single table to query for full delivery story. Con: party-specific attributes hidden in JSONB. Alternative: three fact tables (fact_dasher_event, fact_merchant_event, fact_consumer_event) joined on delivery_id. Pro: party-specific schemas explicit. Con: every delivery story query needs three joins. Most modern stacks pick the single table for query simplicity, with computed columns for the most-queried JSONB fields.
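A sketch of the single-table option, runnable via sqlite3 (payload is a JSON string here since SQLite lacks JSONB; column names are the ones from the answer, the CHECK constraint is an added assumption):

```python
import sqlite3, json

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE fact_delivery_event (
  delivery_id TEXT,
  party_role  TEXT CHECK (party_role IN ('dasher', 'merchant', 'consumer')),
  event_type  TEXT,
  ts          TEXT,
  lat REAL, lon REAL,
  payload     TEXT  -- JSON string; JSONB in Postgres, VARIANT in Snowflake
)
""")
conn.execute(
    "INSERT INTO fact_delivery_event VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("d1", "dasher", "picked_up", "2024-01-01T12:00:00", 37.77, -122.42,
     json.dumps({"vehicle": "bike"})),  # party-specific attrs live in payload
)
# The single-table pro in action: the full delivery story is one scan.
story = conn.execute(
    "SELECT party_role, event_type FROM fact_delivery_event "
    "WHERE delivery_id = ? ORDER BY ts",
    ("d1",),
).fetchall()
```

The con is equally visible: pulling vehicle out of payload needs JSON extraction, which is what the computed columns on hot fields are for.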
Modeling · L5
Model dasher shifts with overlapping breaks
fact_dasher_event captures every state change. To reconstruct shifts: window over events partitioned by dasher_id, identify idle gaps (Active-to-Inactive spans) exceeding 15 min as shift boundaries. Output table fact_dasher_shift, one row per shift with start_ts, end_ts, total_minutes_active, total_minutes_idle, deliveries_completed, earnings_usd. Discuss why this is computed daily, not in real time (overhead vs freshness trade-off).
Behavioral · L5
Tell me about a time you had to push back on a senior stakeholder
DoorDash culture rewards data-driven debate. Story should cover: the specific data you presented, the counter-argument you considered, the eventual resolution. The stakeholder doesn't need to be wrong in the story; the strongest versions show you held a position, listened, and updated your view based on new information. Decision postmortem is the L5 signal.