# Every Scan, Every Parcel, Every Pin Code

> Out for delivery. Delivered. Except the events arrived backwards.

Canonical URL: <https://datadriven.io/problems/every_scan_every_parcel_every_pin_code>

Domain: Pipeline Design · Difficulty: medium · Seniority: L5

## Problem

We're a logistics company delivering parcels across 18,000 pin codes in India, and every scan at a warehouse, vehicle, or delivery point is a tracking event. Our customers expect real-time shipment status, and our ops team needs to know when parcels are stuck or moving in the wrong direction. The problem is that scan events arrive out of order - a parcel can be scanned at delivery before we've received the confirmation it left the origin hub. Design a pipeline that maintains accurate shipment state despite these ordering issues.

## Worked solution and explanation

### Why this problem exists in real interviews

Handling tracking events that arrive out of order is a stateful-streaming problem in disguise: customers want the most recent state within minutes, ops wants to detect the absence of state (stuck parcels), and routing wants a fast lookup that doesn't slow scanning. The trap is processing each scan as it arrives without a per-parcel state machine: a delivery scan lands before the origin scan, and the customer sees the tracking page contradict itself.

The first reach is a stream that writes each scan to a tracking table keyed by parcel id, upserting on the latest scan. Upserting by latest event-time looks correct, but when a delivery scan arrives before the origin scan, the tracking page briefly shows the parcel as delivered with no record of it leaving the origin hub. Ops queries 'parcels not moving' against the same tracking table in a slow batch and finds out about stuck parcels hours after customers do. Routing lookups read the pin-code reference data from the operational database on every scan, and the OLTP starts to feel the load.

> **Trick to Solving**
>
> Per-parcel state on a streaming processor that respects event-time, stuck detection on a watermark-driven timer, routing reference in a low-latency lookup tier.
>
> 1. Per-parcel state lives in the stream processor: the processor maintains an event-time-ordered view of the parcel's scans, so an out-of-order arrival is reordered before the customer-facing state is updated.
> 2. Stuck detection runs as a watermark-driven timer on the same per-parcel state: if no scan arrives for a parcel within the window, an alert fires.
> 3. Routing reference data (pin codes, depot mappings) lives in a low-latency lookup tier the stream reads on every scan, not in the OLTP.

---

### Walk the requirements

#### Step 1: Stream into a per-parcel state machine that respects event-time

Scan events flow through a queue and into a stream processor keyed by parcel id (AWB). For each parcel, the processor holds an event-time-ordered list of scans; when a scan arrives, the processor inserts it in event-time order and emits the resulting state to the customer-facing tracking store. An out-of-order delivery scan that arrives before the origin scan doesn't flash 'delivered' on the customer's page; the state stays at whatever the event-time-ordered sequence allows. Without a stateful streaming layer the customer view is whichever scan happened to land last, not the parcel's actual progress.
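
The per-parcel state described above can be sketched in plain Python. This is a minimal illustration, not a stream-processor implementation: the `STAGES` list, the `Scan` and `ParcelState` classes, and the `process` function are hypothetical names, and the rule "the visible status only advances through stages in their expected order" is one assumed reading of "whatever the event-time-ordered sequence allows".

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical stage order; a real network has more stages and legal skips.
STAGES = ["ORIGIN_HUB_OUT", "OUT_FOR_DELIVERY", "DELIVERED"]


@dataclass(order=True)
class Scan:
    event_time: int                    # epoch seconds stamped by the scanner
    stage: str = field(compare=False)  # ordering uses event_time only


@dataclass
class ParcelState:
    scans: list = field(default_factory=list)  # kept sorted by event_time

    def apply(self, scan: Scan):
        """Insert the scan in event-time order and return the visible status."""
        bisect.insort(self.scans, scan)
        return self.visible_status()

    def visible_status(self):
        # Walk scans in event-time order, advancing only when the next
        # expected stage appears. A DELIVERED scan with no earlier stages
        # is held in state but does not change what the customer sees.
        reached = 0
        for scan in self.scans:
            if reached < len(STAGES) and scan.stage == STAGES[reached]:
                reached += 1
        return STAGES[reached - 1] if reached else None


# One state object per parcel id (AWB), as a keyed stream processor would hold.
states = defaultdict(ParcelState)


def process(awb: str, scan: Scan):
    return states[awb].apply(scan)
```

With this sketch, a `DELIVERED` scan that arrives first leaves the visible status empty, and the status jumps to `DELIVERED` only once the earlier scans have filled in the sequence. In a real deployment the `states` dictionary would be the stream processor's keyed, checkpointed state rather than process memory.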

#### Step 2: Stuck-shipment detection as a per-parcel timer on the stream

The same per-parcel state holds a timer that fires when no scan has arrived inside the operational window. The timer is driven by event time and the watermark, not the wall clock, so a delayed batch of scans doesn't falsely fire stuck alerts. When the timer fires, the processor emits a 'stuck' event to the ops alert path. Ops sees a stuck parcel within minutes of the deadline for the scan that should have arrived, not hours after a customer complains. A 'periodic batch query for parcels with no recent scan' design lags by the batch interval and delivers its alerts in a burst on every run.
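
A watermark-driven timer can be sketched as follows. This is a simplified model under stated assumptions: the `StuckDetector` class and its parameters are hypothetical, and the watermark is approximated as "largest event time seen minus an allowed lateness", which is one common heuristic and not the only way to compute it.

```python
from dataclasses import dataclass, field


@dataclass
class StuckDetector:
    stuck_after: int        # seconds without a scan before a parcel is stuck
    allowed_lateness: int   # how far the watermark trails the newest event
    max_event_time: int = 0
    deadlines: dict = field(default_factory=dict)  # awb -> event-time deadline

    @property
    def watermark(self) -> int:
        # Assumed heuristic: events older than this are no longer expected.
        return self.max_event_time - self.allowed_lateness

    def on_scan(self, awb: str, event_time: int, terminal: bool = False):
        """Re-arm (or clear) the parcel's timer, then fire any due timers."""
        if terminal:
            # A terminal scan such as delivery cancels the timer.
            self.deadlines.pop(awb, None)
        else:
            deadline = event_time + self.stuck_after
            # A late scan must never pull an existing deadline backwards.
            self.deadlines[awb] = max(self.deadlines.get(awb, deadline), deadline)
        self.max_event_time = max(self.max_event_time, event_time)
        return self._fire_due()

    def _fire_due(self):
        # Timers fire on watermark progress, not on wall-clock time.
        due = sorted(a for a, d in self.deadlines.items() if d <= self.watermark)
        for awb in due:
            del self.deadlines[awb]
        return due
```

Because deadlines are compared against the watermark, a pause in the whole stream pauses the timers too. A single global watermark has a known weakness: one offline scanner does not hold it back, so a production design would likely track lateness per source or per partition.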

#### Step 3: Routing reference in a low-latency lookup tier, not in the OLTP

Routing lookups happen on every scan at peak rates. The stream reads pin-code routing reference from a low-latency online store (a key-value store fed from the routing source on a slow cadence), with a tens-of-milliseconds budget per lookup. The OLTP doesn't see the lookup traffic. Reading from the OLTP on every scan is the version that puts the operations system at risk during peak; the lookup tier is what keeps scanning fast and keeps lookup load off the OLTP.
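
The read path can be illustrated with a small stand-in for the lookup tier. This is a sketch under assumptions: `RoutingLookup`, `load_snapshot`, and the depot id are hypothetical, and an in-memory snapshot refreshed on a timer stands in for a real key-value store fed from the routing source.

```python
import time
from typing import Callable, Dict, Optional


class RoutingLookup:
    """Serves pin-code lookups from a snapshot refreshed on a slow cadence."""

    def __init__(
        self,
        load_snapshot: Callable[[], Dict[str, str]],  # hypothetical bulk loader
        refresh_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._load = load_snapshot
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._snapshot: Dict[str, str] = {}
        self._loaded_at: Optional[float] = None

    def depot_for(self, pin_code: str) -> Optional[str]:
        now = self._clock()
        stale = (
            self._loaded_at is None
            or now - self._loaded_at >= self._refresh_seconds
        )
        if stale:
            # The only call that touches the routing source; it runs once
            # per refresh interval, not once per scan.
            self._snapshot = self._load()
            self._loaded_at = now
        return self._snapshot.get(pin_code)
```

The point of the sketch is the traffic shape: per-scan reads hit the snapshot, and the routing source sees one bulk read per refresh interval regardless of scan volume.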

---

### The shape that fits

> **What this design gives up**
>
> Per-parcel state on the stream takes more memory than 'upsert latest scan,' and the stream processor has to be sized for the active-parcels working set. A watermark-driven timer means stuck detection lags by however far the watermark trails the newest scans. A separate routing-lookup tier stores the routing reference a second time. Operational complexity is the cost; the win is a customer view that doesn't flash wrong states, ops alerts that fire from the absence of scans, and scanning that doesn't slow down on routing lookups.

> **What reviewers check**
>
> A reviewer looks at the canvas for these properties:
> - A queue or log buffers scan events between scanners and a stateful stream processor keyed by parcel id.
> - The stream processor maintains per-parcel state with event-time ordering and emits stuck-parcel alerts on a timer.
> - Routing reference data sits in a low-latency lookup tier read by the stream, not the OLTP.

> **The mistake that ships**
>
> What gets built first writes each scan to a tracking table keyed by parcel and reads it on the customer page. A delivery scan lands first because the origin hub's scanner had a network blip, and the customer sees the parcel marked delivered before it actually shipped. Ops's stuck-parcel detection runs as an hourly batch and the customer-complaint volume goes up before the dashboard does. Routing lookups against the OLTP slow scanning at peak; the operations team asks the data team to throttle. The rebuild centres on per-parcel state in the stream processor, watermark-driven stuck detection, and a routing lookup tier; each was a property the original cut decided to defer.

---

## Common follow-up questions

- A scanner is offline for several hours and dumps its buffer when it reconnects. What in this design reorders the events, and what does the customer see during the buffer flush? _(Tests whether the candidate sees the per-parcel state's event-time semantics: the buffered scans are inserted in event-time order, and the customer state lags briefly during the flush as the processor catches up. Without event-time ordering, the customer page flickers through the buffered states one at a time.)_
- Ops wants the stuck-parcel threshold to vary by route (rural routes tolerate longer than urban). Where does that configuration live, and what changes in the stream? _(Tests whether the candidate sees per-route thresholds as state the stream reads alongside the routing lookup, not hardcoded into the timer. The stream processor reads the threshold per parcel based on its current route and arms the timer accordingly.)_
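
For the second follow-up, one way to keep thresholds out of the timer code is to resolve them from configuration when the timer is armed. The sketch below is illustrative only: the route classes, the hour values, and the `stuck_deadline` function are hypothetical, and in practice the mapping would be read from the same lookup tier as the routing reference.

```python
# Hypothetical per-route-class thresholds, in hours without a scan.
DEFAULT_STUCK_AFTER_HOURS = 12
STUCK_AFTER_HOURS_BY_ROUTE_CLASS = {"urban": 6, "rural": 24}


def stuck_deadline(last_scan_event_time: int, route_class: str) -> int:
    """Return the event-time deadline (epoch seconds) for the parcel's timer."""
    hours = STUCK_AFTER_HOURS_BY_ROUTE_CLASS.get(
        route_class, DEFAULT_STUCK_AFTER_HOURS
    )
    return last_scan_event_time + hours * 3600
```

Each new scan re-arms the timer using the threshold for the parcel's current route, so a parcel that moves from an urban leg to a rural leg picks up the longer window on its next scan.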

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/every_scan_every_parcel_every_pin_code)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.