# Badging Items That Already Sold Out

> Same-day delivery. The features have to be faster.

Canonical URL: <https://datadriven.io/problems/badging_items_that_already_sold_out>

Domain: Pipeline Design · Difficulty: hard · Seniority: L6

## Problem

We guarantee same-day delivery on millions of SKUs, and the engine that decides which items to badge as Rocket Delivery needs fresh data every few minutes. Right now, stale features are causing us to over-promise on items that have gone out of stock or whose nearest fulfillment center is already at capacity. Design a data pipeline that keeps the feature store current.

## Worked solution and explanation

### Why this problem exists in real interviews

Same-day badging runs on every product page, with four properties pulling at the design: minutes-fresh features, millisecond lookups, zero load on the OLTP databases, and expiry semantics for discontinued SKUs. The trap is letting the badge engine query the OLTP databases synchronously, or letting the feature store hold rows indefinitely past a SKU's lifecycle.

The default reach is for the badge engine to query inventory and orders directly on each product view, with a nightly batch refreshing a backup feature table. The OLTP databases buckle under page-view load, and the operations team makes the data team back off. Inventory data from the nightly batch is stale, so same-day promises fire on items that just went out of stock. Discontinued SKUs keep getting badged because nothing prunes their feature rows.

> **Trick to Solving**
>
> CDC inventory and orders into a stream, materialize into an online store for millisecond reads, expire entries when SKUs are discontinued.
> 
> 1. CDC off the inventory and order databases adds zero read pressure; the streams feed a streaming consumer that updates an online store within minutes of a change.
> 2. The badge engine reads from the online store with a millisecond budget; the OLTP sees no badge-engine traffic.
> 3. SKU lifecycle events (discontinued, paused) propagate through the same path; the online row sets a TTL or is removed so the badge stops being served.
> 4. Offline training reads features from the lake / warehouse with full history; the online and offline computations come from the same definition.
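
Point 4 above can be made concrete with a minimal sketch in which one function defines the feature and both paths call it. The names `SkuState`, `badge_eligible`, `online_update`, and `offline_backfill`, and the two input fields, are hypothetical and chosen only for illustration; the source does not specify the feature's inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkuState:
    # Hypothetical feature inputs; the real schema is not given in the problem.
    sku: str
    units_on_hand: int
    fc_open_slots: int  # remaining same-day capacity at the nearest fulfillment center
    discontinued: bool = False


def badge_eligible(state: SkuState) -> bool:
    """The single feature definition shared by the online and offline paths."""
    return (
        not state.discontinued
        and state.units_on_hand > 0
        and state.fc_open_slots > 0
    )


def online_update(store: dict, state: SkuState) -> None:
    """Streaming path: overwrite the per-SKU value in the online store."""
    store[state.sku] = badge_eligible(state)


def offline_backfill(history: list) -> list:
    """Batch path: recompute the same feature over the full history."""
    return [(s.sku, badge_eligible(s)) for s in history]
```

Because both paths call `badge_eligible`, a change to the rule cannot silently diverge between training data and serving.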

---

### Walk the requirements

#### Step 1: Refresh the badge features within minutes of an inventory or order change

Inventory and order databases emit CDC into a stream; a streaming consumer maintains the per-SKU badge feature in an online store within minutes. The badge engine reads from the online store. Without minutes-fresh updates the named problem (false same-day promises) is unaddressed; without an online store the badge engine has nothing to read at request latency.
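
As a rough sketch of the consumer side, the function below applies one change event to a per-SKU row held in a plain dict standing in for the online store. The event shape (`source`, `sku`, `lsn`, `after`) and the field names are assumptions made for illustration, not the format of any particular CDC connector. Tracking the last applied position per source lets the consumer ignore replayed events, which matters when the stream is delivered at least once.

```python
import time
from typing import Optional


def apply_change(store: dict, event: dict, now: Optional[float] = None) -> bool:
    """Apply one CDC event to the per-SKU row.

    Returns False when the event was already applied (a replayed duplicate).
    """
    row = store.setdefault(
        event["sku"], {"units_on_hand": 0, "fc_open_slots": 0, "lsn": {}}
    )
    source = event["source"]  # hypothetical: "inventory" or "orders"
    # Log positions are only comparable within one source database.
    if event["lsn"] <= row["lsn"].get(source, -1):
        return False
    row.update(event["after"])  # merge the changed columns into the row
    row["lsn"][source] = event["lsn"]
    # Recompute the pre-computed flag the badge engine will read.
    row["badge"] = row["units_on_hand"] > 0 and row["fc_open_slots"] > 0
    row["refreshed_at"] = time.time() if now is None else now
    return True
```

The badge flag is recomputed on write, so the read path never has to evaluate the rule.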

#### Step 2: Millisecond lookups from an online store, separate from the historical store

Every product view triggers a badge check. The online store is keyed on SKU and sized for millisecond point-lookups. The historical / training store sits behind a slower path and isn't touched by the request. A 'compute the badge at request time' design is the version where the page hangs at peak; pre-computed features in an online store sized for the request budget are what make the badge feel free.
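
The request path can be sketched as a single keyed lookup of a pre-computed flag. In this illustration a dict stands in for the online store, and `consumer_heartbeat` is an assumed timestamp that the streaming consumer would publish even when no rows change, so the reader can tell a quiet SKU from a stalled pipeline. `BadgeAnswer`, `read_badge`, and the 300-second lag threshold are hypothetical.

```python
from typing import NamedTuple


class BadgeAnswer(NamedTuple):
    show_badge: bool
    fresh: bool  # False when the pipeline itself looks stalled


def read_badge(
    features: dict,
    sku: str,
    consumer_heartbeat: float,
    now: float,
    max_lag_s: float = 300.0,
) -> BadgeAnswer:
    """Point lookup by SKU; no joins, scans, or calls to the OLTP databases."""
    fresh = (now - consumer_heartbeat) <= max_lag_s
    row = features.get(sku)  # a missing row means no badge
    show = bool(row and row["badge"])
    return BadgeAnswer(show, fresh)
```

What the page does with `fresh=False` (hide the badge, or serve last-known values) is a product decision this sketch leaves open.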

#### Step 3: Zero load on the OLTP; features come from CDC, not synchronous reads

Inventory and order databases are live operational systems. CDC reads from the change log without query load; the badge engine never queries them. The streaming consumer is the only thing reading the change feed, and it does so from a managed connector. A 'just hit inventory on every page view' design is the version operations rejects on day one because product-page traffic puts the OLTP at risk.

#### Step 4: Discontinued SKUs expire from the online store

When a SKU is discontinued, the SKU lifecycle event flows through the same path and either removes the SKU's online-store row or sets a tombstone with a TTL. The badge engine reads the store and finds nothing for a discontinued SKU. Without expiry semantics the online store accumulates stale rows; with them, the badge engine is correct by construction. Periodic compaction garbage-collects rows past the TTL.
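
A minimal sketch of the tombstone variant follows, again with a dict standing in for the online store. The event shape, the status values, and the seven-day `TOMBSTONE_TTL_S` are assumptions for illustration only.

```python
TOMBSTONE_TTL_S = 7 * 24 * 3600  # hypothetical retention for tombstones


def apply_lifecycle(store: dict, event: dict, now: float) -> None:
    """Handle a SKU lifecycle event arriving on the same stream as features."""
    sku = event["sku"]
    if event["status"] in ("discontinued", "paused"):
        # Replace the feature row with a tombstone that expires later.
        store[sku] = {"tombstone": True, "expires_at": now + TOMBSTONE_TTL_S}
    elif event["status"] == "active":
        # Reactivation: drop the tombstone; the next change event re-creates the row.
        if store.get(sku, {}).get("tombstone"):
            del store[sku]


def is_badgeable(store: dict, sku: str) -> bool:
    """The badge engine treats a tombstone exactly like a missing row."""
    row = store.get(sku)
    return bool(row) and not row.get("tombstone", False) and bool(row.get("badge", False))


def compact(store: dict, now: float) -> int:
    """Garbage-collect tombstones past their TTL; returns how many were removed."""
    expired = [
        sku for sku, row in store.items()
        if row.get("tombstone") and row["expires_at"] <= now
    ]
    for sku in expired:
        del store[sku]
    return len(expired)
```

Many key-value stores offer native per-key expiry, in which case `compact` would be handled by the store rather than by a separate job.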

---

### The shape that fits

> **What this design gives up**
>
> CDC on inventory and orders adds connector infrastructure and a replication slot the database team will worry about; the online store is more expensive than a periodic batch refresh because it's sized for millisecond reads at peak; the offline training path runs alongside; lifecycle propagation needs SKU-event ingestion. Implementation cost is the price; the win is no false same-day promises, no OLTP query pressure, and no stale badges on discontinued SKUs.

> **What reviewers check**
>
> A reviewer looks at the canvas for these properties:
> - CDC reads inventory and order changes off the live databases without query load.
> - A streaming path materializes badge features into an online store with a millisecond read budget.
> - Discontinued SKUs remove or expire their feature row so the badge engine stops serving them.
> - Offline features feed training on a slower batch path from the same source.

> **The mistake that ships**
>
What gets shipped lets the badge engine query inventory and orders synchronously on every product view, with a nightly batch refresh as a backup. The OLTP databases buckle under page-view load, and the operations team makes the data team back off. The badge fires on stale data and the team starts paying out vouchers for false promises. Discontinued SKUs keep getting badged because nothing prunes the row. The eventual rebuild adds CDC, the online store, the lifecycle propagation, and the offline path; each was reachable in the original conversation if 'on every product view' had been treated as a per-view latency budget.

---

## Common follow-up questions

- An inventory database update lands but the streaming consumer is paused for maintenance. What does the badge engine see, and how does the design recover? _(Tests whether the candidate sees the CDC log retaining changes during the pause; the streaming consumer replays from the last LSN when it resumes and the online store catches up. During the pause the badge engine reads the last-known features (with a freshness flag if the design surfaces it); recovery doesn't lose updates because the change log is the authoritative source.)_
- A SKU is reactivated after being discontinued. What in this design lets the badge fire again, and what's the latency? _(Tests whether the candidate sees the lifecycle event flowing through the same path, removing the tombstone or re-creating the row, and the badge engine reading the new row within minutes. The reactivation latency matches the streaming path's normal budget; no manual cleanup is needed.)_
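
The recovery described in the first follow-up can be sketched as a replay from a checkpoint. Here the retained change log is modeled as a list ordered by `lsn`, and the checkpoint is the last position the consumer applied before it was paused; the event shape and the `resume` helper are hypothetical.

```python
def resume(change_log: list, store: dict, checkpoint: int) -> int:
    """Replay every retained change after the checkpoint; return the new checkpoint."""
    for event in change_log:  # assumed to be ordered by lsn
        if event["lsn"] <= checkpoint:
            continue  # already applied before the pause
        store[event["sku"]] = event["after"]
        checkpoint = event["lsn"]
    return checkpoint
```

Running `resume` twice with the returned checkpoint applies nothing the second time, which is why the pause loses no updates as long as the log retains them.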

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/badging_items_that_already_sold_out)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io).