# The Register Never Sleeps

> Every swipe lands in the warehouse. The table has to stay current without breaking.

Canonical URL: <https://datadriven.io/problems/real_time_pos_pipeline_with_snowpipe_and_merge>

Domain: Pipeline Design · Difficulty: medium · Seniority: L5

## Problem

We're a retail chain with 600 stores and a Snowflake data warehouse. Our POS terminals generate sales transactions in real time but right now they batch-upload to S3 at end of day, which means the operations team can't see intraday sales. Design a pipeline that gets POS data into Snowflake with low latency, while correctly merging with the historical data already in the warehouse.

## Worked solution and explanation

### Why this problem exists in real interviews

Three properties pulling at one POS pipeline: minutes-fresh visibility for ops, voids and corrections that have to apply without duplicating, and store-outage detection that fires within hours rather than at end-of-day. The trap is treating the void as 'just another row' or treating the outage as something somebody will notice.

The default reach is to swap the end-of-day batch for a continuous ingestion that appends every event. Ops sees intraday sales. A void arrives referencing the original transaction; the appender writes it as a new row and the warehouse now has both 'completed' and 'voided' for the same transaction id, which downstream queries treat as two records. A store stops sending events at noon and nobody knows until end-of-day numbers come in low.

> **Trick to Solving**
>
> Continuous ingestion that merges by transaction id, voids resolved against the original, per-store heartbeat alerts during business hours.
> 
> 1. Continuous ingestion (streaming or micro-batch) lands transactions in the warehouse within minutes; the load uses MERGE keyed on transaction id.
> 2. Voids and corrections reference the original transaction id and the MERGE applies them in place, updating status and net amount; the original isn't duplicated, and consumers see one row per transaction with the current state.
> 3. Per-store heartbeat alerts fire when no transactions arrive from a store for the expected business-hours window; ops sees the silence within hours.

---

### Walk the requirements

#### Step 1: Land transactions in the warehouse within minutes

POS terminals publish to a queue and a continuous loader writes to the warehouse within minutes. The ops dashboard reads the warehouse and sees intraday sales as they happen. Without a continuous path the named problem (end-of-day blindness) is unaddressed; without a warehouse tier the ops dashboard has nowhere to read.

#### Step 2: Voids resolve against the original by id, not as duplicates

When a void or correction arrives referencing an original transaction, the loader applies a MERGE keyed on transaction id: the original row's status flips to 'voided' (with the void timestamp and the void's id), the net amount updates if it's a correction, and no new row is added for that id. Consumers always see one row per transaction with its current state. Appending the void as a new row is the version where the warehouse has duplicates; the MERGE is the contract.

#### Step 3: Per-store heartbeat alerts during business hours

Each store sends transactions during its business hours. A monitoring step tracks per-store transaction rates; when a store goes silent during its operating hours past a tolerance, an alert fires to ops with the store id and the time of last transaction. Ops sees the silence within hours. Discovering the gap at end-of-day is the version where the team finds out from low numbers; per-store heartbeat is what surfaces the silence while there's still time to act.

---

### The shape that fits

> **What this design gives up**
>
> Continuous ingestion costs more compute than the end-of-day batch; the MERGE-by-id needs an index on transaction id; per-store heartbeat monitoring requires per-store baselines and alerting. Implementation cost is the price; the win is ops that sees the day as it happens, voids that resolve in place rather than appearing as duplicates, and store outages that surface in hours.

> **What reviewers check**
>
> A reviewer looks at the canvas for these properties:
> - A continuous ingestion path lands POS transactions in the warehouse within minutes.
> - Voids and corrections apply against the original transaction by id; the warehouse holds one row per transaction with the current status.
> - Per-store heartbeat detection alerts ops when a store stops sending during business hours.

> **The mistake that ships**
>
> What gets shipped swaps end-of-day batch for continuous append. Ops sees intraday sales but voids accumulate as duplicate rows; downstream queries see two rows per voided transaction id and totals diverge from the cashier tape. A store goes silent at noon and nobody finds out until end-of-day numbers come in low. The rebuild adds MERGE-by-id and heartbeat monitoring , both reachable in the original conversation if 'continuous ingestion' had been treated as 'continuous + correct + observed' rather than just 'faster.'

---

## Common follow-up questions

- A void arrives for a transaction that hasn't been ingested yet (out of order). What does this design do, and how does the warehouse converge? _(Tests whether the candidate sees the MERGE handling out-of-order events: the void can either insert a placeholder void row that the original's later arrival reconciles, or buffer until the original arrives. Either way, the warehouse converges to one row per transaction id with the correct final state.)_
- A store's heartbeat fires false positives during slow hours. How does the design avoid alerting fatigue without missing real outages? _(Tests whether the candidate's heartbeat monitor uses per-store baselines (not a global threshold), so a store with naturally slow afternoon hours has a wider expected-silence window than a high-volume store. Alerting fatigue is what makes ops stop reading alerts; per-store calibration is what keeps each one actionable.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/real_time_pos_pipeline_with_snowpipe_and_merge)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.