# The Signals That Power Recommendations

> Fresh signals, many teams, one pipeline.

Canonical URL: <https://datadriven.io/problems/the_signals_that_power_recommendations>

Domain: Pipeline Design · Difficulty: medium · Seniority: L5

## Problem

Our personalization platform powers product recommendations for millions of users. It needs two things: a continuous feed of user interaction signals (clicks, purchases, saves, and ratings) from multiple product surfaces, and a feature computation layer that keeps user profiles current. Different teams own different signal sources and have different latency requirements. Design the ingestion and feature pipeline.

## Worked solution and explanation

### Why this problem exists in real interviews

Six teams emit signals into one personalization pipeline, which has to keep features fresh for the recommendation API and let GDPR deletion reach every derived store. The trap is letting consumers do format-specific parsing per producer and treating deletion as a follow-up project; both shortcuts compound until every consumer is fragile and every audit is a forensic exercise.

The default reach is to subscribe each consumer to each producer's topic with format-specific parsers, and recompute features on a nightly batch. The first time a producer ships a schema change, every consumer's parser needs an update; over six teams that becomes a coordination tax that nobody owns. Recommendations read 'recently viewed' from a nightly feature; the staleness shows up as worse recommendations within hours of a high-traffic event. A deletion request lands and the team finds the user's signals in five derived stores nobody documented.

> **Trick to Solving**
>
> One canonical event shape validated at publish, one event bus all consumers read, deletion propagated through every derived store with confirmation.
> 
> 1. All producers emit a canonical signal shape (event_type, user_id, item_id, surface, timestamp, payload). The bus's contract layer rejects publishes that don't match the canonical schema.
> 2. User interaction events flow through a streaming consumer that updates the feature row within minutes; a nightly batch backfills the heavier features that don't need real-time freshness.
> 3. Deletion is an event on the same bus the signals ride; each consumer applies the deletion to its store and writes a confirmation; an orchestrator collects them.

---

### Walk the requirements

#### Step 1: Recommendations use features fresh within minutes of the event

User interaction events ride the bus into a streaming consumer that updates the user's feature row in the online store within minutes. The recommendation API reads features at request time, so the features it sees never lag the underlying events by more than the streaming budget. The 'recently viewed' feature comes from this path. Without minute-level updates, the API's freshness lags by the offline cadence, and recommendations get noticeably worse during the moments users are most engaged.
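
A minimal sketch of the streaming update, assuming an in-memory dict stands in for the online store and that a click counts as a view. `OnlineStore`, `RECENT_LIMIT`, `FRESHNESS_BUDGET_S`, and `within_budget` are hypothetical names for illustration, not part of any real feature-store API. The consumer folds each canonical event into the user's row and returns the event-to-update lag so it can be compared with the streaming budget.

```python
from collections import deque

RECENT_LIMIT = 5          # hypothetical cap on the 'recently viewed' list
FRESHNESS_BUDGET_S = 300  # hypothetical streaming budget (five minutes)


class OnlineStore:
    """In-memory stand-in for the online feature store."""

    def __init__(self):
        self.recently_viewed = {}  # user_id -> deque of item_ids, newest first

    def apply(self, event, now):
        """Fold one canonical event into the user's row; return the lag in seconds."""
        if event["event_type"] == "click":  # assumption: a click counts as a view
            row = self.recently_viewed.setdefault(
                event["user_id"], deque(maxlen=RECENT_LIMIT)
            )
            item = event["item_id"]
            if item in row:
                row.remove(item)  # a repeat view moves to the front
            row.appendleft(item)
        return now - event["timestamp"]

    def read(self, user_id):
        """Request-time read used by the recommendation API."""
        return list(self.recently_viewed.get(user_id, ()))


def within_budget(lag_seconds):
    """True when an update landed inside the streaming budget."""
    return lag_seconds <= FRESHNESS_BUDGET_S
```

In a real deployment the lag returned by `apply` would likely feed a monitoring metric, so that a consumer falling behind the budget shows up as an alert rather than as stale recommendations.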

#### Step 2: One canonical event shape, validated at publish

Six teams emit signals; consumers can't carry six format-specific parsers. The bus's schema-contract layer rejects publishes that don't conform to the canonical shape (event_type, user_id, item_id, surface, timestamp, payload). Producers ship through the contract; an incompatible change is rejected at publish, not discovered downstream. Consumers read one shape. A 'we'll harmonize in a downstream transform' approach is the version where every consumer carries six parsers and every producer change ripples through them.
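
As a rough sketch of validation at publish, the snippet below checks an event against the six canonical fields before it reaches the log. The field types, the `ContractViolation` exception, and the `Bus` class are assumptions made for illustration; the source names only the fields, and a production bus would typically delegate this to a schema registry.

```python
# Canonical field names come from the design; the types are assumed here.
CANONICAL_FIELDS = {
    "event_type": str,
    "user_id": str,
    "item_id": str,
    "surface": str,
    "timestamp": (int, float),
    "payload": dict,
}


class ContractViolation(ValueError):
    """Raised when a publish does not match the canonical shape."""


def validate(event):
    """Reject events with missing, unknown, or mistyped fields."""
    missing = [name for name in CANONICAL_FIELDS if name not in event]
    if missing:
        raise ContractViolation(f"missing fields: {missing}")
    unknown = sorted(set(event) - set(CANONICAL_FIELDS))
    if unknown:
        raise ContractViolation(f"unknown fields: {unknown}")
    for name, expected in CANONICAL_FIELDS.items():
        if not isinstance(event[name], expected):
            raise ContractViolation(f"wrong type for field: {name}")


class Bus:
    """In-memory stand-in for the event bus and its contract layer."""

    def __init__(self):
        self.log = []

    def publish(self, event):
        validate(event)         # the producer sees the failure, not the consumers
        self.log.append(event)  # only conforming events are ever stored
```

Because the check runs inside `publish`, a producer that ships an incompatible change gets the error immediately, and consumers never have to handle a second shape.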

#### Step 3: Deletion propagates through every derived store

GDPR deletion has to reach the online store, the offline archive, and every derived feature. The deletion request enters as an event on the same bus the signals ride; each consumer applies the deletion (the online store removes the user's row, the archive tombstones the user's events, derived features recompute the affected partitions) and writes a confirmation. An orchestrator collects the confirmations and reports progress against the regulatory window. Without the propagation pattern, deletion is a forensic hunt across stores nobody documented.
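
The fan-out and confirmation flow might look like the sketch below. All class names, store names, and the confirmation format are hypothetical; the stores are in-memory stand-ins, and the derived-feature consumer is deliberately left out so the outstanding case is visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeletionRequest:
    request_id: str
    user_id: str


class OnlineStoreConsumer:
    name = "online_store"

    def __init__(self, rows):
        self.rows = rows  # user_id -> feature row

    def apply_deletion(self, req):
        self.rows.pop(req.user_id, None)  # remove the user's row
        return {"request_id": req.request_id, "store": self.name}


class ArchiveConsumer:
    name = "offline_archive"

    def __init__(self):
        self.tombstones = set()

    def apply_deletion(self, req):
        self.tombstones.add(req.user_id)  # tombstone the user's events
        return {"request_id": req.request_id, "store": self.name}


class Orchestrator:
    """Collects per-store confirmations for each deletion request."""

    def __init__(self, expected_stores):
        self.expected = set(expected_stores)
        self.confirmed = {}  # request_id -> set of store names

    def record(self, confirmation):
        stores = self.confirmed.setdefault(confirmation["request_id"], set())
        stores.add(confirmation["store"])

    def status(self, request_id):
        done = self.confirmed.get(request_id, set())
        outstanding = sorted(self.expected - done)
        return {
            "confirmed": sorted(done),
            "outstanding": outstanding,
            "complete": not outstanding,
        }
```

The `status` report lists confirmed and outstanding stores separately, so an audit response can show exactly which stores have acted on the request instead of a single pass/fail flag.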

---

### The shape that fits

> **What this design gives up**
>
> A schema-contract layer forces producer teams to coordinate on the canonical shape; the streaming path costs more than nightly batch for every feature it carries; the deletion propagation pattern is a control plane that has to be operated. Implementation cost is the price; the win is consumers that read one schema, recommendations that don't go stale, and a deletion answer that comes from confirmations rather than promises.

> **What reviewers check**
>
> A reviewer looks at the canvas for these properties:
> - An event bus carries canonical-shape signals validated at publish; consumers read one shape rather than six.
> - A streaming path updates user features within minutes of events; the offline path computes heavier features on a slower cadence.
> - Deletion propagates through every consumer's store with per-store confirmations.

> **The mistake that ships**
>
> What gets shipped lets each producer publish in its own format with consumers carrying format-specific parsers, runs nightly batch features, and treats deletion as 'we'll add it later.' Every producer schema change ripples through six consumers; one team breaks every other team's pipeline once a quarter. 'Recently viewed' lags by the nightly cadence and recommendations get noticeably worse during peak. A GDPR request lands and the team realizes derived features in three teams' offline notebooks still include the user. The eventual rebuild adds the canonical contract, the streaming feature path, and the deletion fan-out; each was reachable up front if 'six teams emit different formats' had been treated as a contract problem instead of a parsing problem.

---

## Common follow-up questions

- A producer wants to add a new event type that no consumer reads yet. What does the schema-contract layer do, and what does it not enforce? _(Tests whether the candidate sees that the contract validates the canonical shape (the new event type is a value of event_type, not a new shape) and the bus carries it; consumers that don't subscribe to the type are unaffected. Adding fields to the canonical shape is an evolution the contract has rules for (additive, optional); a new event type doesn't break existing consumers.)_
- A deletion confirmation is overdue from a derived-feature consumer. What does the orchestrator surface, and what does the user-facing audit response look like? _(Tests whether the candidate sees the orchestrator hold the request open with retries and an alert; the audit response references the request's open status with the per-store confirmations as they land, rather than a single global pass/fail. The regulator gets a record of what's confirmed and what's outstanding.)_
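
For the first follow-up, the "additive, optional" evolution rule can be sketched as a check over two versions of the canonical shape. The function name `evolution_problems` and the schema representation (field name mapped to a type label and a `required` flag) are assumptions for illustration. A new event type never reaches this check, because it is only a new value of `event_type`.

```python
def evolution_problems(old_fields, new_fields):
    """Return the reasons a schema change is not additive-and-optional.

    Each argument maps a field name to {"type": str, "required": bool}.
    An empty list means the change would be accepted.
    """
    problems = []
    for name, spec in old_fields.items():
        if name not in new_fields:
            problems.append(f"removed field: {name}")
        elif new_fields[name] != spec:
            problems.append(f"changed field: {name}")
    for name, spec in new_fields.items():
        if name not in old_fields and spec["required"]:
            problems.append(f"new field must be optional: {name}")
    return problems


# Hypothetical current version of the canonical shape.
CANONICAL_V1 = {
    "event_type": {"type": "string", "required": True},
    "user_id": {"type": "string", "required": True},
    "item_id": {"type": "string", "required": True},
    "surface": {"type": "string", "required": True},
    "timestamp": {"type": "number", "required": True},
    "payload": {"type": "object", "required": True},
}
```

Under this rule, adding an optional field (a hypothetical `session_id`, say) passes, while removing `surface` or adding a required field is rejected before any consumer sees it.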

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_signals_that_power_recommendations)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.