# Credit for Every Touch

> They saw the ad, clicked the email, then bought. Who gets credit?

Canonical URL: <https://datadriven.io/problems/credit_for_every_touch>

Domain: Pipeline Design · Difficulty: medium · Seniority: L6

## Problem

Our marketing analytics platform stitches together ad spend from Google and Facebook, CRM opportunities from Salesforce, and web session data from multiple clients. The attribution model needs to give credit to every touchpoint in the customer journey before a conversion, but the data quality is inconsistent across sources and some of them don't expose an updated_at field, which makes incremental processing painful. Design a transformation pipeline that handles these sources reliably and keeps attribution fresh enough for daily campaign decisions.

## Worked solution and explanation

### Why this problem exists in real interviews

This is multi-tenant attribution with four requirements pulling in different directions: a morning deadline, faster-than-batch propagation for closed-won deals, attribution-model swaps that have to be config-only, and per-client isolation. The trap is one shared transformation that pulls every client and every source through the same code path: one client's malformed feed delays everyone, and a model swap means a code release.

The default reach is one nightly job that pulls every source for every client, runs the result through a shared attribution function, and writes per-client tables. The first time a Salesforce export is malformed for one client, the whole job fails and nobody's attribution lands by morning. Closed-won deals propagate at next-batch speed, hours after the marketing team needs them. Switching a client to a different attribution model requires editing the transformation code and redeploying.

> **Trick to Solving**
>
> Per-client transformation with the attribution model as configuration, a fast path for closed-won, and an orchestrator that owns the morning deadline.
> 
> 1. Per-client transformation tasks under one orchestrator; one client's malformed source halts only that client's pipeline and is alerted by name.
> 2. The attribution model is a per-client configuration the transformation reads at run time; switching a client's model is a config change.
> 3. Closed-won opportunities ride a faster path (CDC from the CRM into a streaming consumer that updates per-client attribution) so revenue credit lands within hours instead of waiting for the next batch.
> 4. An orchestrator schedules per-client runs against the morning deadline with per-client alerting.

---

### Walk the requirements

#### Step 1: Per-client morning delivery, with per-client alerting before the deadline

The orchestrator runs each client's pipeline as its own DAG (or as a per-client task within a shared DAG), scheduled to finish before the morning deadline. Sensors fire for a specific client when that client's run is at risk, so on-call sees a late client by name, not a generic "pipeline late." Without orchestration there's nothing watching the deadline; without a warehouse tier the per-client attribution has nowhere to land.
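The per-client deadline check can be sketched in a few lines. This is a minimal illustration, not a real orchestrator's sensor API: `ClientRun`, `expected_duration`, `safety_margin`, and the client names are all hypothetical, and the projected finish time assumes a duration estimate (for example, a trailing average of past runs) is available.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ClientRun:
    client_id: str
    started_at: datetime
    expected_duration: timedelta  # assumption: estimated from past runs
    finished: bool = False


def at_risk_clients(runs, deadline, now, safety_margin=timedelta(minutes=15)):
    """Return ids of unfinished runs projected to end after deadline - margin."""
    late = []
    for run in runs:
        if run.finished:
            continue
        # A run that has already overrun its estimate cannot finish before "now".
        projected_finish = max(run.started_at + run.expected_duration, now)
        if projected_finish > deadline - safety_margin:
            late.append(run.client_id)
    return late


def alert_message(client_id, deadline):
    # The alert names the client rather than reporting a generic late pipeline.
    return f"client {client_id} at risk of missing {deadline:%H:%M} deadline"
```

In a real deployment this logic would live in whatever sensor or SLA mechanism the chosen orchestrator provides; the point of the sketch is only that the check and the alert are keyed by client.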

#### Step 2: Closed-won credit on a faster path

When a deal is marked closed-won in the CRM, the revenue credit has to land in attribution within hours. Closed-won events flow through a CDC path from the CRM into a streaming consumer that updates the affected client's attribution; the nightly batch still rebuilds the rest. Treating closed-won as just another row in the next batch is the version where the marketing team is making bid decisions on yesterday's revenue. The path is narrow (closed-won only) so the streaming compute stays sized for what's actually time-sensitive.
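A minimal sketch of the narrow fast path, assuming each CDC change arrives as a dict with `before` and `after` row images. The field names (`stage`, `client_id`, `opportunity_id`, `amount`) and the `"Closed Won"` stage value are assumptions for illustration; a real feed's schema will differ. The consumer ignores everything except transitions into closed-won and keys its state by opportunity so a replayed event does not double-count revenue.

```python
from collections import defaultdict


def is_closed_won(change):
    """True only when the opportunity transitions into the closed-won stage."""
    before = change.get("before") or {}  # "before" is empty/None for inserts
    after = change.get("after") or {}
    return after.get("stage") == "Closed Won" and before.get("stage") != "Closed Won"


class ClosedWonConsumer:
    """Keeps fast-path revenue per client until the nightly batch rebuilds it."""

    def __init__(self):
        # client_id -> {opportunity_id: amount}
        self.revenue_by_client = defaultdict(dict)

    def handle(self, change):
        if not is_closed_won(change):
            return False  # everything else waits for the nightly batch
        after = change["after"]
        # Keyed by opportunity id: a replayed event overwrites, never adds twice.
        self.revenue_by_client[after["client_id"]][after["opportunity_id"]] = after["amount"]
        return True

    def pending_revenue(self, client_id):
        return sum(self.revenue_by_client.get(client_id, {}).values())
```

Filtering at the consumer is what keeps the streaming compute small: most CRM changes are discarded after one dictionary lookup.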

#### Step 3: Attribution model as per-client configuration

Clients switch between attribution models (last-touch, first-touch, position-based, time-decay). The transformation reads the active model from a per-client config at run time and applies it to that client's touchpoints. A switch is a config update: no code release, no migration. Hard-coding the model into the transformation is the version where every change is a deployment, and clients mid-rollout run on different code versions while the team coordinates the release.
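One way to picture config-driven model selection is a registry of weighting functions looked up by name. The sketch below is illustrative only: the config shape (`model`, `params`), the 40/20/40 position-based split, and the 7-day half-life are assumed defaults, not values from the problem. Touches are `(channel, days_before_conversion)` tuples ordered oldest first.

```python
def last_touch(touches):
    return [0.0] * (len(touches) - 1) + [1.0]


def first_touch(touches):
    return [1.0] + [0.0] * (len(touches) - 1)


def position_based(touches, first=0.4, last=0.4):
    n = len(touches)
    if n == 1:
        return [1.0]
    if n == 2:
        # No middle touches: split all credit in the first:last ratio.
        return [first / (first + last), last / (first + last)]
    middle = (1.0 - first - last) / (n - 2)
    return [first] + [middle] * (n - 2) + [last]


def time_decay(touches, half_life_days=7.0):
    # A touch loses half its weight for every half-life before the conversion.
    raw = [0.5 ** (days / half_life_days) for _, days in touches]
    total = sum(raw)
    return [w / total for w in raw]


MODELS = {
    "last_touch": last_touch,
    "first_touch": first_touch,
    "position_based": position_based,
    "time_decay": time_decay,
}


def attribute(client_config, touches, revenue):
    """Split revenue across channels using the model named in the client config."""
    if not touches:
        raise ValueError("no touchpoints to attribute")
    model = MODELS[client_config["model"]]
    weights = model(touches, **client_config.get("params", {}))
    credit = {}
    for (channel, _), weight in zip(touches, weights):
        credit[channel] = credit.get(channel, 0.0) + weight * revenue
    return credit
```

Switching a client from `{"model": "last_touch"}` to `{"model": "time_decay", "params": {"half_life_days": 14.0}}` changes the output without touching the transformation code; only a model that is not yet in the registry needs a release.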

#### Step 4: Per-client isolation so one malformed source doesn't break another

Every client runs as its own task graph with its own state and its own failure boundary. A malformed Salesforce export for client A halts client A's run and alerts on it; client B's run keeps going. Sharing a transformation step across clients means one client's bad row breaks everyone's morning. The orchestrator's per-client task is the unit of isolation; the warehouse output table partitions by client so a downstream re-run for one client doesn't disturb the others.
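The failure boundary can be illustrated with a plain loop. In a real orchestrator each client would be a separate task or DAG rather than an iteration in one process; the sketch below only shows the contract. `run_client`, `alert`, and the dict standing in for a client-partitioned warehouse table are hypothetical.

```python
def run_all_clients(client_ids, run_client, warehouse, alert):
    """Run each client independently; a failure stays inside that client.

    `warehouse` stands in for an output table partitioned by client:
    a successful run overwrites only that client's partition.
    """
    status = {}
    for client_id in client_ids:
        try:
            rows = run_client(client_id)
        except Exception as exc:
            # Halt this client only, alert by name, and leave its last
            # good partition in place for downstream readers.
            status[client_id] = "failed"
            alert(f"client {client_id} failed: {exc}")
            continue
        warehouse[client_id] = rows
        status[client_id] = "ok"
    return status
```

Because the write is scoped to one partition, re-running a single client after its source is fixed does not touch any other client's output.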

---

### The shape that fits

> **What this design gives up**
>
> Per-client tasks make the DAG wider than one shared job; CDC for closed-won is a separate ingestion path the team has to operate; configurable attribution models add a config layer the transformation reads on every run. Implementation cost is the price; the win is per-client mornings that don't fail because of another client's bad data, closed-won credit that's hours-fresh, and model switches that don't need a deploy.

> **What reviewers check**
>
> A reviewer looks at the canvas for these properties:
> - An orchestration layer schedules per-client transformation tasks with sensors and alerts before each client's morning deadline.
> - The attribution model is selected per client from configuration; no code change is required to switch.
> - Closed-won opportunities flow through a faster path that updates attribution within hours, separate from the nightly batch.
> - Per-client isolation: a malformed source for one client does not block or affect any other client's outputs.

> **The mistake that ships**
>
> What gets shipped runs one nightly job over every client and every source, with the attribution model in code. The first time a client's Salesforce export is malformed, the entire job fails and every client's morning report is missing. Closed-won deals propagate at next-batch speed; the marketing team makes bid decisions hours behind the revenue. A client switching attribution models requires a code release that goes through a quarter's worth of QA. The eventual rebuild is per-client tasks, a CDC fast path for closed-won, and config-driven model selection.

---

## Common follow-up questions

- A new client joins with three months of historical data and a custom attribution model nobody else uses. What in this design lets them onboard, and what doesn't? _(Tests whether the candidate sees the per-client config plus per-client isolation as the onboarding boundary: a new config entry, a backfill task on the per-client DAG, and the new model added to the configurable model library if needed. The shared transformation, the warehouse layout, and the other clients' runs don't change.)_
- A source's API rate-limits the per-client loader during a peak. What does this design do, and what does the morning report look like? _(Tests whether the candidate has thought about per-source backoff and partial-completion: the loader retries with backoff, the orchestrator alerts if the rate-limit window crosses the deadline, the morning report renders with the affected source flagged and the rest of the client's data complete. Per-client isolation keeps the rest of the platform unaffected.)_
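For the rate-limit follow-up, the retry-then-flag behavior might look like the sketch below. `RateLimited`, `fetch`, and the `status` values are hypothetical names; a real loader would also check the remaining time before the deadline instead of relying on a fixed attempt count.

```python
import time


class RateLimited(Exception):
    """Raised by the (hypothetical) source client when the API throttles us."""


def load_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry a rate-limited fetch with exponential backoff.

    Returns a result the report layer can render either way: complete rows,
    or an empty, flagged result if the source never recovered.
    """
    for attempt in range(max_attempts):
        try:
            return {"status": "complete", "rows": fetch()}
        except RateLimited:
            if attempt == max_attempts - 1:
                break  # out of attempts: do not sleep again, just flag
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return {"status": "flagged", "rows": []}
```

Returning a flagged result instead of raising is what lets the morning report render with one source marked incomplete while the rest of that client's data is shown as usual.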

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/credit_for_every_touch)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.