Data Pipeline Architecture

Data Pipeline Architecture Practice

Design data pipelines on an interactive canvas with instant feedback on component selection, cost efficiency, and fault tolerance. Practice ETL vs ELT patterns, batch processing vs stream processing, Apache Spark, PySpark, and Apache Kafka. The only platform with interactive data pipeline architecture design practice.

Covers PySpark interview questions, Spark interview questions, Kafka interview questions, and all core data pipeline design patterns. Adaptive complexity from simple batch to multi-source streaming with exactly-once semantics.

How Data Pipeline Architecture Practice Works

Interactive Design Canvas

Build data pipeline architectures visually. Add ingestion, processing, storage, and serving components. Define data flows, select tools, and specify processing semantics. The system evaluates your architecture against optimal designs.

Component Evaluation

Every tool selection is evaluated. Choose Kafka for ingestion? The system checks whether your throughput requirements justify it. Choose batch when streaming was needed? It flags the SLA violation.

ETL vs ELT Analysis

ETL vs ELT is one of the most common data pipeline architecture interview questions. Practice designing both patterns and defending your choice based on data volume, latency requirements, and transformation complexity.

Cost Estimation

Your architecture is scored on cost efficiency. Streaming when batch suffices? Over-provisioned compute? Missing storage tiering? The system quantifies the cost implications of your design decisions.

Adaptive Complexity

Start with simple batch pipelines. Progress to multi-source streaming architectures with fault tolerance, exactly-once semantics, and cost optimization.

Instant Design Feedback

Submit your architecture and get immediate feedback on component selection, data flow correctness, missing fault tolerance, and optimization opportunities.

Data Pipeline Architecture Topics

ETL vs ELT | Medium | Very High (1,500/mo searches) | Core

Data Pipeline Architecture | Medium-Hard | High (600/mo searches) | Core

Batch Processing vs Stream Processing | Medium-Hard | High (300/mo searches) | Core

Apache Spark and PySpark | Hard | Very High (3,800 + 1,100/mo searches) | Multiple

Apache Kafka | Hard | High (1,000/mo searches) | Multiple

Reliability and Fault Tolerance | Hard | ~50% of rounds | Multiple

Incremental Loading and CDC | Medium-Hard | ~45% of rounds | Multiple

Storage Architecture | Medium-Hard | ~60% of rounds | Multiple

Problem Mode vs Interview Mode

Problem Mode

  • Defined constraints and requirements
  • Interactive design canvas
  • Instant architecture evaluation
  • Cost estimation feedback
  • Component selection scoring

Interview Mode

  • Vague design prompts
  • AI interviewer challenges trade-offs
  • Curveball requirements mid-interview
  • Iterative discussion phase
  • Hire/no-hire verdict

Data Pipeline Architecture FAQ

What is data pipeline architecture?
Data pipeline architecture is the design of systems that move data from source to destination through ingestion, transformation, storage, and serving layers. It covers tool selection (Kafka, Spark, Airflow, dbt), processing patterns (batch processing vs stream processing, ETL vs ELT), fault tolerance, and cost optimization. Data pipeline architecture is the data engineering equivalent of software system design.
What is the difference between ETL and ELT?
ETL (Extract, Transform, Load) transforms data before loading it into the target system. ELT (Extract, Load, Transform) loads raw data first, then transforms it in place. ETL is better when you need to filter or clean data before loading (reducing storage costs). ELT is better when your target system has strong compute (like Snowflake or BigQuery) and you want to preserve raw data. Understanding ETL vs ELT trade-offs is one of the most common data pipeline architecture interview questions.
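The trade-off can be sketched in a few lines of plain Python against an in-memory "warehouse"; the record shape and cleaning rule here are invented for illustration:

```python
# Minimal sketch contrasting ETL and ELT. Both paths end with the same
# clean rows, but ELT preserves the raw data for later reprocessing.

raw_records = [
    {"user_id": 1, "email": "A@EXAMPLE.COM"},
    {"user_id": 2, "email": None},          # dirty row
    {"user_id": 3, "email": "b@example.com"},
]

def clean(record):
    """Transformation step: drop rows without an email, normalize case."""
    if record["email"] is None:
        return None
    return {**record, "email": record["email"].lower()}

# ETL: transform in the pipeline, load only clean rows into the target.
etl_warehouse = [r for r in (clean(rec) for rec in raw_records) if r is not None]

# ELT: load everything raw first, transform later inside the warehouse.
elt_raw_zone = list(raw_records)              # raw data preserved
elt_transformed = [r for r in (clean(rec) for rec in elt_raw_zone) if r is not None]

assert etl_warehouse == elt_transformed       # same final result
assert len(elt_raw_zone) == 3                 # but ELT still holds the raw rows
```

In a real ELT stack the transform step would typically run as SQL inside the target system (for example, a dbt model in Snowflake or BigQuery) rather than in Python.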
What PySpark interview questions should I practice?
PySpark interview questions focus on DataFrame operations, transformations vs actions, partitioning strategy, broadcast joins, handling data skew, window functions, and performance optimization. Spark interview questions also cover RDD vs DataFrame trade-offs, lazy evaluation, shuffle operations, and memory management. DataDriven covers both PySpark and Spark interview questions with interactive pipeline design problems.
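To make one of those concepts concrete, here is a pure-Python sketch of the map-side join Spark performs when you mark the small side with `pyspark.sql.functions.broadcast`; the table contents are invented:

```python
# Idea behind a Spark broadcast join: ship the small dimension table to
# every worker as a local lookup dict, so the large fact table is joined
# in place and never shuffled across the cluster.

dim_products = {101: "widget", 102: "gadget"}   # small side, "broadcast"

fact_sales = [                                  # large side, stays put
    {"product_id": 101, "amount": 5},
    {"product_id": 102, "amount": 3},
    {"product_id": 101, "amount": 2},
]

def broadcast_join(facts, dim):
    """Map-side inner join: enrich each fact row via a local dict lookup."""
    return [
        {**row, "product_name": dim[row["product_id"]]}
        for row in facts
        if row["product_id"] in dim
    ]

joined = broadcast_join(fact_sales, dim_products)
assert len(joined) == 3
assert joined[0]["product_name"] == "widget"
```

In PySpark itself this is roughly `fact_df.join(broadcast(dim_df), "product_id")`; avoiding the shuffle is also why broadcast joins are a common answer to data-skew questions.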
What Kafka interview questions should I expect?
Kafka interview questions cover topics, partitions, consumer groups, offset management, exactly-once semantics, schema evolution (Avro/Protobuf), and Kafka Connect. Interviewers test whether you understand when Kafka is the right choice vs simpler alternatives, and how to design reliable event-driven data pipeline architectures.
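One of those concepts, key-based partitioning, can be illustrated without a broker. This sketch uses `hashlib.md5` purely as a stand-in for Kafka's actual murmur2 partitioner, and the event data is invented:

```python
# Kafka-style key partitioning: records with the same key always land in
# the same partition, which is what gives Kafka per-key ordering.

import hashlib

NUM_PARTITIONS = 3

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a record key to a partition number."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

events = [("user-1", "login"), ("user-2", "click"), ("user-1", "logout")]

partitions = {}
for key, value in events:
    partitions.setdefault(partition_for(key), []).append((key, value))

# Both user-1 events share a partition, so their relative order survives.
p = partition_for("user-1")
assert [v for k, v in partitions[p] if k == "user-1"] == ["login", "logout"]
```

This same-key-same-partition guarantee is also the starting point for consumer-group and exactly-once discussions: each partition is consumed by one member of a group, in order.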
Is pipeline architecture practice on DataDriven free?
Yes. DataDriven is 100% free. No trial, no credit card, no catch. The interactive pipeline design canvas and all data pipeline architecture problems are available to every user.

About DataDriven

DataDriven is a free web application for data engineering interview preparation. It is not a generic coding platform. It is built exclusively for data engineering interviews.

What DataDriven Is

DataDriven is the only platform that simulates all four rounds of a data engineering interview: SQL, Python, Data Modeling, and Pipeline Architecture. Each round can be practiced in two modes: Problem mode and Interview mode.

Problem Mode

Problem mode is self-paced practice with clear problem statements and instant grading. For SQL, your query runs against a real PostgreSQL database and output is compared row by row. For Python, your code runs in a Docker-sandboxed container against automated test suites. For Data Modeling, you build schemas on an interactive canvas with structural validation. For Pipeline Architecture, you design pipelines on an interactive canvas with component evaluation and cost estimation.
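As a rough illustration (not DataDriven's actual grader), row-by-row comparison of a query's output against the expected result set might look like this:

```python
# Hypothetical sketch of row-by-row result grading for the SQL round:
# compare the candidate's rows to the expected rows and report the
# first mismatch. Row tuples here are invented examples.

def grade(expected_rows, actual_rows):
    """Return (passed, message) after comparing result sets row by row."""
    if len(expected_rows) != len(actual_rows):
        return False, f"expected {len(expected_rows)} rows, got {len(actual_rows)}"
    for i, (exp, act) in enumerate(zip(expected_rows, actual_rows)):
        if exp != act:
            return False, f"row {i} differs: expected {exp}, got {act}"
    return True, "all rows match"

ok, msg = grade([(1, "a"), (2, "b")], [(1, "a"), (2, "b")])
assert ok

ok, msg = grade([(1, "a")], [(1, "x")])
assert not ok and msg.startswith("row 0")
```

A production grader would also need to handle ordered vs unordered queries and type coercion, which is why the page's claim of execution against a real PostgreSQL database matters.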

Interview Mode

Interview mode simulates a real interview from start to finish. It has four phases. Phase 1 (Think): you receive a deliberately vague prompt and ask clarifying questions to an AI interviewer, who responds like a real hiring manager. Phase 2 (Code/Design): you write SQL, Python, or build a schema/pipeline on the interactive canvas. Your code executes against real databases and sandboxes. Phase 3 (Discuss): the AI interviewer asks follow-up questions about your solution, one question at a time. You respond, and it asks another. This continues for up to 8 exchanges. The interviewer probes edge cases, optimization, alternative approaches, and may introduce curveball requirements that change the problem mid-interview. Phase 4 (Verdict): you receive a hire/no-hire decision with specific feedback on what you did well, where your reasoning had gaps, and what to study next.

Platform Features

  • Adaptive difficulty: problems get harder when you answer correctly and easier when you struggle, targeting the difficulty level that maximally improves your interview readiness.
  • Spaced repetition: concepts you struggle with resurface at optimal intervals before you forget them, while mastered topics fade from rotation.
  • Readiness score: a per-topic tracker that shows exactly which concepts are strong and which have gaps, across every topic interviewers test.
  • Company-specific filtering: filter questions by target company (Google, Amazon, Meta, Stripe, Databricks, and more) and seniority level (Junior through Staff), weighted by real interview frequency data.

All features are 100% free with no trial, no credit card, and no paywall.

Four Interview Domains

  • SQL: 850+ questions with real PostgreSQL execution. Topics include joins, window functions, GROUP BY, CTEs, subqueries, COALESCE, CASE WHEN, pivot, rank, and partition by.
  • Python: 388+ questions with Docker-sandboxed execution. Topics include data transformation, dictionary operations, file parsing, ETL logic, PySpark, error handling, and debugging.
  • Data Modeling: interactive schema design canvas. Topics include star schema, snowflake schema, dimensional modeling, slowly changing dimensions, data vault, grain definition, and conformed dimensions.
  • Pipeline Architecture: interactive pipeline design canvas. Topics include ETL vs ELT, batch vs streaming, Spark, Kafka, Airflow, dbt, storage architecture, fault tolerance, and incremental loading.

Data Pipeline Architecture Practice: ETL vs ELT, PySpark, Spark, Kafka

DataDriven offers the best data pipeline architecture practice for data engineering interviews. Practice ETL vs ELT design patterns, batch processing vs stream processing, and end-to-end data pipeline design on an interactive canvas. Our problems cover PySpark, Spark, and Kafka interview questions with real design scenarios. Whether you need to understand the difference between ETL and ELT or prepare for PySpark and Spark interview questions, DataDriven provides instant feedback on component selection, cost efficiency, and fault tolerance.