Loading section...
The Ingestion Layer
Concepts: paFileIngestion, paApiIngestion, paCdc
When the interviewer asks how data gets into your pipeline, they are probing for a real choice, not a list. The three source patterns (file drops, API pulls, and Change Data Capture) have different reliability, latency, and cost profiles. Name which one you chose and why before they have to ask. File-Based Ingestion The simplest and most common pattern: a source system drops files (CSV, JSON, Parquet) into cloud storage (S3, GCS, ADLS). Your pipeline picks them up on a schedule. This is the default for vendor data feeds, data exports from SaaS tools, and any system where you don't control the source. It's batch by nature - latency is measured in hours, not seconds. API-Based Ingestion When the source is a SaaS API (Salesforce, Stripe, HubSpot), you pull data via REST or GraphQL endpoints