The Four Shapes of Ingestion

Concepts covered: paBatchVsStreaming

Every byte that enters a pipeline arrives through one of four shapes. The shape is determined by who initiates the transfer and what kind of artifact is being transferred. Naming the four shapes is the first move because every later concern, from scheduling to error handling to schema validation, is shaped by the choice. A pipeline that ingests from a Postgres database is structurally different from a pipeline that consumes from a Kafka topic, and pretending the difference does not matter is the most common reason ingestion code rots.

The Four Shapes Side by Side

Two dimensions sort the shapes. The first dimension is who decides when the data moves: the pipeline pulling on a schedule, or the source pushing as data is produced. The second dimension is whether the data is delivered as discre
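The first dimension, who decides when the data moves, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pattern: `PullIngestor`, `PushIngestor`, `fetch_since`, and the in-memory `SOURCE_ROWS` list are hypothetical names standing in for a real database query or message broker. The sketch covers only the initiation dimension.

```python
from typing import Callable, Dict, List

Record = Dict[str, int]

# Hypothetical in-memory source standing in for a database table or topic.
SOURCE_ROWS: List[Record] = [{"id": 1}, {"id": 2}, {"id": 3}]


def fetch_since(cursor: int) -> List[Record]:
    """Return every source row after the given offset (assumed source API)."""
    return SOURCE_ROWS[cursor:]


class PullIngestor:
    """Pipeline-initiated: the pipeline decides when to ask for data."""

    def __init__(self, fetch: Callable[[int], List[Record]]) -> None:
        self.fetch = fetch
        self.cursor = 0  # number of rows already ingested
        self.sink: List[Record] = []

    def run_once(self) -> int:
        """One scheduled run: fetch rows past the cursor, then advance it."""
        rows = self.fetch(self.cursor)
        self.sink.extend(rows)
        self.cursor += len(rows)
        return len(rows)


class PushIngestor:
    """Source-initiated: the source calls on_record as data is produced."""

    def __init__(self) -> None:
        self.sink: List[Record] = []

    def on_record(self, record: Record) -> None:
        self.sink.append(record)


# Pull: the pipeline owns the clock and must track its own position.
pull = PullIngestor(fetch_since)
pulled_first = pull.run_once()
pulled_second = pull.run_once()  # nothing new, so no rows are fetched

# Push: the source owns the clock; the pipeline only reacts.
push = PushIngestor()
for row in SOURCE_ROWS:
    push.on_record(row)
```

Both ingestors end up with the same rows, but the pull side carries a cursor and a schedule while the push side carries neither. That difference in who holds state is one reason scheduling and error handling diverge between the two.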

About This Interactive Section

This section is part of the Ingestion Patterns: Beginner lesson on DataDriven, a free data engineering interview prep platform.
