Watermarks: The Engine's Promise
Concepts covered: Watermarks
If a streaming engine waits forever for late events, no window ever closes and no result is ever produced. If it does not wait at all, every late event is dropped and every aggregation is wrong. The watermark is the compromise.
A watermark is a timestamp that the engine emits, periodically, declaring that no events with event_time earlier than the watermark will be processed against an open window. The watermark is the engine's commitment to a closing rule.
What a Watermark Actually Is
A watermark is not a wall clock. It advances based on observed events. If the input stream stalls, the watermark stalls. If the input stream is heavily out of order, the watermark lags behind the wall clock by however far behind the slowest event runs. The engine emits the watermark with each batch of record
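The closing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine implements watermarks: the class name WatermarkedCounter and the constants WINDOW_SIZE and ALLOWED_DELAY are hypothetical, and the watermark here is assumed to be the highest event_time seen so far minus a fixed delay, which is one common heuristic among several.

```python
from collections import defaultdict

WINDOW_SIZE = 10    # seconds per tumbling window (illustrative value)
ALLOWED_DELAY = 5   # assumed bound on out-of-orderness (illustrative value)


class WatermarkedCounter:
    """Counts events per tumbling window and closes windows by watermark."""

    def __init__(self, window_size=WINDOW_SIZE, allowed_delay=ALLOWED_DELAY):
        self.window_size = window_size
        self.allowed_delay = allowed_delay
        self.max_event_time = None            # highest event_time observed
        self.open_windows = defaultdict(int)  # window_start -> running count
        self.closed = {}                      # window_start -> final count
        self.dropped = []                     # events that arrived too late

    @property
    def watermark(self):
        # Derived from observed events only, never from the wall clock,
        # so it stalls whenever the input stalls.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_delay

    def process(self, event_time):
        window_start = event_time - event_time % self.window_size
        wm = self.watermark
        # The window this event belongs to has already closed: drop it.
        if wm is not None and window_start + self.window_size <= wm:
            self.dropped.append(event_time)
            return
        self.open_windows[window_start] += 1
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        self._close_ready_windows()

    def _close_ready_windows(self):
        wm = self.watermark
        # sorted() copies the keys, so popping inside the loop is safe.
        for start in sorted(self.open_windows):
            if start + self.window_size <= wm:
                self.closed[start] = self.open_windows.pop(start)


counter = WatermarkedCounter()
for event_time in [1, 4, 12, 3, 16, 2, 27]:
    counter.process(event_time)

print(counter.closed)     # {0: 3, 10: 2}
print(counter.dropped)    # [2]
print(counter.watermark)  # 22
```

The event at time 3 arrives out of order but is still counted, because the watermark (7 at that point) has not yet passed the end of window [0, 10). The event at time 2 arrives after the watermark has reached 11, so its window is already closed and the event is dropped. A larger ALLOWED_DELAY would accept more late events at the cost of later results, which is the trade-off the watermark exists to settle.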
This section is part of the Schema Evolution and Late Data: Intermediate lesson on DataDriven.