"What If the Data Arrives After the Pipeline Ran?"
Concepts covered: dmLateArriving
What They're Really Testing
The interviewer is testing whether you design pipelines for the ideal case or the real case. The ideal case: data arrives in order, dimensions exist before facts, and no corrections are needed. The real case: mobile events arrive days late, upstream dimension changes are delayed, and corrections arrive after reports have been generated.
The Two Categories
The 60-Second Framework
Why Companies Care
Cite these in your answer: 'At Uber, 15% of ride events arrive more than 1 hour late due to driver app connectivity. A pipeline that closes hourly partitions loses 15% of rides. At Spotify, podcast listens from offline devices arrive up to 7 days late, causing royalty calculations to be systematically wrong.' Drop one of these in 10 seconds to show late data is not an
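To make the hourly-partition problem above concrete, here is a minimal Python sketch of how a pipeline might detect late events and decide which partitions to reprocess. It is an illustration under stated assumptions, not a description of any company's pipeline: the event fields (`ride_id`, `event_time`, `arrival_time`), the `grace` parameter, and the rule that a partition closes at the end of its hour plus the grace period are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hour_partition(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def split_late_events(events, grace=timedelta(0)):
    """Group ride ids by event-time hour, separating on-time from late arrivals.

    Assumption: a partition closes at the end of its hour plus `grace`.
    An event is late if it arrives after its partition has closed.
    """
    on_time = defaultdict(list)
    late = defaultdict(list)
    for event in events:
        partition = hour_partition(event["event_time"])
        close_time = partition + timedelta(hours=1) + grace
        bucket = late if event["arrival_time"] > close_time else on_time
        bucket[partition].append(event["ride_id"])
    return dict(on_time), dict(late)


# Hypothetical sample data: r2 happened at 09:50 but arrived at 11:05.
events = [
    {"ride_id": "r1", "event_time": datetime(2024, 1, 1, 9, 15),
     "arrival_time": datetime(2024, 1, 1, 9, 16)},
    {"ride_id": "r2", "event_time": datetime(2024, 1, 1, 9, 50),
     "arrival_time": datetime(2024, 1, 1, 11, 5)},
    {"ride_id": "r3", "event_time": datetime(2024, 1, 1, 10, 5),
     "arrival_time": datetime(2024, 1, 1, 10, 6)},
]

on_time, late = split_late_events(events)

# Partitions that already closed but received late data and need a re-run.
partitions_to_reprocess = sorted(late)
```

With no grace period, the 09:00 partition closes at 10:00, so `r2` is late and that partition is flagged for reprocessing. Passing `grace=timedelta(hours=2)` would keep the partition open until 12:00 and absorb `r2` without a re-run, at the cost of delaying the partition's results. Which side of that trade-off to take depends on how late the data typically is and how fresh the reports must be.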
About This Interactive Section
This section is part of the Late-Arriving Data: Advanced lesson on DataDriven, a free data engineering interview prep platform. Each section includes explanations, worked examples, and hands-on code challenges that execute in real time. SQL queries run against a live PostgreSQL database. Python runs in a sandboxed Docker container. Data modeling problems validate against interactive schema canvases. All content is framed around what data engineering interviewers actually test at companies like Meta, Google, Amazon, Netflix, Stripe, and Databricks.