DataDriven

© 2026 DataDriven

Late-Arriving Data

Data always arrives late; your model either handles it gracefully or corrupts silently

Category: Data Modeling
Difficulty: advanced
Duration: 25 minutes
Challenges: 0 hands-on challenges

Topics covered: "What If the Data Arrives After the Pipeline Ran?", Late-Arriving Facts: Inserting Into Closed Partitions, Late-Arriving Dimensions: The Inferred Member, Correction and Reversal Patterns, Designing for Reprocessability

Lesson Sections

  1. "What If the Data Arrives After the Pipeline Ran?"

    What They're Really Testing

    The interviewer is testing whether you design pipelines for the ideal case or the real case. The ideal case: data arrives in order, dimensions exist before facts, and no corrections are needed. The real case: mobile events arrive days late, upstream dimension changes are delayed, and corrections arrive after reports have been generated.

    The Two Categories

    The 60-Second Framework

    Why Companies Care

    Cite these in your answer: 'At Uber, 15% of ride events arrive more tha

  2. Late-Arriving Facts: Inserting Into Closed Partitions

    After you identify the late-arriving fact scenario, the interviewer will ask: 'Show me how your pipeline handles it.' This is where candidates who have only read about late data stall, and candidates who have operated it in production accelerate. The answer involves partition management, aggregate recomputation, and a cascade strategy.

    Three Strategies: Know All Three, Recommend One

    The Cascade Problem Interviewers Probe

    Narrate the cascade: 'I insert the late fact into the March 15 partition. B
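
A minimal sketch of the first step of that cascade, using Python's built-in sqlite3 module: insert the late fact under its event date, then rebuild only the daily aggregates for the dates that were touched. The table names `fact_rides` and `agg_daily_rides`, their columns, and the year 2024 are assumptions made for illustration; the lesson does not specify a schema, and a real warehouse would use its own partition mechanics rather than a `WHERE` filter.

```python
import sqlite3

def apply_late_facts(conn, late_rows):
    """Insert late facts by event date, then rebuild only the affected daily aggregates."""
    affected = sorted({event_date for event_date, _ in late_rows})
    with conn:  # one transaction: facts and aggregates change together
        conn.executemany(
            "INSERT INTO fact_rides (event_date, amount) VALUES (?, ?)", late_rows
        )
        for d in affected:
            # Recompute the aggregate for this date from the facts, not by incrementing.
            conn.execute("DELETE FROM agg_daily_rides WHERE event_date = ?", (d,))
            conn.execute(
                "INSERT INTO agg_daily_rides (event_date, ride_count, total_amount) "
                "SELECT event_date, COUNT(*), SUM(amount) FROM fact_rides "
                "WHERE event_date = ? GROUP BY event_date",
                (d,),
            )
    return affected

# Hypothetical schema and data: two days already loaded and aggregated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_rides (event_date TEXT, amount REAL);
CREATE TABLE agg_daily_rides (
    event_date TEXT PRIMARY KEY, ride_count INTEGER, total_amount REAL
);
INSERT INTO fact_rides VALUES ('2024-03-15', 10.0), ('2024-03-16', 20.0);
INSERT INTO agg_daily_rides VALUES ('2024-03-15', 1, 10.0), ('2024-03-16', 1, 20.0);
""")

# A ride from March 15 shows up after the March 15 run has finished.
affected = apply_late_facts(conn, [("2024-03-15", 5.0)])
```

Returning the list of affected dates is a deliberate choice in this sketch: downstream weekly or monthly rollups can use it to decide what else needs recomputing.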

  3. Late-Arriving Dimensions: The Inferred Member

    The interviewer says: 'A fact row arrives but the dimension it references does not exist yet. What do you do?' This is the inferred member question, and it separates candidates who have built dimension loading pipelines from candidates who have only designed schemas. The wrong answers are 'skip the fact' and 'queue it for later.'

    The Pattern: Inferred Members

    Your inferred member answer: 'I insert a placeholder row into dim_customer with customer_id = C999, name = Unknown, city = Unknown, is_inf
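
The placeholder approach can be sketched as follows with Python's sqlite3 module. The preview above is cut off at `is_inf`, so the flag name `is_inferred`, the surrogate key column `customer_key`, the `fact_orders` table, and the later backfill step in `load_customer` are all assumptions, not the lesson's confirmed design.

```python
import sqlite3

def get_customer_key(conn, customer_id):
    """Return the surrogate key, creating an inferred placeholder if the member is missing."""
    row = conn.execute(
        "SELECT customer_key FROM dim_customer WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO dim_customer (customer_id, name, city, is_inferred) "
        "VALUES (?, 'Unknown', 'Unknown', 1)",
        (customer_id,),
    )
    return cur.lastrowid  # rowid of the new placeholder, i.e. its customer_key

def load_customer(conn, customer_id, name, city):
    """When the real dimension row arrives, fill in the placeholder and clear the flag.

    Simplification: attributes are overwritten in place (Type 1 style)."""
    key = get_customer_key(conn, customer_id)
    conn.execute(
        "UPDATE dim_customer SET name = ?, city = ?, is_inferred = 0 "
        "WHERE customer_key = ?",
        (name, city, key),
    )
    return key

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, "
    "customer_id TEXT UNIQUE, name TEXT, city TEXT, is_inferred INTEGER)"
)
conn.execute("CREATE TABLE fact_orders (customer_key INTEGER, amount REAL)")

# The fact arrives first: it is loaded against the inferred member, not skipped.
key = get_customer_key(conn, "C999")
conn.execute("INSERT INTO fact_orders VALUES (?, ?)", (key, 42.0))

# The dimension arrives later; the fact row needs no change because the key is stable.
same_key = load_customer(conn, "C999", "Ada", "London")
```

The point of the stable surrogate key is that the fact is loaded once and never touched again; only the dimension row changes when the real attributes arrive.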

  4. Correction and Reversal Patterns

    The interviewer asks: 'The source system sent incorrect data yesterday and just sent a correction. How does your model handle it?' They are testing whether you destructively update fact rows (no-hire signal) or use reversal patterns that preserve the audit trail (strong-hire signal). In financial data modeling interviews, this question is worth 20% of the scorecard.

    Why 'Just UPDATE It' Is a No-Hire Answer

    The Reversal Pattern: The Strong-Hire Answer

    Your reversal answer: 'Two new rows: a revers
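
A small sqlite3 sketch of an append-only correction. The preview is truncated after 'a revers', so the exact shape of the two rows is an assumption here: one row that negates the current net amount and one row that carries the corrected amount. The `fact_payments` table, its columns, and the `entry_type` labels are hypothetical.

```python
import sqlite3

def correct_fact(conn, txn_id, correct_amount_cents, loaded_at):
    """Append a reversal row and a correction row; never UPDATE or DELETE history."""
    (net,) = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM fact_payments WHERE txn_id = ?",
        (txn_id,),
    ).fetchone()
    with conn:  # both rows land together or not at all
        conn.executemany(
            "INSERT INTO fact_payments (txn_id, amount_cents, entry_type, loaded_at) "
            "VALUES (?, ?, ?, ?)",
            [
                (txn_id, -net, "reversal", loaded_at),  # cancels what is on the books
                (txn_id, correct_amount_cents, "correction", loaded_at),
            ],
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_payments "
    "(txn_id TEXT, amount_cents INTEGER, entry_type TEXT, loaded_at TEXT)"
)
# Yesterday's incorrect value: 100.00 instead of 80.00.
conn.execute("INSERT INTO fact_payments VALUES ('T1', 10000, 'original', '2024-03-15')")

correct_fact(conn, "T1", 8000, "2024-03-16")

rows = conn.execute(
    "SELECT amount_cents, entry_type FROM fact_payments "
    "WHERE txn_id = 'T1' ORDER BY rowid"
).fetchall()
(net_cents,) = conn.execute(
    "SELECT SUM(amount_cents) FROM fact_payments WHERE txn_id = 'T1'"
).fetchone()
```

Reversing the current net amount, instead of the original row's amount, is a choice made in this sketch so that a second correction on the same transaction still sums to the right value.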

  5. Designing for Reprocessability

    The closing question in any late-data interview: 'Can you reprocess this pipeline from scratch?' This tests whether your entire design is built for reprocessability or whether it has hidden assumptions that break on rerun. Candidates who say 'yes, because every component is idempotent and parameterized by date' pass. Candidates who hesitate fail.

    The Three Things the Interviewer Wants to Hear

    The Pattern You Should Be Able to Write

    Idempotent DELETE-INSERT per Partition

    The DELETE-INSERT pattern
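
One way to sketch an idempotent, date-parameterized DELETE-INSERT, again with sqlite3. The `staging_events` and `fact_events` tables and their columns are hypothetical, and a `WHERE event_date = ?` filter stands in for a real partition; the lesson's own version of the pattern is not visible in this preview.

```python
import sqlite3

def rebuild_partition(conn, run_date):
    """Replace one date's rows in the target from staging; safe to run any number of times."""
    with conn:  # DELETE and INSERT commit together, so readers never see a half-built day
        conn.execute("DELETE FROM fact_events WHERE event_date = ?", (run_date,))
        conn.execute(
            "INSERT INTO fact_events (event_date, user_id, amount) "
            "SELECT event_date, user_id, amount FROM staging_events "
            "WHERE event_date = ?",
            (run_date,),
        )

def count_for(conn, run_date):
    """Helper for the demo: number of target rows for one date."""
    return conn.execute(
        "SELECT COUNT(*) FROM fact_events WHERE event_date = ?", (run_date,)
    ).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_events (event_date TEXT, user_id TEXT, amount REAL);
CREATE TABLE fact_events (event_date TEXT, user_id TEXT, amount REAL);
INSERT INTO staging_events VALUES
    ('2024-03-15', 'u1', 1.0),
    ('2024-03-15', 'u2', 2.0),
    ('2024-03-16', 'u1', 3.0);
""")

rebuild_partition(conn, "2024-03-15")
first_run = count_for(conn, "2024-03-15")

rebuild_partition(conn, "2024-03-15")  # rerun: same result, no duplicates
second_run = count_for(conn, "2024-03-15")
```

Because the function takes only the run date as input and replaces everything for that date, a full reprocess is a loop over dates, and a late-data rerun is a call for a single date.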
