DataDriven

© 2026 DataDriven

Batch or Stream?

The architecture decision that defines your pipeline

Category: Pipeline Architecture
Difficulty: Intermediate
Duration: 35 minutes
Challenges: 0 hands-on challenges

Topics covered: When Would You Use Streaming?, Design for Both Batch and Stream, What About Late Data?, Micro-batch or True Streaming?, How Do You Handle Failures?

Lesson Sections

  1. When Would You Use Streaming?

    The Decision Framework

    The interviewer wants to hear a number, not a vibe. Your first sentence should be: "What's the latency SLA?" Not "do we want real-time" - that's a wish, not a requirement. If they say "daily is fine," you just eliminated streaming from the conversation. That's a senior signal - you saved six months of unnecessary infrastructure.

    There are exactly four factors that drive the batch-vs-stream decision. Your answer should walk through all four. Miss any one of them and the int

  2. Design for Both Batch and Stream

    Lambda Architecture in Practice

    Here's the reality the interviewer is testing: production data platforms almost always need both batch and streaming. Batch gives you correctness. Streaming gives you speed. The question is how you merge them without creating a maintenance nightmare. Your answer should acknowledge both sides and then explain the merge strategy.

    The Lambda architecture formalized this pattern: a speed layer (streaming) serves approximate results immediately, while a batch layer rep
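
If you want something concrete to point at when you explain the merge strategy, here is a minimal sketch in Python. It assumes a serving layer that trusts batch output up to a "high-water mark" (the last day the batch job has fully recomputed) and falls back to speed-layer numbers after that. The names `merge_views`, `batch_view`, `speed_view`, and `batch_high_water_mark` are hypothetical, chosen for illustration; they don't come from any framework.

```python
from datetime import date


def merge_views(batch_view, speed_view, batch_high_water_mark):
    """Combine per-day metrics from a batch layer and a speed layer.

    Assumption: batch results are authoritative for every day up to and
    including batch_high_water_mark; speed results only fill later days.
    """
    merged = {}
    # Speed layer: approximate values for days batch has not covered yet.
    for day, value in speed_view.items():
        if day > batch_high_water_mark:
            merged[day] = value
    # Batch layer: corrected values for days it has fully recomputed.
    for day, value in batch_view.items():
        if day <= batch_high_water_mark:
            merged[day] = value
    return merged


batch_view = {date(2024, 1, 1): 100, date(2024, 1, 2): 205}
speed_view = {date(2024, 1, 2): 198, date(2024, 1, 3): 57}

# Jan 2 comes from batch (205), not the approximate speed value (198);
# Jan 3 exists only in the speed layer, so it is served as-is.
merged = merge_views(batch_view, speed_view, date(2024, 1, 2))
```

The point to make out loud: the merge rule is one comparison against a single timestamp, which keeps the "two code paths" problem confined to the serving layer.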

  3. What About Late Data?

    Watermarks, Windows, and Late Arrivals

    This is the #1 follow-up question after you propose a streaming architecture. Interviewers commonly ask some version of: "A user clicks at 11:59 PM but the event arrives at 12:03 AM. Which day does it belong to?" If you've already closed the daily window at midnight, you've got a problem. The interviewer wants to hear three words: watermark, allowed lateness, dead-letter.

    Start your answer here: a watermark is the system's estimate of "how far behind is the
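
To make the three words concrete, here is a small sketch of how watermark, allowed lateness, and dead-letter could fit together for daily event-time windows. It is a toy model, not how any particular engine implements it: the watermark is assumed to be "largest event time seen minus a fixed out-of-orderness bound," and the class name `DailyWindower` and its 5-minute and 10-minute defaults are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DailyWindower:
    # Assumed bounds; real values depend on your sources and SLA.
    max_out_of_orderness: timedelta = timedelta(minutes=5)
    allowed_lateness: timedelta = timedelta(minutes=10)
    watermark: datetime = datetime.min
    counts: dict = field(default_factory=dict)
    dead_letter: list = field(default_factory=list)

    def process(self, event_time: datetime) -> str:
        # The watermark only moves forward.
        self.watermark = max(
            self.watermark, event_time - self.max_out_of_orderness
        )
        # Assign the event to a daily window by event time, not arrival time.
        window_start = event_time.replace(
            hour=0, minute=0, second=0, microsecond=0
        )
        window_end = window_start + timedelta(days=1)
        # Past window end + allowed lateness: too late, route to dead-letter.
        if self.watermark >= window_end + self.allowed_lateness:
            self.dead_letter.append(event_time)
            return "dead-letter"
        day = window_start.date()
        self.counts[day] = self.counts.get(day, 0) + 1
        return "on-time" if self.watermark < window_end else "late-accepted"


w = DailyWindower()
results = [
    w.process(datetime(2024, 1, 1, 23, 58)),  # before the window closes
    w.process(datetime(2024, 1, 2, 0, 6)),    # pushes watermark past midnight
    w.process(datetime(2024, 1, 1, 23, 59)),  # late, within allowed lateness
    w.process(datetime(2024, 1, 2, 0, 30)),   # pushes watermark to 00:25
    w.process(datetime(2024, 1, 1, 23, 50)),  # beyond allowed lateness
]
```

In this model the 11:59 PM click still counts toward the previous day as long as it shows up before the watermark passes midnight plus the allowed lateness; after that it goes to the dead-letter list for a later batch correction.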

  4. Micro-batch or True Streaming?

    Spark Structured Streaming vs Flink

    The interviewer is testing whether you pick a framework based on requirements or based on resume keywords. "Streaming" is a spectrum, not a binary. On one end, Spark Structured Streaming processes data in small batches (micro-batches) with a latency floor around 100 milliseconds. On the other end, Apache Flink processes each event individually. Your answer should match the framework to the SLA, not to your personal preference.

    The trigger interval in Spark SS
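
A rough way to reason about "match the framework to the SLA" is back-of-envelope arithmetic. The sketch below assumes a simplified micro-batch model in which batches fire on fixed trigger boundaries, so an event waits until the next boundary before processing starts. It ignores scheduling overhead and backpressure, and the function names are hypothetical.

```python
def micro_batch_wait_ms(arrival_ms: int, trigger_ms: int) -> int:
    """Time an event waits for the next trigger boundary (simplified model)."""
    next_trigger = (arrival_ms // trigger_ms + 1) * trigger_ms
    return next_trigger - arrival_ms


def fits_sla(trigger_ms: int, processing_ms: int, sla_ms: int) -> bool:
    """Worst case: wait a full trigger interval, then process the batch."""
    return trigger_ms + processing_ms <= sla_ms


# An event arriving 130 ms in, with a 100 ms trigger, waits until 200 ms.
wait = micro_batch_wait_ms(130, 100)

# A 1-second trigger with 200 ms of processing fits a 5-second SLA,
# but not a 500 ms one.
ok_for_5s = fits_sla(1000, 200, 5000)
ok_for_500ms = fits_sla(1000, 200, 500)
```

If the worst case under this model already blows the SLA, that is your cue to talk about per-event processing rather than tuning the trigger.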

  5. How Do You Handle Failures?

    Exactly-Once, Offsets, and Checkpoints

    Every streaming system will fail. The interviewer knows this. The question isn't whether failures happen - it's whether your pipeline produces correct results when they do. Your answer framework: start with delivery guarantees, then explain offset management, then describe your idempotency strategy. Hit all three and you've covered the full rubric.

    The interviewer wants to hear that at-least-once is the production default. Not exactly-once. The trap is answ
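
Here is a minimal sketch of why at-least-once delivery plus an idempotent sink gives correct results after a failure. It is a toy model under stated assumptions: the log is a Python list, the sink is a dict keyed by event ID, and the offset is committed only after the write succeeds. The `consume` function and its `crash_at` parameter are hypothetical, added just to simulate a crash between the write and the commit.

```python
def consume(log, sink, committed_offset, crash_at=None):
    """Process log[committed_offset:] and return the new committed offset.

    If crash_at is set, simulate a crash right after writing that record
    but before committing its offset, so the record is redelivered later.
    """
    for offset in range(committed_offset, len(log)):
        event_id, amount = log[offset]
        sink[event_id] = amount  # keyed upsert: replaying it is harmless
        if offset == crash_at:
            return committed_offset  # write happened, commit did not
        committed_offset = offset + 1  # commit only after the write
    return committed_offset


log = [("e1", 10), ("e2", 20), ("e3", 30)]
sink = {}

# First run crashes after writing e2 but before committing offset 2.
offset_after_crash = consume(log, sink, 0, crash_at=1)

# Restart from the last committed offset: e2 is delivered a second time.
final_offset = consume(log, sink, offset_after_crash)

total = sum(sink.values())
```

Because the write is an upsert on `event_id`, the duplicate delivery of `e2` overwrites the same row instead of adding a second one. With an append-only sink, the same replay would double-count it.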
