A streaming job writes a new 4 KB Parquet file every few seconds
A medium Pipeline Design mock interview question on DataDriven. Practice with AI-powered feedback, real code execution, and a hire/no-hire decision.
- Domain: Pipeline Design
- Difficulty: medium
Interview Prompt
A streaming job writes a new 4 KB Parquet file every few seconds. After a month the table has millions of tiny files and every query spends its time opening files, not reading data. Fix the file layout without dropping the streaming ingest.
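A quick back-of-the-envelope calculation shows why the file count explodes while the data volume stays small. This is a sketch under assumptions the prompt does not state: "every few seconds" is taken as one file every 2 seconds, "a month" as 30 days, and 128 MiB is a hypothetical target size for compacted files. The function name `small_file_stats` is illustrative only.

```python
# Back-of-the-envelope numbers for the small-file problem in the prompt.
# Assumptions (not from the prompt): one file every 2 seconds, a 30-day
# month, and a hypothetical 128 MiB target size for compacted files.

FILE_SIZE_BYTES = 4 * 1024             # 4 KB per streaming file
SECONDS_PER_FILE = 2                   # assumed write interval
DAYS = 30                              # assumed length of "a month"
TARGET_FILE_BYTES = 128 * 1024 * 1024  # assumed compacted file size


def small_file_stats(file_size, seconds_per_file, days, target_size):
    """Return (file count, total bytes, file count after compaction)."""
    files = days * 24 * 60 * 60 // seconds_per_file
    total_bytes = files * file_size
    # Ceiling division: how many target-sized files are needed for the data.
    compacted_files = -(-total_bytes // target_size)
    return files, total_bytes, compacted_files


if __name__ == "__main__":
    files, total, compacted = small_file_stats(
        FILE_SIZE_BYTES, SECONDS_PER_FILE, DAYS, TARGET_FILE_BYTES
    )
    print(f"{files:,} files, {total / 1024**3:.1f} GiB total, "
          f"{compacted} files after compaction")
```

Under these assumptions the script prints `1,296,000 files, 4.9 GiB total, 40 files after compaction`. The table holds under 5 GiB of data spread across roughly 1.3 million files, so the bottleneck is the number of files each query must open, not the amount of data it reads.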
How This Interview Works
- Read the vague prompt (just like a real interview)
- Ask clarifying questions to the AI interviewer
- Write your pipeline design solution with real code execution
- Get instant feedback and a hire/no-hire decision