DataDriven

A streaming job writes a new 4 KB Parquet file every few seconds

A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: medium

Problem

A streaming job writes a new 4 KB Parquet file every few seconds. After a month the table has millions of tiny files and every query spends its time opening files, not reading data. Fix the file layout without dropping the streaming ingest.
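One common direction, offered here as a sketch and not as the graded reference solution, is to leave the streaming writer untouched and run a periodic compaction job that rewrites many small files into a few large ones. The Python below only plans that work: it groups small files per partition into batches of roughly a target size. The names `DataFile` and `plan_compaction` and the 128 MiB target are assumptions made for illustration. The actual rewrite and the atomic swap of old files for new ones depend on the table format and are not shown.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed target size for compacted files; tune for your engine and storage.
TARGET_BYTES = 128 * 1024 * 1024


@dataclass(frozen=True)
class DataFile:
    """Hypothetical description of one file in the table."""
    path: str
    partition: str
    size_bytes: int


def plan_compaction(files, target_bytes=TARGET_BYTES, min_inputs=2):
    """Group small files into per-partition batches of about target_bytes.

    Files already at or above the target are skipped. A batch is emitted
    only if it merges at least min_inputs files, so a lone small file is
    not rewritten for no benefit.
    """
    by_partition = defaultdict(list)
    for f in files:
        if f.size_bytes < target_bytes:
            by_partition[f.partition].append(f)

    batches = []

    def flush(batch):
        if len(batch) >= min_inputs:
            batches.append(batch)

    for _, small_files in sorted(by_partition.items()):
        current, current_size = [], 0
        # Sorting by path keeps the plan deterministic between runs.
        for f in sorted(small_files, key=lambda item: item.path):
            if current and current_size + f.size_bytes > target_bytes:
                flush(current)
                current, current_size = [], 0
            current.append(f)
            current_size += f.size_bytes
        flush(current)
    return batches


if __name__ == "__main__":
    # 1,000 files of 4 KB in one partition, with a small 1 MiB target
    # so the example stays readable.
    tiny = [
        DataFile(f"dt=2024-01-01/part-{i:05d}.parquet", "dt=2024-01-01", 4096)
        for i in range(1000)
    ]
    plan = plan_compaction(tiny, target_bytes=1024 * 1024)
    print([len(batch) for batch in plan])
```

Each batch would then be read, written back as a single larger Parquet file, and swapped in atomically so readers never see both the old and the new copies. Table formats such as Delta Lake and Apache Iceberg ship their own compaction operations, which are usually preferable to hand-rolled rewrites. Compacting only closed partitions (for example, hours the stream has finished writing) avoids contention with the ingest.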

