DataDriven

A streaming pipeline writes a Parquet file every 30 seconds per partition into an Iceberg table

A medium-difficulty Pipeline Design interview practice problem on DataDriven. Write and run real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: Medium

Problem

A streaming pipeline writes a Parquet file every 30 seconds per partition into an Iceberg table. After four months the table holds 11 million sub-megabyte files and query latency has drifted from 12 seconds to 8 minutes. Apply the section's small-files framing and add the named scheduled job that runs alongside the streaming writer to keep file size in the section's target range.
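One way to approach the scenario above is to quantify the file arrival rate and then schedule a compaction job next to the streaming writer. The sketch below is illustrative only: it computes how many files a 30-second cadence produces per partition per day, groups small files into target-sized bins the way a bin-packing compaction would, and builds a call to Iceberg's `rewrite_data_files` Spark procedure. The catalog name `my_catalog`, the table name `db.events`, and the helper function names are hypothetical, and the 512 MiB target is Iceberg's default `write.target-file-size-bytes`, used here as a stand-in because the target range the problem refers to is not stated on this page.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: Iceberg's default target file size (512 MiB), used as a placeholder.
TARGET_FILE_SIZE_BYTES = 512 * 1024 * 1024


def files_per_partition_per_day(write_interval_s: int) -> int:
    """Number of files one partition receives per day at a fixed write cadence."""
    return SECONDS_PER_DAY // write_interval_s


def plan_compaction(file_sizes_bytes, target_bytes):
    """Greedily group files in order so each group stays at or under target_bytes.

    Returns a list of groups; each group would be rewritten as one larger file.
    """
    groups, current, current_size = [], [], 0
    for size in file_sizes_bytes:
        # Close the current group before it would exceed the target size.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


def rewrite_data_files_sql(catalog, table, target_bytes, min_input_files=5):
    """Build the Spark SQL statement a scheduler would submit on each run.

    catalog and table are hypothetical placeholders supplied by the caller.
    """
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', "
        f"options => map("
        f"'target-file-size-bytes', '{target_bytes}', "
        f"'min-input-files', '{min_input_files}'))"
    )


if __name__ == "__main__":
    print(files_per_partition_per_day(30))  # 2880 files per partition per day
    small_files = [400_000] * 10  # ten hypothetical 400 KB files
    print(len(plan_compaction(small_files, 1_000_000)))
    print(rewrite_data_files_sql("my_catalog", "db.events", TARGET_FILE_SIZE_BYTES))
```

In practice the generated statement would be submitted on a schedule (for example hourly, from an orchestrator) while the streaming writer keeps running. Compaction rewrites data into a new snapshot, so the old small files remain on storage until the snapshots that reference them are expired.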

