DataDriven

An IoT pipeline writes hourly batch output as GZIP-compressed CSV files in a partitioned curated zone

A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: medium

Problem

An IoT pipeline writes hourly batch output as GZIP-compressed CSV files in a partitioned curated zone. GZIP is not a splittable codec, so Spark must read each 5 GB file with a single task; parallelism collapses to one task per partition and the dashboard query takes 40 minutes. Apply the section's compression-and-splittability framing and replace the curated GZIP-CSV format with the splittable columnar format the section names as the de facto choice.
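The parallelism gap described above can be made concrete with a rough calculation. The sketch below is not a reference solution. It assumes the format the section has in mind is Parquet, treats "5 GB" as 5 GiB, and uses Spark's documented default of 128 MB for `spark.sql.files.maxPartitionBytes`. The names `estimate_read_tasks` and `rewrite_hour_as_parquet`, along with all paths and parameters, are hypothetical and chosen only for illustration. Real split planning also depends on file open cost and cluster parallelism, and a Parquet rewrite would normally be smaller than the source CSV, so the numbers are approximate.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3

# Spark's documented default for spark.sql.files.maxPartitionBytes is 128 MB.
DEFAULT_SPLIT_BYTES = 128 * MIB


def estimate_read_tasks(file_bytes, splittable, split_bytes=DEFAULT_SPLIT_BYTES):
    """Rough count of input tasks Spark could use to read one file.

    A non-splittable file (for example GZIP-compressed CSV) has to be
    decompressed from the first byte, so it is read by exactly one task.
    A splittable file can be divided into chunks of about split_bytes.
    """
    if not splittable:
        return 1
    return max(1, math.ceil(file_bytes / split_bytes))


def rewrite_hour_as_parquet(spark, src_path, dst_path, target_files):
    """Hypothetical helper: rewrite one hourly partition as Parquet.

    Assumes `spark` is an existing SparkSession and that the CSV files
    carry a header row. Without an explicit schema every column is read
    as a string, so a real job would usually supply one.
    """
    df = spark.read.option("header", True).csv(src_path)  # .gz is decompressed transparently
    # Repartition so the output is several moderately sized files
    # instead of one large one.
    df.repartition(target_files).write.mode("overwrite").parquet(dst_path)


if __name__ == "__main__":
    print(estimate_read_tasks(5 * GIB, splittable=False))
    print(estimate_read_tasks(5 * GIB, splittable=True))
```

Under these assumptions a single 5 GiB GZIP file yields one read task, while the same volume in a splittable format could be read by roughly 40 tasks. The rewrite helper is only called with a live Spark session, so it is shown here for shape rather than executed.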

Practice This Problem

Solve this Pipeline Design problem with real code execution. DataDriven runs your solution and grades it automatically.
